Last month, without much fanfare, a new version of the Alfresco Business Reporting module (version 220.127.116.11) was released. This release focuses on improved performance, especially in repositories holding a lot of content, and mainly on the harvesting process: getting the Alfresco metadata into the reporting database. To achieve this, some restructuring was needed.
The reporting tables are now indexed. A database index has been added to serve the most obvious queries: those that check whether the metadata of a given object has been updated. This works on all supported databases. As a result, harvesting now progresses at a continuous, steady pace instead of gradually slowing down.
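To illustrate why such an index helps, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table `document`, its columns `node_uuid`, `modified` and `is_latest`, and the helper `needs_update` are all hypothetical; they are not the module's actual schema or code. The point is only that the per-object "is this metadata new or changed?" lookup can use an index instead of scanning the whole table.

```python
import sqlite3

# Hypothetical schema: the real reporting tables and column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (node_uuid TEXT, modified TEXT, is_latest INTEGER)")
# Index matching the lookup that is executed for every harvested object.
conn.execute("CREATE INDEX idx_document_latest ON document (node_uuid, is_latest)")

LOOKUP = "SELECT modified FROM document WHERE node_uuid = ? AND is_latest = 1"


def needs_update(conn, node_uuid, modified):
    """Return True if the object is new, or its stored modified date differs."""
    row = conn.execute(LOOKUP, (node_uuid,)).fetchone()
    return row is None or row[0] != modified


conn.execute("INSERT INTO document VALUES ('abc', '2014-10-01T10:00', 1)")
print(needs_update(conn, "abc", "2014-10-01T10:00"))  # False: unchanged
print(needs_update(conn, "abc", "2014-10-02T09:30"))  # True: modified since
print(needs_update(conn, "xyz", "2014-10-02T09:30"))  # True: not yet harvested

# The query plan should report a SEARCH using the index rather than a full SCAN.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("abc",)):
    print(plan_row[-1])
```

Without the index, every lookup scans the table, so each object gets slower to check as the table grows; with it, the cost per lookup stays roughly constant.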
lastSuccesfulRun -> lastSuccesfulBatch
Previously, a given table had to be harvested completely before the lastsuccessfulrun timestamp was updated. This becomes a problem if a table takes longer to harvest than the available time window allows (e.g. if the system goes down at night for a backup). Especially before the database index described above was added, this could happen, depending on when harvesting ran.
The reporting module has been enhanced in two ways to make harvesting steadier. First, the lastsuccesfulrun timestamp is now updated after each batch of, say, 1000 objects (the default maximum number of search results returned by Alfresco). This way, if a run does not complete successfully, the next harvesting run does not have to start again from the old timestamp. Second, the number of batches (of these 1000 results) per harvesting run can be limited. This gives you more control over how much effort the system spends on harvesting.
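The two enhancements above (a checkpoint after every batch, and an optional limit on batches per run) can be sketched roughly as follows. This is a hypothetical Python illustration, not the module's Java code: `Harvester`, `Node`, `search` and `max_batches` are invented names, the repository is simulated by an in-memory list, and modification times are assumed to be unique (a real implementation has to deal with ties at batch boundaries).

```python
from dataclasses import dataclass, field
from typing import List, Optional

BATCH_SIZE = 1000  # default maximum number of Alfresco search results


@dataclass
class Node:
    uuid: str
    modified: int  # modification time; assumed unique in this sketch


@dataclass
class Harvester:
    nodes: List[Node]  # hypothetical stand-in for the repository
    last_successful_batch: int = 0  # the persisted timestamp
    stored: List[str] = field(default_factory=list)  # stand-in for the reporting DB

    def search(self, since: int) -> List[Node]:
        """Hypothetical search: nodes modified after `since`, oldest first, capped."""
        hits = sorted((n for n in self.nodes if n.modified > since),
                      key=lambda n: n.modified)
        return hits[:BATCH_SIZE]

    def run(self, max_batches: Optional[int] = None) -> int:
        """Harvest until caught up or until max_batches batches are done."""
        batches = 0
        while max_batches is None or batches < max_batches:
            batch = self.search(self.last_successful_batch)
            if not batch:
                break  # nothing left to harvest
            for node in batch:
                self.stored.append(node.uuid)
            # Checkpoint after every batch, not only at the end of the run,
            # so an interrupted run does not restart from the old timestamp.
            self.last_successful_batch = batch[-1].modified
            batches += 1
        return batches


harvester = Harvester([Node(f"n{i}", i) for i in range(1, 2501)])
print(harvester.run(max_batches=2), harvester.last_successful_batch)  # 2 2000
print(harvester.run(), harvester.last_successful_batch)  # 1 2500
```

With a limit of two batches, the first run stops after 2000 objects but keeps its progress; the next run picks up the remaining 500 from that checkpoint.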
The behaviour of the lastsuccesfulrun table has changed slightly. Tables that are harvested incrementally are now harvested in two steps: first the workspace SpacesStore, then the archive SpacesStore. This became necessary because of the previous enhancement, which turned the concept of lastSuccesfulRun into lastSuccesfulBatch. The system must harvest the workspace first, and only then the archive SpacesStore, so it can take a few runs to harvest the entire workspace before the archive store may be harvested. That is why I split the timestamp into a workspace timestamp and an archive SpacesStore timestamp. The database table contains a row for each; on the admin console reporting page, these are combined into one row.
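A rough sketch of the workspace-first ordering with one timestamp per store, again in Python and again hypothetical: `harvest_run`, the `modified` and `checkpoints` dictionaries, and the batch budget are invented for illustration, and only the store references `workspace://SpacesStore` and `archive://SpacesStore` are real Alfresco names. Timestamps are assumed unique, as in the previous sketch.

```python
from typing import Dict, List

WORKSPACE = "workspace://SpacesStore"
ARCHIVE = "archive://SpacesStore"
BATCH_SIZE = 1000


def harvest_run(modified: Dict[str, List[int]], checkpoints: Dict[str, int],
                max_batches: int) -> List[str]:
    """Harvest one table for one run; return the store of each batch processed.

    modified: hypothetical stand-in for the repository, mapping each store to
    the sorted modification timestamps of its nodes.
    checkpoints: one timestamp per store (one database row each).
    """
    log: List[str] = []
    for store in (WORKSPACE, ARCHIVE):  # the workspace must be finished first
        caught_up = False
        while len(log) < max_batches:
            batch = [t for t in modified[store] if t > checkpoints[store]][:BATCH_SIZE]
            if not batch:
                caught_up = True  # this store is done; move on to the next one
                break
            checkpoints[store] = batch[-1]  # update the store's timestamp per batch
            log.append(store)
        if not caught_up:
            return log  # budget used up: later stores wait for the next run
    return log


repo = {WORKSPACE: list(range(1, 2501)), ARCHIVE: list(range(1, 301))}
stamps = {WORKSPACE: 0, ARCHIVE: 0}
print(harvest_run(repo, stamps, 2), stamps)  # run 1: workspace only
print(harvest_run(repo, stamps, 2), stamps)  # run 2: workspace finished, then archive
```

In the first run the whole budget goes to the workspace and the archive timestamp stays untouched; only once the workspace is caught up does the archive store get its turn.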
The result of this new version is that larger repositories are harvested much faster and much more reliably. On the roadmap for ‘soon’: bringing the harvesting and reporting execution actions to the Share interface, making the module Alfresco 5.0 proof, and supporting Workflow-task/Workflow-process harvesting!
Let me know your experiences with Alfresco Business Reporting, so we can enhance the module!