A while ago I wanted insight into how our Alfresco Share repository was used. ‘Ad-hoc reporting’ and ‘(always changing) management reporting’ are not Alfresco’s strongest features. There is a need for reporting in a way a business user understands, using their tools of choice. The Alfresco Business Reporting project delivers the missing link. It extracts Alfresco business objects (like documents, folders, datalists, links, discussions) into plain SQL tables. The total set of properties for each object, including your custom aspects, shows up as columns in your table. And you, the business, can use your reporting tool of choice to generate fancy reports. No need for IT consultants to configure Alfresco for every change or additional report. Once the synchronization is configured, you can define as many reports as you like, answering your (or your management’s) questions immediately.
The project is composed of two parts: filling the reporting database, and generating reports from the reporting database.
Filling the reporting database
The most important feature is extracting the Alfresco repository objects into ‘plain’ SQL tables. Alfresco’s database structure is extremely efficient and flexible for its content management purposes. It is however quite hard to report against this database structure, and find all properties belonging to a particular type (and its applied aspects). You cannot let the business report against the Alfresco native database.
Alfresco Business Reporting uses JDBC to push the Alfresco metadata into a separate reporting database. This reporting database can be located anywhere, for example on another server to off-load Alfresco’s production database server. Lucene queries are used as the basis to fill the SQL tables. For each table, a Lucene query defines the set of objects to push into that particular table. The table definitions shipped with this release provide you with some Alfresco types, like Documents, Folders, Links, Posts, DataListItems etc. If your business case needs to select objects having a certain aspect, or objects having some property value(s), feel free. Lucene caters for a wide variety of query options to select just those objects relevant in your business domain.
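To make the "one Lucene query per table" idea concrete, here is a minimal sketch in Python. It is not the project's actual configuration format. The helper names and the `TABLE_QUERIES` mapping are hypothetical, and the type, aspect and property names are only illustrative picks from Alfresco's standard content model. It just shows how a table name could be paired with a query in Alfresco's Lucene syntax (`TYPE:`, `ASPECT:` and `@prefix\:property:`).

```python
def by_type(qname: str) -> str:
    """Lucene clause selecting all nodes of a given content type."""
    return f'TYPE:"{qname}"'


def by_aspect(qname: str) -> str:
    """Lucene clause selecting all nodes that have a given aspect applied."""
    return f'ASPECT:"{qname}"'


def by_property(qname: str, value: str) -> str:
    """Lucene clause matching a property value.

    The colon in the prefixed property name must be escaped, and any
    double quote inside the value is escaped as well.
    """
    field = qname.replace(":", "\\:")
    safe_value = value.replace('"', '\\"')
    return f'@{field}:"{safe_value}"'


# Hypothetical table definitions: reporting table name -> Lucene query.
# The names on the left are made up for this sketch.
TABLE_QUERIES = {
    "document": by_type("cm:content"),
    "folder": by_type("cm:folder"),
    # Only documents that carry a particular aspect:
    "versioned_document": by_type("cm:content") + " AND " + by_aspect("cm:versionable"),
    # Only documents with a particular property value:
    "contract": by_type("cm:content") + " AND " + by_property("cm:title", "Contract"),
}

if __name__ == "__main__":
    for table, query in TABLE_QUERIES.items():
        print(f"{table}: {query}")
```

Adding a table for your own business object then comes down to adding one more entry with a query that selects exactly those objects.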
For each object found, the related table in the reporting database will be stretched to fit all properties found in the result set of the Lucene query. This will include all properties in the assigned aspects (even if only one object in your result list has this aspect). The limitations on your result set (result size or query time) are eliminated on the go. You can schedule the process of pushing the metadata from the Alfresco repository into the reporting database. You can initially bulk-load this reporting database, and you can update it incrementally as well. Incremental means that business objects are only pushed into the reporting database if they have been modified after the last successful run.
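The "stretching" and the incremental update can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the real tool is Java and talks to the reporting database over JDBC, while this sketch uses Python's built-in `sqlite3` as a stand-in. The function names, the `node_ref` and `modified` keys, and the column names are all hypothetical. Every property becomes a TEXT column, and timestamps are assumed to be ISO 8601 strings so they can be compared as plain strings.

```python
import sqlite3


def ensure_columns(conn, table, property_names):
    """Create the table if needed and add a column for every unseen property."""
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (node_ref TEXT PRIMARY KEY)')
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    for name in property_names:
        if name not in existing:
            # Stretch the table: one new column per newly encountered property.
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" TEXT')
            existing.add(name)


def push(conn, table, objects, last_run=None):
    """Push objects into the table and return how many rows were written.

    With last_run set, only objects modified after that timestamp are
    written (incremental update). Without it, everything is written
    (initial bulk load).
    """
    pushed = 0
    for obj in objects:
        if last_run is not None and obj["modified"] <= last_run:
            continue  # unchanged since the last successful run
        columns = list(obj)
        ensure_columns(conn, table, [c for c in columns if c != "node_ref"])
        column_sql = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(
            f'INSERT OR REPLACE INTO "{table}" ({column_sql}) VALUES ({placeholders})',
            [obj[c] for c in columns],
        )
        pushed += 1
    conn.commit()
    return pushed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    docs = [
        {"node_ref": "n1", "modified": "2013-01-01T00:00:00", "cm_name": "a.txt"},
        # Only this object has a title, yet the whole table gains the column.
        {"node_ref": "n2", "modified": "2013-02-01T00:00:00", "cm_name": "b.txt", "cm_title": "B"},
    ]
    print(push(conn, "document", docs))                                  # bulk load
    print(push(conn, "document", docs, last_run="2013-01-15T00:00:00"))  # incremental
```

Rows written before a column existed simply hold NULL for that column, which is what a reporting tool would expect for objects that do not have the aspect.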
This tool is meant to be configured. My initial purpose was to see how our Alfresco Share instance was used. I configured the tables to match the Share business objects. You can configure your tool to match the business objects relevant in your domain. Do you use Alfresco as a case management system? Find metrics of how the system is used, throughput, statistics per case type, outcome per case type, outcome per case-responsible… Statistics per year, last month. Find out if content is only added, or people actually update content.
Generating reports from the reporting database
The biggest win is in having all Alfresco metadata available in a format the business can process. The business should be in control of creating reports instead of calling an Alfresco engineer if they need a new report type. (From my perspective I don’t have the time for that; I like to do projects.) Personally I use JasperSoft’s iReport and Pentaho’s Report Designer if I am in need of some reporting. I created some logic to use Alfresco as a reporting server. Both suppliers mentioned have their own Reporting Server, which makes sense if your business has serious needs for reporting. I just wanted my few reports to be executed on a regular basis.
The configuration of this package lets you define an Alfresco space where the report definitions can be found, and a space where the resulting reports should be created/modified. Given the script provided, all reports in the ‘input’ space will be executed. If you add another report definition (.prpt, .jrxml or .jasper), it will be executed on the next run. The output is stored in the space you configured. I put it into an Alfresco Share space where I tweaked permissions according to the content of the pdf-reports.
This report-generating part of the project is not mandatory. If you have your reporting solution in place, use it like that! Once you have defined the business objects relevant for your domain, any reporting tool will do.
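The "execute every report definition in the input space" step can be pictured with a small Python sketch. It does not reproduce the script that ships with the project. The `plan_reports` function and the `REPORT_ENGINES` mapping are hypothetical, and the sketch only decides which files would be run and what the output could be called. The extensions come from the text above: .prpt is a Pentaho report definition, .jrxml and .jasper belong to JasperReports.

```python
from pathlib import PurePosixPath

# Report definition extension -> engine that can execute it.
REPORT_ENGINES = {
    ".prpt": "pentaho",
    ".jrxml": "jasper",
    ".jasper": "jasper",
}


def plan_reports(filenames, output_ext=".pdf"):
    """Return (definition, engine, output name) for every report definition.

    Files with an unknown extension are skipped, so other content can
    live in the same space without being treated as a report.
    """
    plan = []
    for name in sorted(filenames):
        path = PurePosixPath(name)
        engine = REPORT_ENGINES.get(path.suffix.lower())
        if engine is None:
            continue
        plan.append((name, engine, path.stem + output_ext))
    return plan


if __name__ == "__main__":
    # Hypothetical content of the 'input' space.
    for definition, engine, output in plan_reports(["usage.prpt", "readme.txt", "cases.jrxml"]):
        print(f"{definition} -> {engine} -> {output}")
```

Dropping a new definition into the space is enough for it to appear in the plan on the next run, which matches the behaviour described above.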
How to use
I know undocumented software is widely available on the web. Therefore I have documented things in the project wiki: the choices I made, what you get (zips, jars etc., but also the table definitions), how to install, and the stuff that still needs to be done.
Work to do
I just published the first release of Alfresco Business Reporting. It is not perfect, but if I wait for that moment, it will never get out. I know I need to expand on the Alfresco data types (associations, multi-value properties). I can see that providing an AMP package would make installation easier (but I need to move the database credentials to alfresco-global.properties first).
If you have a pointer, see a gap that needs to be bridged, want to point out flaws, or can offer (or need) help in any other way, please let me know. Enjoy your new insights into how your repository is used!
[Update 20130424: Changed project logo to comply with Alfresco's trademark guidelines and policies]