In a recent post I just discovered this “Alfresco generate reports with JasperReports” project. It seemed to fit my needs for reporting against Alfresco and Share, and I was very happy and exited. However… It appeared to be too complex to use.
I try to be as little indispensible as possible. If someone from ‘the business’ is in need of reporting, I try to give access to a tool that suits his/her needs. If they need to create or modify a report, they should not be dependent of me, or anyone else capable of doing Alfresco Magic. Being build op Open Standards and proven technologies does not mean everybody is capable of doing everything with it
My view on reporting is:
- ‘The business’ should be able to create/maintain reports.
- The reporting environment should support Aspects. If one adds a type with some additional properties, the reporting environment should be able to deal with it, without manual reconfiguring.
I basically investigated 3 approaches:
Go XML. This is the approach of the “Alfresco generate reports with JasperReports” project. It is nice, but has same disadvantages. First of all the way to configure the data feed. XPath and XQuery is not something I trust the business with. You need more knowledge to construct a query than iReport will bring. If the business should be able to construct/maintain reports, it is a no-go. Next to that, the xml feed needs to be cached somewhere, real time generation is a pain when there is a lot of content.
Go NOSQL. Considering the Aspect-nature of Alfresco, a flexible approach would be one that accepts all kind of metadata fields without reconfiguring the reporting environment. NOSQL should be able to cater for that. MongoDB would be a nice candidate, kind of mature, JSON to interact with the database and multi platform availability. Downside is that the query language is not ‘really common’. Considering my customers, I would be the query wizzard again. This seems technically the most fancy solution, but it does not solve my problem; the business will not be able to operate the reporting by themselves (although iReport would ‘support’ it).
Go SQL. The big plus would be that some SQL knowledge is out there. There would be a lot more resources other than me to create and maintain reports. there also is enough tooling of choice to define and generate these reports. Downside is that a SQL database has a rigid set of columsn and column definitions. Manual extending a set of table definitions if someone happened to extend some aspects in the repository would be ‘unacceptable’. If this could be auto-configured, I have a nice solution…
But how do I get the Aspect flexibility in the reporting tables? Can I use Alfresco’s own repository? In my Filenet days I created a view over the database to allow (real time) reporting against the production database. Feasible if there is hardly any load (on reporting nor document repository). For my intended use a separate DB would be required. I started creating a view on top of the DB, but stopped developing it. I do not like to be depending on Alfresco’s changes in the DB, they own the database schema, not me. Even ignoring any performance penalties.
Filling the reporting database
The solution ran nice, but when tested against a decent-size repository I ran into the problem that a Lucene query never returns more than 1000 objects. A particular Alfresco instance out there can be configured to a higher number, but it is limited anyway. I solved this by doing a search, sort on sys:node-dbid, and search again with the restriction that the sys:node-dbid should be greater that the last one I spotted in the previous result set. Until there is no result anymore, then I got them all. This can of course be extended with the limitation to find objects only with a created/modified timestamp bigger than the last successful search query.
The result is pushed into a Java class that opens a JDBC connection to a configurable database of choice. The super set of result parameters and their type are matched against the description of the existing table. If a property is not existing in the database, I alter the table and add a column. If all columns are validated, I add the batch of results into the reporting database.
Generating reports against the database
With this tool anyone with some imagination and SQL skills can get information from their Alfresco repository. Some queries I would be interested in:
- What sites are heavily used?
- What sites are used by many users? What sites are not? Are there sites without content (–> pollution) or with just one author?
- Are there Sites where content actually is modified? (Are there back doors in use?)
- Who are my champion users, who hardly contribute? (–> who to compliment, who to encourage?)
- What Sites have not had any activity for a period of time? (–> should action be taken?)
- What is the main use of the repository/Site? Documents, Wiki, Datalists etc? (–> and target training to a set of users)
- Measure a particular parameter before and after any (organizational) change, and have an indication of success.
Enhancements before any release
- Allow sub-reports in JasperReport report generation.
- Re-do & refactor some parts of the Java Implementation.
- Externalize namespace translation into property file or alfresco-global.properties to also deal with custom name spaces without Java coding.
- Create some sample reports
- Create some sample scripts for frequently filling the reporting DB
- Create some sample scripts for generating reports (create PDF/DOC/XSL/HTML as new object (unique name with timestamp, as new version, or just replacing the content element of the object.
- Include report generation using Pentaho.