This blog is the result of my discoveries in integrating Ephesoft, the open source mailroom automation solution, and Alfresco, the open source document management solution. Ephesoft is able to export to a CMIS-enabled repository, Alfresco is such a CMIS repository, and both are open source! In this blog I configure a default install of Ephesoft (using the Ephesoft installer) and a default install of Alfresco Community (using the installer). I installed each application on its own VM: I don’t like to make a mess of my laptop, and I don’t want to spend time getting both to run smoothly on a single VM image.
If you mess around too much manually with your batches, delete all work folders (inside ephesoft-system-folder) and switch the variable ‘workflow.deploy’ in file C:\bin\Ephesoft\Application\WEB-INF\classes\META-INF\dcma-workflows\dcma-workflows.properties to true. Restart Ephesoft and the related tables in the database will be cleanly recreated. I needed it, but in my philosophy I learn the boundaries of a system by breaking them…
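If you want to script that reset instead of editing the file by hand, the toggle could look roughly like this. It is a minimal sketch: the helper name set_property and the sample text are my own illustration, not part of Ephesoft, and I assume the file uses plain key=value lines. Back up the real file before you touch it.

```python
import re


def set_property(text, key, value):
    """Return properties text with `key` set to `value` (appended if absent)."""
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*[=:])[^\r\n]*", re.MULTILINE
    )
    if pattern.search(text):
        # Keep the key and separator, replace only the value of the first match.
        return pattern.sub(lambda m: m.group(1) + value, text, count=1)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return f"{text}{separator}{key}={value}\n"


# Hypothetical excerpt, not the real dcma-workflows.properties content.
sample = "# hypothetical excerpt\nworkflow.deploy=false\nother.key=1\n"
updated = set_property(sample, "workflow.deploy", "true")
print(updated)
```

For the real file you would read its text, pass it through set_property and write it back before restarting Ephesoft.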
In short, these are folder-wise the building blocks within the Ephesoft folder:
- Application – contains the java/GWT code of the actual application. Note
- Dependencies – contains helper tools like
- ImageMagick (to transform and scale images),
- Tesseract (to perform OCR) and
- hocr2pdf to construct a pdf from a hocr and a tiff file.
- Documents – links to the online documentation
- JavaAppServer – contains a Tomcat instance
- SharedFolders – contains:
- another-monitored-folder – a (configurable) folder that is being watched for incoming batches
- BC1 (actually BCn, there can be more) – containing all configuration for that version of flow definition (like CMIS binding info)
- ephesoft-system-folder – contains temporary folders for each of the batches containing the temporary html, tiff, xml and png files
- FinalDropFolder – the (configurable) location where the end result pdf’s can be dropped
- SampleBatches – contains two sets of demo batches
Configuring the Document Types
Next thing to do is to make Alfresco capable of receiving CMIS Document objects with additional attributes. Let’s analyse which document types the Ephesoft demo provides. The mapping file further down lists three of them: Application-Checklist, Workers-Comp-02 and US-invoice-Data.
Each of these document types has the same attributes:
- Invoice Date (Date)
- Part Number (Long)
- Invoice Total (Double)
- State (String)
- City (String)
Edit the file C:\bin\Ephesoft\SharedFolders\BC2\cmis-plugin-mapping\DLF-Attribute-mapping.properties. This file contains the mapping of Ephesoft attributes to the attributes of the CMIS target system. It expects a model with a namespace called “ephesoft”. I modified the Alfresco document type from ephesoft-type (ephesoft:document) back into cm:document. Other than that, I will create an Alfresco model/aspect using the same names.
Application-Checklist=D:ephesoft:document
Application-Checklist.PartNumber=ephesoft:partNumber
Application-Checklist.InvoiceTotal=ephesoft:invoiceTotal
Application-Checklist.InvoiceDate=ephesoft:invoiceDate
Application-Checklist.State=ephesoft:state
Application-Checklist.City=ephesoft:city
Workers-Comp-02=D:ephesoft:document
Workers-Comp-02.PartNumber=ephesoft:partNumber
Workers-Comp-02.InvoiceTotal=ephesoft:invoiceTotal
Workers-Comp-02.InvoiceDate=ephesoft:invoiceDate
Workers-Comp-02.State=ephesoft:state
Workers-Comp-02.City=ephesoft:city
US-invoice-Data=D:ephesoft:document
US-invoice-Data.PartNumber=ephesoft:partNumber
US-invoice-Data.InvoiceTotal=ephesoft:invoiceTotal
US-invoice-Data.InvoiceDate=ephesoft:invoiceDate
US-invoice-Data.State=ephesoft:state
US-invoice-Data.City=ephesoft:city
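A mapping file like this is easy to get subtly wrong, for example by repeating a key. Assuming the file is read the way java.util.Properties reads it (my assumption, I did not check the Ephesoft source), the last entry for a key silently wins. A small sketch that flags such duplicates; the document type name Some-Type is made up for the example:

```python
from collections import Counter


def parse_mapping(text):
    """Parse simple key=value lines; return (mapping, duplicate_keys)."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs.append((key.strip(), value.strip()))
    counts = Counter(key for key, _ in pairs)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    # dict() keeps the last value per key, like a properties loader would.
    return dict(pairs), duplicates


# Hypothetical sample with a repeated key.
sample = """\
Some-Type=D:ephesoft:document
Some-Type.InvoiceTotal=ephesoft:invoiceTotal
Some-Type.InvoiceTotal=ephesoft:invoiceDate
"""
mapping, duplicates = parse_mapping(sample)
print("duplicate keys:", duplicates)
print("effective value:", mapping["Some-Type.InvoiceTotal"])
```

Running a check like this over the real mapping file before pushing a batch saves a round of log reading.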
In Alfresco the model (C:\bin\Alfresco\tomcat\shared\classes\alfresco\extension\ephesoftModel.xml) looks like:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Custom Model -->
<!-- Note: This model is pre-configured to load at startup of the Repository. So, all custom -->
<!-- types and aspects added here will automatically be registered -->
<model name="ephesoft:demomodel" xmlns="http://www.alfresco.org/model/dictionary/1.0">

    <!-- Optional meta-data about the model -->
    <description>VLC</description>
    <author>Tjarda Peelen - VLC</author>
    <version>0.1</version>

    <imports>
        <!-- Import Alfresco Dictionary Definitions -->
        <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d"/>
        <!-- Import Alfresco Content Domain Model Definitions -->
        <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm"/>
    </imports>

    <!-- Introduction of new namespaces defined by this model -->
    <!-- NOTE: The following namespace custom.model should be changed to reflect your own namespace -->
    <namespaces>
        <namespace uri="com.ephesoft.demo" prefix="ephesoft"/>
    </namespaces>

    <constraints />

    <types>
        <type name="ephesoft:document">
            <title>ephesoft_scan</title>
            <parent>cm:content</parent>
            <properties>
                <property name="ephesoft:invoiceDate">
                    <title>Invoice Date</title>
                    <type>d:datetime</type>
                </property>
                <property name="ephesoft:partNumber">
                    <title>Part Number</title>
                    <type>d:long</type>
                </property>
                <property name="ephesoft:invoiceTotal">
                    <title>Invoice Total</title>
                    <type>d:double</type>
                </property>
                <property name="ephesoft:state">
                    <title>State</title>
                    <type>d:text</type>
                </property>
                <property name="ephesoft:city">
                    <title>City</title>
                    <type>d:text</type>
                </property>
            </properties>
        </type>
    </types>
</model>
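Before restarting Alfresco it can help to check that the model file is well-formed and declares the properties you expect. The sketch below does that with Python's standard XML parser; the function name list_type_properties and the trimmed sample are my own, and only the dictionary namespace is taken from the model itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the Alfresco data dictionary, as declared in the model file.
NS = {"d": "http://www.alfresco.org/model/dictionary/1.0"}


def list_type_properties(model_xml):
    """Return {type name: {property name: data type}} for a model document."""
    root = ET.fromstring(model_xml)  # raises ParseError if not well-formed
    result = {}
    for type_el in root.findall("d:types/d:type", NS):
        props = {}
        for prop in type_el.findall("d:properties/d:property", NS):
            props[prop.get("name")] = prop.findtext("d:type", namespaces=NS)
        result[type_el.get("name")] = props
    return result


# Trimmed-down sample with two of the five properties.
sample = """<model name="ephesoft:demomodel"
       xmlns="http://www.alfresco.org/model/dictionary/1.0">
  <types>
    <type name="ephesoft:document">
      <parent>cm:content</parent>
      <properties>
        <property name="ephesoft:partNumber"><type>d:long</type></property>
        <property name="ephesoft:state"><type>d:text</type></property>
      </properties>
    </type>
  </types>
</model>"""

types = list_type_properties(sample)
for type_name, props in types.items():
    print(type_name, props)
```

For the real file, ET.parse(path).getroot() can take the place of ET.fromstring.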
Yes, it is a TYPE instead of an ASPECT. CMIS does not provide for Aspects (in that sense, it is a compromise standard, but a good one)…
Register the model using C:\bin\Alfresco\tomcat\shared\classes\alfresco\extension\custom-model-context.xml with content:
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
    <!-- Registration of new models -->
    <bean id="extension.dictionaryBootstrap" parent="dictionaryModelBootstrap" depends-on="dictionaryBootstrap">
        <property name="models">
            <list>
                <value>alfresco/extension/ephesoftModel.xml</value>
            </list>
        </property>
    </bean>
</beans>
And enter for:
CMIS atom-pub url: http://192.168.30.128:8080/alfresco/service/cmis
displayName Part Number
displayName Invoice Total
displayName Invoice Date
3. Setting up the CMIS connection
The CMIS entrance to your Alfresco repository can be found at http://192.168.30.128:8080/alfresco/service/cmis
Alfresco ships with a web-based CMIS browser. Point your web browser at:
And enter for:
CMIS atom-pub url: http://192.168.30.128:8080/alfresco/service/cmis
Now, pay attention: you need your Repository ID, the long identifier at the top, built up of groups of characters separated by minus signs.
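If you would rather not copy the Repository ID from the browser, it can also be read from the AtomPub service document that the CMIS URL returns. The sketch below assumes the CMIS 1.0 namespaces (older draft implementations used different ones) and basic authentication; the function names and the all-zero ID in the sample are made up for illustration.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# CMIS 1.0 namespaces; an assumption, check what your server actually returns.
NS = {
    "app": "http://www.w3.org/2007/app",
    "cmis": "http://docs.oasis-open.org/ns/cmis/core/200908/",
    "cmisra": "http://docs.oasis-open.org/ns/cmis/restatom/200908/",
}


def repository_ids(service_xml):
    """Return the repository IDs found in an AtomPub service document."""
    root = ET.fromstring(service_xml)
    path = "app:workspace/cmisra:repositoryInfo/cmis:repositoryId"
    return [el.text for el in root.findall(path, NS)]


def fetch_service_document(url, user, password):
    """Download the service document using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request) as response:
        return response.read()


# Hypothetical, heavily trimmed service document.
SAMPLE = """<app:service xmlns:app="http://www.w3.org/2007/app"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/"
    xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">
  <app:workspace>
    <atom:title>Main Repository</atom:title>
    <cmisra:repositoryInfo>
      <cmis:repositoryId>00000000-0000-0000-0000-000000000000</cmis:repositoryId>
    </cmisra:repositoryInfo>
  </app:workspace>
</app:service>"""

ids = repository_ids(SAMPLE)
print(ids)
```

Against a live server you would call repository_ids(fetch_service_document(url, user, password)) with the CMIS atom-pub url from above and your own credentials.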
Navigate in Ephesoft to http://localhost:8080/dcma/BatchClassManagement.html. Select the batch class BI2 with description “Tesseract Mail Room”, and select “edit”
Select the “Export module” from the module list, and select “Edit”
Select the “CMIS Export” plugin, and select “Edit”
Edit the plugin configuration
And select Save. Congratulations, you just configured your CMIS end point. If you navigate back to the main page of the admin console, you will notice that the version number of the batch has increased from 126.96.36.199 to 188.8.131.52.
Propagate a batch through Ephesoft (in the user UI, http://localhost:8080/dcma/BatchList.html).
You will notice that the folders do get created in Alfresco (/EphesoftFinalDropFolder/BI01 or something similar), but your document will not arrive… In the Ephesoft logging you will see complaints that an Integer or Decimal is expected. CMIS does know about Integer and Decimal, but not about Long and Double. My assumption is that this mapping goes wrong on the Ephesoft side (if it is mapped at all); I have not had the time to investigate yet.
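To make the mismatch concrete: as far as I understand it, Alfresco exposes its d: data types through CMIS as one of the few CMIS property types, so a d:long shows up as an Integer and a d:double as a Decimal. The table below is my own summary of that idea, not something taken from the Ephesoft or Alfresco source, so treat it as an assumption to verify against your server.

```python
# Assumed mapping from Alfresco dictionary types to CMIS property types.
ALFRESCO_TO_CMIS = {
    "d:text": "string",
    "d:int": "integer",
    "d:long": "integer",
    "d:float": "decimal",
    "d:double": "decimal",
    "d:date": "datetime",
    "d:datetime": "datetime",
    "d:boolean": "boolean",
}


def cmis_type_for(alfresco_type):
    """Return the assumed CMIS property type, or None if not in the table."""
    return ALFRESCO_TO_CMIS.get(alfresco_type)


# The five properties of ephesoft:document from the model above.
model_properties = {
    "ephesoft:invoiceDate": "d:datetime",
    "ephesoft:partNumber": "d:long",
    "ephesoft:invoiceTotal": "d:double",
    "ephesoft:state": "d:text",
    "ephesoft:city": "d:text",
}

expected = {name: cmis_type_for(t) for name, t in model_properties.items()}
for name, cmis_type in expected.items():
    print(f"{name}: CMIS {cmis_type}")
```

So a client that sends Long or Double values without converting them to the CMIS Integer and Decimal types would run into exactly the kind of complaint seen in the log.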
a) The solution is simple… Remove 3 properties from each document type from your mapping file in Ephesoft (C:\bin\Ephesoft\SharedFolders\BC2\cmis-plugin-mapping\DLF-Attribute-mapping.properties) :
- ephesoft:partNumber (because: Long)
- ephesoft:invoiceDate (because: Date –> Strange, don’t know why this one fails)
- ephesoft:invoiceTotal (because: Double)
Retry pushing a batch through Ephesoft flow, and find out that it actually works (but the 3 properties removed remain empty, of course).
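Option a) can also be done with a few lines of script instead of by hand. A minimal sketch, assuming the mapping file holds plain key=value lines; the helper name drop_mappings is mine, and the sample shows only one of the document types:

```python
def drop_mappings(text, targets):
    """Remove every key=value line whose value is one of `targets`."""
    kept = []
    for line in text.splitlines():
        _, separator, value = line.partition("=")
        if separator and value.strip() in targets:
            continue  # this line maps onto a property we want to leave out
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = """\
Application-Checklist=D:ephesoft:document
Application-Checklist.PartNumber=ephesoft:partNumber
Application-Checklist.InvoiceTotal=ephesoft:invoiceTotal
Application-Checklist.InvoiceDate=ephesoft:invoiceDate
Application-Checklist.State=ephesoft:state
Application-Checklist.City=ephesoft:city
"""

failing = {"ephesoft:partNumber", "ephesoft:invoiceDate", "ephesoft:invoiceTotal"}
reduced = drop_mappings(sample, failing)
print(reduced)
```

Only the type line and the State and City mappings survive, which is the reduced mapping described in option a).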
b) Update your CMIS mapping to map against a plain CMIS document type (in C:\bin\Ephesoft\SharedFolders\BC2\cmis-plugin-mapping\DLF-Attribute-mapping.properties) into:
Application-Checklist=cmis:document
Workers-Comp-02=cmis:document
US-invoice-Data=cmis:document
Remember: NOT D:cmis:document, remove the D:!!
Retry pushing a batch through Ephesoft flow, and find out that it actually works, and a default Alfresco document is created!
Ephesoft commented on this issue. They successfully tested CMIS types datetime, int and string. Actually, they map:
invoiceTotal --> d:int in Alfresco
partNumber --> d:text in Alfresco
invoiceDate --> d:datetime in Alfresco
I can see the pragmatic approach of mapping a Long onto a cmis:String/d:text. However, it was kind of a surprise. I am not really sure whether it is a nice solution or a void in the CMIS specs. Unexpected it was.
Just as unexpected is the mapping of the invoiceTotal (a Double in Ephesoft) onto an Integer in CMIS/Alfresco. This is a challenge to test with values just bigger than an int can hold, and see what happens…
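To get a feeling for what could be lost on the way from a Double to an int, here is a small sketch. It assumes that d:int is backed by a 32-bit signed integer and that the conversion simply truncates; both are my assumptions, since I have not tested what Ephesoft or Alfresco really do with such values.

```python
# Range of a 32-bit signed integer (assumed backing type of d:int).
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_int32_report(value):
    """Describe what happens when a float is squeezed into a 32-bit int."""
    truncated = int(value)  # drops the fractional part, rounding toward zero
    return {
        "truncated": truncated,
        "loses_fraction": truncated != value,
        "fits_int32": INT32_MIN <= truncated <= INT32_MAX,
    }


# An ordinary invoice total, the largest int, and a value just past it.
for total in (1249.99, 2147483647.0, 2147483648.0):
    print(total, to_int32_report(total))
```

An invoice total of 1249.99 already loses its cents, and anything from 2147483648 upwards no longer fits at all.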
I have not yet tested these new ‘insights’ against my test Alfresco setup. I have been thinking about how to deal with reducing more complex types to String values. The native type is more useful to store in a DMS than the String representation of the type (think of reporting, or decision making based on metadata (rules)).
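A tiny illustration of why I prefer native types for metadata: once numbers are stored as strings, sorting and rule-like comparisons work on characters instead of values. The totals below are made-up sample values.

```python
# Made-up invoice totals, once as numbers and once as their string form.
totals = [9.5, 100.0, 25.0]
as_strings = [str(total) for total in totals]

# Sorting: numeric order versus character-by-character order.
sorted_native = sorted(totals)
sorted_strings = sorted(as_strings)
print(sorted_native)   # [9.5, 25.0, 100.0]
print(sorted_strings)  # ['100.0', '25.0', '9.5']

# A rule such as "total above 50" selects different documents.
above_50_native = [t for t in totals if t > 50]
above_50_strings = [s for s in as_strings if s > "50"]
print(above_50_native)   # [100.0]
print(above_50_strings)  # ['9.5']
```

The string version puts 100.0 first when sorting and picks 9.5 as the only total "above 50", which is exactly the kind of surprise you do not want in reporting or rules.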
The CMIS basics work out well. A default cmis:document can be created in a remote repository. However, CMIS is especially fun if you are able to transfer the metadata as well. That was one of the key reasons to use Ephesoft in the first place. I can conclude that, using a type in Alfresco, the system is able to receive the Ephesoft output. There are however some issues with Date, Double and Long, which makes sense since the CMIS specification knows about Integer and Decimal… Where the Date goes wrong is still a question to me. Maybe I have to get into the source for that…
I would like to be able to model my metadata in an Aspect rather than a Type in Alfresco. I tried this initially and failed. At this point in time I have to try again, since now I know of the limitation in Long and Double. I cannot remember anymore if this caused the error or something else. On the other hand, I would not be surprised if it takes a little more, considering that the CMIS standard is more generic than what Alfresco as a repository is able to handle… To be continued!
[[update 30 dec 2010: added the feedback of Ephesoft, and included the screenshots that were missing]]