Ephesoft is the ‘open source mailroom automation’ tool. The Community Edition comes with a full-blown Windows installer, but being open-source minded, and given that it is a Java web application depending on some helper applications (like Tesseract, ImageMagick and hOCR), I figured getting it to run on Linux should be doable… Right?
If you are looking for a successful recipe for getting Ephesoft running on Linux… beware. This blog will get you halfway there (at the time of initial writing). I got stuck where Ephesoft calls Tesseract to do the actual OCR. Once I get this fixed, I will continue. However, since this is my after-work-hours project, time prevented me from going all the way for now. I do think the pointers so far could help you out. This ‘manual’ is incomplete, but might be a nice starter…
This blog post starts with the assumption that you already have Tesseract (with Leptonica) running, as described in my previous post “Alfresco using Tessaract OCR on Ubuntu Linux” (and the same is true for ImageMagick). In this post we tweak Ephesoft in order to make it work on Linux. Remember, Ephesoft will run in its own Tomcat instance (not in Alfresco’s Tomcat instance!). If you want to run them simultaneously, beware of port conflicts: not just HTTP port 8080, but all AJP and management ports as well!
Let’s see how far we can get…
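To spot port conflicts before starting both Tomcats, it can help to list every port each instance declares. The snippet below is a minimal sketch, not something taken from the Ephesoft documentation: the function name list_tomcat_ports is made up for illustration, and both paths in the usage comment are assumptions based on the usual Tomcat layout (conf/server.xml), so adjust them to your own installs.

```shell
#!/bin/sh
# Print every port number declared in a Tomcat server.xml, one per line.
# Matches port="NNNN" as well as redirectPort="NNNN" (case-insensitive).
# Assumption: $1 is the path to a server.xml file.
list_tomcat_ports() {
    grep -oi 'port="[0-9][0-9]*"' "$1" | sed 's/[^0-9]//g' | sort -n | uniq
}

# Hypothetical usage; both paths are assumptions, not verified locations:
# list_tomcat_ports /opt/ephesoft/JavaAppServer/conf/server.xml
# list_tomcat_ports /opt/alfresco/tomcat/conf/server.xml
```

Any number that shows up in both listings has to be changed in one of the two server.xml files.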
First, create the folder /opt/ephesoft and make it owned by the current user (my user is vlc):
sudo mkdir /opt/ephesoft
sudo chown vlc /opt/ephesoft
The second assumption is that you have installed the Ephesoft Community Edition on some Windows box. If it was installed on Windows in \ephesoft\ (e.g. the folder \ephesoft\Application exists), then copy the content of the ‘ephesoft’ folder to /opt/ephesoft. (I usually use WinSCP for this, to copy from Windows to Linux.) You should end up with the same folder structure as before; in other words, something like /opt/ephesoft/Application should exist.
None of the scripts are executable by default, so we have to change this:
sudo chmod ug+x /opt/ephesoft/JavaAppServer/bin/*.sh
Now there are some files to update. First of all, update startup.sh and add some path variables somewhere near the beginning:
export TESSDATA_PREFIX=/usr/local/share/tessdata
export TESSERACT_PATH=/usr/local/bin
Next, update the following files and fix the path definitions:
There are some explicit path definitions in the SQL statements that create the database as well. Let’s fix those too.
- replace \\ by /
- replace C:/bin by /opt (and make sure your 'ephesoft' is in the right case (upper or lower), since Linux is case-sensitive)
- edit /opt/ephesoft/Dependencies/MySQLSetup/ephesoft-mysql-config.sql
- remove the last line. (I use WinSCP for this)
- edit /opt/ephesoft/Dependencies/MySQLSetup/dcma-db-eng.sql
- Search for the string SharedFolders. There should be 3 occurrences. Modify the path string to match the new folder (1x for monitored-folder in table batch_class, and 2x for final-drop-folder in table batch_class_plugin_config).
- import ephesoft-mysql-config.sql into your MySQL:
mysql -u root -p </opt/ephesoft/Dependencies/MySQLSetup/ephesoft-mysql-config.sql
- import dcma-db-eng.sql into your MySQL, and indicate that you use the database ‘ephesoft’!:
mysql -u root -p ephesoft </opt/ephesoft/Dependencies/MySQLSetup/dcma-db-eng.sql
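The two search-and-replace rules from the list above (\\ to /, and C:/bin to /opt) could also be applied with sed instead of by hand. This is only a sketch under assumptions: the function name rewrite_paths and the file name in the usage comment are hypothetical, and it does nothing about the upper or lower case of 'ephesoft', which you still have to check yourself.

```shell
#!/bin/sh
# Rewrite Windows-style paths read from stdin into their Linux form:
#  1. every escaped backslash (two backslash characters) becomes a slash
#  2. the C:/bin prefix becomes /opt
rewrite_paths() {
    sed -e 's#\\\\#/#g' -e 's#C:/bin#/opt#g'
}

# Hypothetical usage; write to a new file and compare before replacing the original:
# rewrite_paths < some-config.properties > some-config.properties.linux
# diff some-config.properties some-config.properties.linux
```

Writing to a separate file first keeps the original around in case a path was rewritten that should have been left alone.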
The Ephesoft system needs to know when a new batch of TIFF images is available to process. For this the JNotify library is used. This library consists of an operating system component and a Java library. The Windows installer was responsible for getting your Windows system configured with the right DLLs, but you have to do this yourself on Linux. It is easy, since the library is already provided by Ephesoft:
sudo cp /opt/ephesoft/JavaAppServer/bin/libjnotify.so /usr/lib
sudo ldconfig
That should be about it…
I can see all HTML pages of the application. I can even have Ephesoft generate thumbnails and scaled images of the TIFF files. However, the application is not able to start Tesseract to generate OCR files. It seems it is still trying to execute TesseractConsole.exe as shipped with the Community Edition. However, this is Windows-speak, and we are running Linux. I suspect this command is somewhere in the code; I cannot easily find a properties or other config file to reconfigure the Tesseract command. I tried the SVN at Google Code (http://code.google.com/p/ephesoft/source/checkout), but it appears a bit empty… (The code will appear soon, somewhere on the Ephesoft website, or at Google Code…)
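One way to narrow down where the TesseractConsole command comes from might be a recursive text search through the install folder. This is a sketch only, not verified against Ephesoft: the function name find_tesseract_refs is made up, and a hit is not guaranteed, since strings inside compressed .jar files will generally not show up in a plain grep.

```shell
#!/bin/sh
# List all files below directory $1 that contain the string "TesseractConsole".
# Only uncompressed content is searched; classes packed inside .jar files
# are usually compressed and therefore not matched.
find_tesseract_refs() {
    grep -rl "TesseractConsole" "$1"
}

# Hypothetical usage:
# find_tesseract_refs /opt/ephesoft
```

If this only reports .exe or binary files, the command is probably built inside the Java code after all.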