DRAFT: Ephesoft on Ubuntu Linux

Ephesoft is the ‘open source mailroom automation’ tool. The Community Edition comes with a full-blown Windows installer, but being open-source minded, and given that it is a Java web application depending on some helper applications (like Tesseract, ImageMagick and hOCR), this should be doable… Right?

If you are looking for a successful recipe for getting Ephesoft running on Linux… Beware. This blog will get you halfway there (at the time of initial writing). I got stuck where Ephesoft calls Tesseract to do the actual OCR. Once I get this fixed, I will continue. However, since this is my after-work-hours project, time prevented me from going all the way for now. I do think the pointers so far could help you out. This ‘manual’ is incomplete, but might be a nice starter…

This blog post starts with the assumption that you already have Tesseract (with Leptonica) running, as described in my previous post “Alfresco using Tessaract OCR on Ubuntu Linux” (and the same is true for ImageMagick). In this post we tweak Ephesoft in order to make it work on Linux. Remember, Ephesoft will run in its own Tomcat instance (not in Alfresco’s Tomcat instance!). If you want to run them simultaneously, beware of port conflicts: not just HTTP port 8080, but all AJP and management ports as well!
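
A quick way to spot such conflicts is to compare the port attributes in the server.xml of each Tomcat instance. The sketch below is only an illustration and runs against two made-up sample files; the function names and the sample ports are my own invention, and the location of the real server.xml files depends on your installs.

```shell
#!/bin/sh
# Sketch: find ports that two Tomcat instances both try to use.
# Function names and sample files are illustrative, not part of Ephesoft.

# Print every port="NNNN" value in a server.xml, one per line.
# (redirectPort="..." is skipped because the match is case-sensitive.)
list_tomcat_ports() {
    grep -o 'port="[0-9]*"' "$1" | tr -cd '0-9\n'
}

# Print the ports that appear in both files.
port_conflicts() {
    ports_a=$(mktemp)
    ports_b=$(mktemp)
    list_tomcat_ports "$1" | sort -u > "$ports_a"
    list_tomcat_ports "$2" | sort -u > "$ports_b"
    comm -12 "$ports_a" "$ports_b"
    rm -f "$ports_a" "$ports_b"
}

# Demo with two made-up configurations.
demo=$(mktemp -d)
cat > "$demo/ephesoft-server.xml" <<'EOF'
<Server port="8005" shutdown="SHUTDOWN">
  <Connector port="8080" protocol="HTTP/1.1" redirectPort="8443" />
  <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
</Server>
EOF
cat > "$demo/alfresco-server.xml" <<'EOF'
<Server port="8006" shutdown="SHUTDOWN">
  <Connector port="8080" protocol="HTTP/1.1" redirectPort="8443" />
  <Connector port="8010" protocol="AJP/1.3" redirectPort="8443" />
</Server>
EOF

# Only the HTTP port clashes in this sample, so this prints 8080.
port_conflicts "$demo/ephesoft-server.xml" "$demo/alfresco-server.xml"
```

Any port this prints has to be changed in one of the two instances before both can run side by side.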

Let’s see how far we can get…

First, create a folder /opt/ephesoft, and make it owned by the current user (I have user vlc):

sudo mkdir /opt/ephesoft
sudo chown vlc /opt/ephesoft

The second assumption is that you have installed the Community Edition of Ephesoft on some Windows box. If it was installed on Windows in \ephesoft\ (e.g. the folder \ephesoft\Application exists), then copy the contents of the folder ‘ephesoft’ to /opt/ephesoft. (I usually use WinSCP for this, to copy from Windows to Linux.) You should end up with the same folder structure as before; in other words, something like /opt/ephesoft/Application should exist.
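
To confirm the copy went well, something like the following can check for the sub-folders used later in this post. It is a minimal sketch: check_layout is a made-up helper name, and the folder list only covers the paths this post refers to, not the complete Ephesoft tree.

```shell
#!/bin/sh
# Sketch: check that the copied tree has the sub-folders this post relies on.
# check_layout is a made-up helper; the list below is taken from paths used in this post.
check_layout() {
    root=$1
    missing=0
    for sub in Application JavaAppServer/bin Dependencies/MySQLSetup; do
        if [ -d "$root/$sub" ]; then
            echo "ok:      $root/$sub"
        else
            echo "missing: $root/$sub"
            missing=1
        fi
    done
    return $missing
}

check_layout /opt/ephesoft || echo "copy incomplete, or the casing of a folder name differs"
```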

None of the scripts are executable by default, so we have to change this:

sudo chmod ug+x /opt/ephesoft/JavaAppServer/bin/*.sh

Now there are some files to update. First of all, update startup.sh and add some path variables somewhere near the beginning:
export TESSDATA_PREFIX=/usr/local/share/tessdata
export TESSERACT_PATH=/usr/local/bin
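
Before starting Tomcat it is worth checking that these two locations really exist. A small sketch, assuming the tesseract binary sits directly in the TESSERACT_PATH folder; check_tess_env is a made-up helper name, and the fallback values are simply the ones exported above.

```shell
#!/bin/sh
# Sketch: verify the two locations exported in startup.sh.
# check_tess_env is a made-up helper; it expects the binary folder and the data folder.
check_tess_env() {
    tess_dir=$1
    data_dir=$2
    status=0
    if [ ! -x "$tess_dir/tesseract" ]; then
        echo "missing or not executable: $tess_dir/tesseract"
        status=1
    fi
    if [ ! -d "$data_dir" ]; then
        echo "not a directory: $data_dir"
        status=1
    fi
    return $status
}

# Use the exported values when present, otherwise the paths from this post.
check_tess_env "${TESSERACT_PATH:-/usr/local/bin}" \
               "${TESSDATA_PREFIX:-/usr/local/share/tessdata}" \
    || echo "adjust the exports in startup.sh"
```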


Next update the following files and fix the path definitions:
  • dcma-util/dcma-backup-services.properties
    • replace \\ by /
    • replace C:/bin by /opt (and make sure your 'ephesoft' has the right casing (upper or lower), since Linux is case-sensitive)
  • dcma-batch/dcma-batch.properties
    • replace \\ by /
    • replace C:/bin by /opt (and make sure your 'ephesoft' has the right casing (upper or lower), since Linux is case-sensitive)
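
The two replacements listed above can also be done with sed instead of by hand. This is a rough sketch: fix_paths is a made-up helper name, the property key in the demo is invented, and it keeps a .bak copy because I have not verified it against every line of the real files.

```shell
#!/bin/sh
# Sketch: apply the two replacements from the list above to one properties file.
# fix_paths FILE WIN_PREFIX LINUX_PREFIX   (made-up helper, keeps FILE.bak)
# Note: WIN_PREFIX is used as a sed regex, so avoid special characters in it.
fix_paths() {
    file=$1
    win_prefix=$2
    linux_prefix=$3
    cp "$file" "$file.bak"
    # 1) double backslash -> forward slash, 2) Windows prefix -> Linux prefix
    sed -e 's|\\\\|/|g' -e "s|$win_prefix|$linux_prefix|g" "$file.bak" > "$file"
}

# Demo on a made-up line; the key name is invented for illustration.
demo=$(mktemp -d)
printf '%s\n' 'sample.folder=C:\\bin\\Ephesoft\\SharedFolders' > "$demo/sample.properties"
fix_paths "$demo/sample.properties" 'C:/bin' '/opt'
cat "$demo/sample.properties"   # prints: sample.folder=/opt/Ephesoft/SharedFolders
```

Note that the demo result still says Ephesoft with a capital E, which is exactly the casing you have to check against the folder name you created under /opt.
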
There are some explicit path definitions in the SQL statements that create the database as well. Let’s fix these.
  • edit /opt/ephesoft/Dependencies/MySQLSetup/ephesoft-mysql-config.sql
    • remove the last line. (I use WinSCP for this)
  • edit /opt/ephesoft/Dependencies/MySQLSetup/dcma-db-eng.sql
    • Search for the string SharedFolders. There should be 3 occurrences. Modify the path string to match the new folder (1x to monitored-folder in table batch_class, and 2x to final-drop-folder in table batch_class_plugin_config).
  • import ephesoft-mysql-config.sql into your MySQL:
    • mysql -u root -p </opt/ephesoft/Dependencies/MySQLSetup/ephesoft-mysql-config.sql
  • import dcma-db-eng.sql into your MySQL, and indicate that you use the database ‘ephesoft’!:
    • mysql -u root -p ephesoft </opt/ephesoft/Dependencies/MySQLSetup/dcma-db-eng.sql
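
If you prefer the command line over an editor for the two SQL edits, the sketch below shows the idea on a made-up sample file. Both helper names are my own, and the sample rows are invented; they do not reflect the real contents of the Ephesoft SQL scripts.

```shell
#!/bin/sh
# Sketch: helpers for the two SQL edits described above (made-up names).

# count_matches PATTERN FILE: number of occurrences (not number of lines).
count_matches() {
    grep -o "$1" "$2" | wc -l | tr -d ' '
}

# drop_last_line FILE: print FILE without its last line.
drop_last_line() {
    sed '$d' "$1"
}

# Demo on invented rows, only to exercise the helpers.
demo=$(mktemp -d)
cat > "$demo/sample.sql" <<'EOF'
-- made-up rows
INSERT INTO t VALUES ('C:\\Ephesoft\\SharedFolders\\a');
INSERT INTO t VALUES ('C:\\Ephesoft\\SharedFolders\\b', 'C:\\Ephesoft\\SharedFolders\\c');
-- last line
EOF

count_matches SharedFolders "$demo/sample.sql"            # prints: 3
drop_last_line "$demo/sample.sql" > "$demo/trimmed.sql"   # 3 lines remain
```

On the real dcma-db-eng.sql, count_matches SharedFolders should report the 3 occurrences mentioned above; if it does not, your version of the script differs from mine.
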
The Ephesoft system needs to know when a new batch of TIFF images is available to process. For this, the JNotify library is used. This library consists of an operating system component and a Java library. The Windows installer was responsible for getting your Windows system configured with the right DLLs, but on Linux you have to do this yourself. It is easy: the library is already provided by Ephesoft.
sudo cp /opt/ephesoft/JavaAppServer/bin/libjnotify.so /usr/lib
sudo ldconfig
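
On a 64-bit system the bundled library may not be the right one; a commenter below reports needing a separate libjnotify64.so on 64-bit Ubuntu Server 10.04. The sketch below only helps decide which file applies to your machine. The helper name is made up, and the 64-bit file name is taken from that comment, not from the Ephesoft distribution.

```shell
#!/bin/sh
# Sketch: pick the JNotify library file name for a given machine type.
# jnotify_lib_for is a made-up helper; libjnotify64.so is the name used in the
# comment thread below, not something shipped with Ephesoft as far as I know.
jnotify_lib_for() {
    case "$1" in
        x86_64|amd64) echo "libjnotify64.so" ;;
        *)            echo "libjnotify.so" ;;
    esac
}

machine=$(uname -m)
echo "machine type $machine: look for $(jnotify_lib_for "$machine")"
```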

That should be about it…

I can see all HTML pages of the application. I can even have Ephesoft generate thumbnails and scaled images of the TIFF files. However, the application is not able to start Tesseract to generate OCR files. It seems it is still trying to execute TesseractConsole.exe as shipped with the Community Edition. However, this is Windows-speak, and we are running Linux. I suspect this command is somewhere in the code; I cannot easily find a properties or other config file to reconfigure the Tesseract command. I tried the SVN at Google Code (http://code.google.com/p/ephesoft/source/checkout), but it appears a bit empty… (The code will appear soon, somewhere on the Ephesoft website, or at Google Code…)

21 Responses to “DRAFT: Ephesoft on Ubuntu Linux”


  1. 1 Dave April 7, 2011 at 18:01

    Hi:

    I’m going to give this a go as well. I’ll post here if I find anything.

    Cheers,
    Dave

  2. 2 Dave April 7, 2011 at 22:07

    I had to do a couple more things using Ubuntu Server 10.04 LTS (64 bit):

    1) Get libjnotify64.so from here: http://minibuilder.googlecode.com/files/libjnotify64.so

    2) Copy it to /usr/lib and symlink to /usr/lib/libjnotify.so

    3) Added: export IM4JAVA_TOOLPATH=/opt/ephesoft/Application/WEB-INF/lib to startup.sh in Tomcat

    4) Also edited: /opt/ephesoft/JavaAppServer/conf/Catalina/localhost/dcma.xml and dcma-batches.xml – fix path definitions as outlined above.

    D.

    • 3 tpeelen April 8, 2011 at 09:15

      Thanks for the add-on Dave!
      I was doing several things at the same time (and writing/rearranging 4 blogs) and that caused me to miss some of the details…
      Meaningful addition, great!

      Tjarda

  3. 4 Dave April 8, 2011 at 17:02

    No problem, glad to help. This article was extremely helpful to me. Since I’m new to Ephesoft, could you outline the steps you took to re-create the Tesseract problem?

    • 5 tpeelen April 8, 2011 at 17:37

      The issue was that I was unable to find where Tesseract was called. It seems to be a hardcoded call to the tesseractconsole.exe. I would have expected a configurable command line. But I did not investigate further. A decent search should be sufficient…

      Being an Ephesoft Partner, I spend most of my time with the Enterprise version, where Recostar OCR is available. That gives me better-quality OCR’d content. Happy to collect your findings and have all information bundled! Thanks!

  4. 6 Dave April 8, 2011 at 18:07

    OK, found it. It’s in: /opt/ephesoft/Application/WEB-INF/lib/dcma-tesseract-0.0.9.jar

    In: dcma-tesseract-0.0.9/META-INF/dcma-tesseract/tesseract-reader.properties

    Contents:
    #mandatory commands for ocropus
    tesseract.commands_mandatory=cmd;/c;TesseractConsole.exe

    I take it we remove /c;TesseractConsole.exe and replace with maybe: sh tesseract or sh /usr/local/bin/tesseract?

    Any thoughts?

    • 7 tpeelen April 8, 2011 at 21:13

      Dave.

      Good work!
      I think it is just trial and error, plus a brief look at how Java deals with executing a system command. I guess it translates to something like >cmd /c TesseractConsole.exe (the ‘;’ appears to be the token separator!), so replacing this by the Linux equivalent should do. Should be solvable!

      Tjarda
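
To illustrate the guess above: if the ‘;’ really is just a separator between command-line tokens, the property value maps to a command line as in this sketch. The helper name is made up, and whether Ephesoft actually splits the value this way is an assumption, not something verified in its code.

```shell
#!/bin/sh
# Sketch: turn a ';'-separated property value into a command line.
# to_command_line is a made-up helper; the splitting rule is an assumption.
to_command_line() {
    printf '%s\n' "$1" | tr ';' ' '
}

to_command_line 'cmd;/c;TesseractConsole.exe'   # prints: cmd /c TesseractConsole.exe
to_command_line '/usr/local/bin/tesseract'      # prints: /usr/local/bin/tesseract
```

Under that assumption, a Linux value would simply be the path to the tesseract binary, with no cmd or /c tokens in front of it.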

  5. 8 Dave April 8, 2011 at 22:00

    Thnx – I’m going to try: /usr/local/bin/./tesseract

    Now I have a real dumb question, how do I load up a file and scan it 🙂 I checked out a few Ephesoft YouTube vids but it still wasn’t very apparent how to do it. I tried logging in on my server at http://204.225.175.54/dcma/BatchList.html and I click on the “WebScanner” tab and just get a blank screen. I’m going to hit the manuals shortly but if you know some quick and dirty steps I can take to test the tesseract command it would be helpful.
    Cheers,
    Dave

  6. 9 Dave Cook April 11, 2011 at 01:47

    I now get the following message in the logs:

    org.springframework.scheduling.quartz.JobMethodInvocationFailedException: Invocation of method ‘pickupBatchInstance’ on target class [class $Proxy102] failed; nested exception is org.jbpm.api.JbpmException: no process definition with key ‘TesseractMailRoom’

    This message appears in catalina.out when I copy Sample Batches into /SharedFolders/another-monitored-folder. I’m pretty sure it’s related to the tesseract mandatory commands not being syntactically correct. Still trying….

    • 10 tpeelen April 11, 2011 at 10:13

      I doubt that this is Tesseract related. It appears that Ephesoft cannot pick up the batch from the monitored folder.
      1) Do you copy a folder of tiff’s into the monitored folder? (NOT just tiff’s!)

  7. 11 Dave Cook April 11, 2011 at 14:51

    Yes, it was a folder of tiffs – ClassificationSample-1 from the /Sample Batches directory. So /SharedFolders/another-monitored-folder/ClassificationSample-1/. I’ll try dumping just tiffs straight under /SharedFolders/another-monitored-folder. So there should be no sub directories under any of the monitored directories?

  8. 12 Dave Cook April 11, 2011 at 15:39

    Same error when dropping tiffs straight under /SharedFolders/another-monitored-folder. It looks as though the system isn’t picking up files when they’re dropped in monitored folders.

  9. 14 Dave Cook April 13, 2011 at 16:20

    I’m going back to basics. I’ve downloaded the community source and maven dependencies and I’m trying to build it first on Windows and then port the project over to Linux. I’ve got 0.10 running under a NetBeans project on Windows. I’m going to attempt to move the project to Linux and build there. At least this way we’ll have the ability to properly step through the code to see what’s going on.
    D.

  10. 15 Hassan June 18, 2011 at 19:35

    Hello Dave, How did it go? Am interested. Thanks tpeelen for the tutorial.

  11. 16 Dave Cook June 18, 2011 at 22:19

    Hi Hassan:

    It turned out to be quite difficult to accomplish. However, I have a Java developer in Turkey taking on the task again.

    Cheers,
    Dave

  12. 17 Jan Čustović July 19, 2011 at 10:36

    Did someone manage to find a solution to: org.jbpm.api.JbpmException: no process definition with key ‘TesseractMailRoom’?

    I did a clean install from exe file and this happens when I copy ClassificationSample-1 to another-monitored-folder.

  13. 20 dhartford December 4, 2011 at 21:35

    Although this is a bit dated and I am late to this blog entry, in case anyone else is still looking at this specific ephesoft/linux setup:

    http://www.ephesoft.com/forums/viewtopic.php?f=7&t=115


  1. 1 Open Source scanning with Ephesoft and Alfresco « Open Source ECM/WCM Trackback on January 16, 2011 at 14:41