Posts Tagged 'OCR'

DRAFT: Ephesoft on Ubuntu Linux

Ephesoft is the ‘open source mailroom automation’ tool. The Community Edition comes with a full-blown Windows installer, but being open-source minded, and given that it is a Java web application depending on some helper applications (like Tesseract, ImageMagick and hOCR), getting it to run on Linux should be doable… Right?

If you are looking for a successful recipe for getting Ephesoft running on Linux… Beware. This blog will get you halfway there (at the point of initial writing). I got stuck where Ephesoft calls Tesseract to do the actual OCR. Once I get this fixed, I will continue. However, since this is my after-work-hours project, time prevented me from going all the way for now. I do think the pointers so far could help you out. This ‘manual’ is incomplete, but might be a nice starter…

Open Source scanning with Ephesoft and Alfresco

A document management solution is good at managing ‘content’: controlling access, processing the flow of content, performing transformations, and giving overview and control. But how does the content enter the system? One stream into the system can be fully digital: integrations with other IT systems, email, the office environment. However, there is a world of paper to manage as well. And how does the paper end up in a DMS? Scanning.

Wouldn’t it be great to have a full-blown open source stack that covers scanning, validation and indexing, pushes documents into the DMS and manages them until they can be destroyed? Now you can, and Ephesoft is the entrance!

Alfresco using Tesseract OCR on Ubuntu Linux

In this post I will describe what to download and install to get Tesseract OCR onto an Ubuntu box, and how to integrate it into Alfresco. The goal is to give Alfresco a custom transformer that can transform TIFF to PDF, where the PDF also has a text layer.
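To make the TIFF-to-PDF goal concrete, here is a minimal sketch of the conversion step on its own, outside Alfresco. It assumes a Tesseract build that ships the `pdf` output config and has the `tesseract` binary on the PATH; the function names and the `eng` language default are made up for illustration and are not part of the Alfresco transformer described in the post.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(tiff_path, output_base, language="eng"):
    """Return the argv that asks Tesseract for a PDF with a text layer.

    Tesseract appends ".pdf" to output_base itself when the "pdf" config is used.
    """
    return ["tesseract", str(tiff_path), str(output_base), "-l", language, "pdf"]


def tiff_to_searchable_pdf(tiff_path, language="eng"):
    """Run Tesseract on one TIFF and return the path of the PDF it writes."""
    # Assumption: the tesseract binary is installed and on the PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    tiff_path = Path(tiff_path)
    output_base = tiff_path.with_suffix("")  # e.g. scan.tif -> scan
    subprocess.run(
        build_tesseract_command(tiff_path, output_base, language),
        check=True,  # raise if Tesseract exits with an error
    )
    return Path(str(output_base) + ".pdf")
```

Calling `tiff_to_searchable_pdf("scan.tif")` would be expected to leave a `scan.pdf` next to the input. A custom Alfresco transformer would wrap a command like this one rather than a Python function.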

This post sets up the next one: how to combine Ephesoft and Alfresco on one Linux box. Ephesoft needs Tesseract for its OCR functionality…