Name Files with Captured OCR Text
Simple Zonal OCR is what its name implies: a program that is simple to set up and use, and that uses OCR to capture the text in an area of a document. The captured text is used to move and rename the file. Ideal uses for this product include the automatic filing of internal documents that contain a number, such as work orders, shipping documents, and delivery tickets.
Simple Zonal OCR can use the OCR engine found in Microsoft Office Document Imaging (MODI) or the award-winning Tesseract engine. The fuzzy logic used in Simple Zonal OCR was also used in a custom application created by eDocfile. An independent firm audited the results and found that 1 out of 1,000 documents failed and had to be manually indexed.
The program can validate the captured text with EasyPatterns. Blank page separation is available for batch processing files. How it works: a multi-page TIFF image is pulled from a monitored (hot) folder and split into separate files each time a blank page is found. An area of each image is then extracted, and optical character recognition (OCR) is applied to that area.
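The splitting and zone extraction steps can be sketched in Python as follows. This is only an illustration, not the product's implementation: pages are modeled as plain lists of grayscale pixel rows so the example needs no imaging library, and the function names (`is_blank`, `split_on_blank`, `crop_zone`) and the blank-page thresholds are assumptions. The OCR call itself is left out, since it depends on the engine in use.

```python
from typing import List, Sequence

# Assumption: a page is a list of rows of grayscale pixels (0 = black, 255 = white).
Page = List[List[int]]


def is_blank(page: Page, dark_below: int = 128, max_dark_ratio: float = 0.005) -> bool:
    """Treat a page as blank when almost none of its pixels are dark.

    Both thresholds are illustrative guesses, not values used by the product.
    """
    total = sum(len(row) for row in page)
    if total == 0:
        return True
    dark = sum(1 for row in page for px in row if px < dark_below)
    return dark / total <= max_dark_ratio


def split_on_blank(pages: Sequence[Page]) -> List[List[Page]]:
    """Group pages into documents; blank pages act as separators and are dropped."""
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_blank(page):
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


def crop_zone(page: Page, left: int, top: int, width: int, height: int) -> Page:
    """Return the rectangular zone that would be handed to the OCR engine."""
    return [row[left:left + width] for row in page[top:top + height]]
```

In this sketch a scan of five pages with blank pages in positions three and five yields two documents, and `crop_zone` is applied to a page of each document to obtain the area sent to OCR.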
Fuzzy logic is then used to modify the OCR text and correct common errors, such as a '1' being read as an 'i'. When this is complete, an EasyPattern rule is applied to validate the captured OCR text. (EasyPatterns are similar to regular expressions, only very simple to configure.)
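A rough Python sketch of the correct-then-validate idea is shown here. It is not the product's fuzzy logic: the look-alike table is an assumed example for a numeric zone, the six-digit rule is a hypothetical work order format, and a Python regular expression stands in for an EasyPattern rule.

```python
import re
from typing import Optional

# Assumed look-alike table for a numeric zone; the product's real rules are not shown here.
LOOKALIKES = str.maketrans(
    {"i": "1", "I": "1", "l": "1", "O": "0", "o": "0", "S": "5", "B": "8"}
)

# Hypothetical validation rule: a work order number is exactly six digits.
# A regular expression is used as a stand-in for an EasyPattern.
WORK_ORDER = re.compile(r"\d{6}")


def correct_numeric(ocr_text: str) -> str:
    """Remove whitespace and replace letters commonly misread in place of digits."""
    return "".join(ocr_text.split()).translate(LOOKALIKES)


def validate(ocr_text: str) -> Optional[str]:
    """Return the corrected text if it matches the rule, else None (manual indexing)."""
    candidate = correct_numeric(ocr_text)
    return candidate if WORK_ORDER.fullmatch(candidate) else None
```

With these assumptions, an OCR result such as `"10i 4S7"` is corrected to `"101457"` and passes, while a result that still contains unexpected characters after correction returns `None` and would be left for manual processing.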
Once validated, the file is converted to a PDF if desired, then moved to the output folder and renamed with the OCR text. For files that fail the validation process, a built-in viewer allows for quick manual processing.
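The final filing step might look something like the Python sketch below. The function names, the separate review folder for failed files, and the numeric suffix used to avoid overwriting an existing file are all assumptions made for illustration; PDF conversion is omitted.

```python
import re
import shutil
from pathlib import Path
from typing import Optional


def safe_name(ocr_text: str) -> str:
    """Replace characters that are unsafe in file names with underscores."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", ocr_text).strip("_")


def file_document(src: Path, output_dir: Path, review_dir: Path,
                  ocr_text: Optional[str]) -> Path:
    """Move src to output_dir named after ocr_text.

    If validation failed (ocr_text is None or empty after cleaning), the file
    keeps its name and goes to review_dir, a hypothetical holding folder.
    """
    if ocr_text is None or not safe_name(ocr_text):
        target_dir, stem = review_dir, src.stem
    else:
        target_dir, stem = output_dir, safe_name(ocr_text)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{stem}{src.suffix}"
    counter = 1
    while target.exists():  # never overwrite an earlier file with the same name
        target = target_dir / f"{stem}_{counter}{src.suffix}"
        counter += 1
    shutil.move(str(src), str(target))
    return target
```

For example, `file_document(Path("scan1.tif"), Path("out"), Path("review"), "101457")` would move the scan to `out/101457.tif`, and a second file with the same captured number would become `out/101457_1.tif`.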