OCR File Splitter – 1.0
OCR File Splitter is a program designed to monitor (“watch”) a folder for the arrival of a multi-page TIFF image. When one arrives, it is split into smaller multi-page TIFF images based on either the text content of the file or a fixed number of pages. This makes it ideal for separating incoming faxes for further processing, or for separating a group of files that have been batch scanned with a copier. The program detects a cover page, if one is present, and removes it; it then creates a separate file for each document contained in the multi-page TIFF image. For instance, if someone were to fax in a group of sales orders, super bills, credit applications, etc. (documents with a varying number of pages), a separate file would be created for each transaction that needs to be processed.
How it works:
To determine where each document begins, the program uses the OCR engine in Microsoft Office Document Imaging (MODI), which is a required component. Once the file has been run through OCR, the text of each page is extracted and compared against three lists of text. For a page to be classified as the first page of a document, text from one or both of the first two lists must be present, and no text from the third list may be present. Each subsequent page is appended to that first page until another first page is detected. This process repeats until all pages in the multi-page TIFF file have been processed.
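
The first-page rule and the grouping of pages described above can be illustrated with a short Python sketch. This is not the program's actual code. It assumes the OCR text for each page is already available as a string, that matching is a case-insensitive substring test, and that a match in either of the two inclusion lists is sufficient. The names contains_any, is_first_page, and group_pages are hypothetical, and pages that appear before the first detected first page are simply returned separately instead of being handled as a cover page.

```python
from typing import List, Sequence, Tuple


def contains_any(text: str, phrases: Sequence[str]) -> bool:
    """Return True if any phrase occurs in text (case-insensitive; an assumption)."""
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in phrases)


def is_first_page(text: str,
                  include_a: Sequence[str],
                  include_b: Sequence[str],
                  exclude: Sequence[str]) -> bool:
    """A page starts a document if it matches at least one inclusion list
    and contains nothing from the exclusion list."""
    if contains_any(text, exclude):
        return False
    return contains_any(text, include_a) or contains_any(text, include_b)


def group_pages(page_texts: Sequence[str],
                include_a: Sequence[str],
                include_b: Sequence[str],
                exclude: Sequence[str]) -> Tuple[List[int], List[List[int]]]:
    """Group page indexes into documents.

    Returns (leading, documents): leading holds pages seen before any
    first page was detected; documents holds one list of page indexes
    per detected document.
    """
    leading: List[int] = []
    documents: List[List[int]] = []
    for index, text in enumerate(page_texts):
        if is_first_page(text, include_a, include_b, exclude):
            documents.append([index])        # start a new document
        elif documents:
            documents[-1].append(index)      # continue the current document
        else:
            leading.append(index)            # nothing started yet
    return leading, documents


# Made-up OCR text for a five-page fax, for illustration only.
pages = [
    "FAX COVER SHEET",
    "Sales Order 1001",
    "Terms and conditions continued",
    "Sales Order 1002",
    "Page 2 of 2 Sales Order continued",
]
leading, documents = group_pages(
    pages,
    include_a=["sales order"],
    include_b=["credit application"],
    exclude=["continued"],
)
print(leading)    # [0]
print(documents)  # [[1, 2], [3, 4]]
```

In this example the last page contains "Sales Order" but is still treated as a continuation, because it also contains text from the exclusion list. That is the role the third list plays in the description above.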