060-530 MetaTool Extraction – OCR (extra languages) Rule

MetaTool’s Advanced OCR rule is the recommended choice for traditional Western languages, which it handles both accurately and quickly. For languages that use a different character set, such as Russian (Cyrillic) or Arabic, use the OCR (extra languages) rule instead.

01 OCR (extra languages) – Add Rule

OCR (extra languages) is defined in the MetaTool Extract tab.

Press the Add button and select Zonal Extraction / OCR (extra languages) to add the extraction rule.

The OCR (extra languages) Setup window opens.

Select the index field to hold the extracted data.

Next, select the zone you would like to extract from. The zone can be the full page, the top or bottom half, or a custom zone drawn with the lasso tool.
In this example we want to read the whole page, so we select the Full Page Zone.

After this, we’ll adjust the OCR settings.

02 OCR (extra languages) – OCR Settings

03 – On Page(s): the information is not always on page 1. Use this option to define exactly which page(s) to extract data from.
04 – First document only: reads only the pages of the first document.
05 – Fast mode: when you have high-quality documents (printed with a laser printer or equivalent) scanned at 300 DPI, it is recommended to enable this option. When dealing with faxes, low resolutions or small font sizes, it is better to disable Fast mode. With Fast mode disabled, the OCR engine runs more slowly but detects deformed or very small text more accurately.
06 – Align Zone: when documents in a batch are of varying sizes or mixed orientations (portrait and landscape together), you can align your OCR zone relative to any of the four corners of the image: top left, top right, bottom left or bottom right. That way the OCR zone is positioned correctly on all sizes and orientations.
Bottom right alignment of an OCR zone on a portrait-oriented image
Bottom right alignment of the same OCR zone on a landscape-oriented image
07 – Append to original value: the result is appended to the value that is already in the index field. Disable this option to overwrite the previous value with the new one.
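
To picture what the Align Zone option (06) does, the geometry can be sketched in a few lines of Python. This is only an illustration and not MetaTool's implementation: the Zone type, the align_zone function, the corner names and the pixel values are all hypothetical. The idea is that a zone drawn on a reference page keeps the same distance from the chosen corner on a page of a different size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Hypothetical OCR zone in pixels, measured from the top-left of the image."""
    left: int
    top: int
    width: int
    height: int


CORNERS = ("top-left", "top-right", "bottom-left", "bottom-right")


def align_zone(zone, ref_size, page_size, corner):
    """Keep the zone at the same offset from `corner` on a page of another size.

    ref_size and page_size are (width, height) tuples in pixels.
    """
    if corner not in CORNERS:
        raise ValueError(f"unknown corner: {corner}")
    ref_w, ref_h = ref_size
    page_w, page_h = page_size
    left, top = zone.left, zone.top
    if corner.endswith("right"):
        # Preserve the distance between the zone and the right edge.
        left = page_w - (ref_w - zone.left)
    if corner.startswith("bottom"):
        # Preserve the distance between the zone and the bottom edge.
        top = page_h - (ref_h - zone.top)
    return Zone(left, top, zone.width, zone.height)


# Assumed example: a zone drawn on a portrait A4 scan at 300 DPI (2480 x 3508),
# re-applied to a landscape scan of the same paper (3508 x 2480).
portrait_zone = Zone(left=1680, top=3108, width=700, height=300)
landscape_zone = align_zone(portrait_zone, (2480, 3508), (3508, 2480), "bottom-right")
print(landscape_zone)
```

With top left alignment the coordinates are unchanged, which is why a zone near the bottom right of a portrait page would fall outside a landscape page unless it is aligned to the bottom right corner.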

08 – Languages: this setting enables the character set and dictionary of the selected languages. Select only the languages that are actually present on the documents.

In the example below, the Russian language setting is disabled. The returned test result contains incorrect characters that do not match the original text of the document.

Russian Language Setting disabled
The correct result is returned when the Russian language setting is selected.
Russian Language Setting enabled
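
If extracted values are checked outside MetaTool, a wrong language setting can often be spotted because the result contains few or no letters from the expected script. The following Python sketch is an assumption about how such a check could look and is not part of MetaTool; the function name script_share is hypothetical. It relies on the Unicode character names in the standard unicodedata module, where Cyrillic letters have names starting with "CYRILLIC" and Arabic letters have names starting with "ARABIC".

```python
import unicodedata


def script_share(text, script_prefix):
    """Return the fraction of letters in `text` whose Unicode name starts
    with `script_prefix` (for example "CYRILLIC" or "ARABIC").

    Non-letter characters (digits, spaces, punctuation) are ignored.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith(script_prefix)
    )
    return hits / len(letters)


# A Russian word read with the right character set is fully Cyrillic,
# while a Latin-only result contains no Cyrillic letters at all.
print(script_share("Привет", "CYRILLIC"))
print(script_share("Hello", "CYRILLIC"))
```

A low share for the expected script on a document known to be Russian suggests that the Russian language setting was not selected for the rule.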