060 MetaTool Extraction

The MetaTool setup opens the Extraction tab by default and provides access to the core feature set of MetaTool.

Here you can define a sequence of rules to extract and format data.

A basic set of rules could be:

  1. Extract the full text of page 1 using OCR
  2. Find a word in the full text matching a specific mask. For example, a word matching an account number format.
  3. Find a date on a line containing the words “Invoice Date”.
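As a rough illustration of how such a sequence behaves, the three rules above can be mimicked outside MetaTool with a few lines of Python. This is only a sketch: the sample text, the account number mask (two letters followed by six digits), the date layout, and the function names are all assumptions made for the example, not MetaTool settings or APIs.

```python
import re

# Hypothetical OCR output for page 1 (rule 1 would produce text like this).
PAGE_TEXT = """ACME Shipping Ltd.
Account: AB123456
Invoice Date: 14/03/2023
Total due: 1,250.00
"""

# Assumed mask for the example: two uppercase letters followed by six digits.
ACCOUNT_MASK = re.compile(r"\b[A-Z]{2}\d{6}\b")
# Assumed date layout for the example: DD/MM/YYYY.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def find_word_matching_mask(text):
    """Rule 2: return the first word that matches the account mask, or None."""
    match = ACCOUNT_MASK.search(text)
    return match.group(0) if match else None


def find_date_on_line(text, keyword):
    """Rule 3: return the first date found on a line containing the keyword."""
    for line in text.splitlines():
        if keyword in line:
            match = DATE_PATTERN.search(line)
            if match:
                return match.group(0)
    return None


account_number = find_word_matching_mask(PAGE_TEXT)
invoice_date = find_date_on_line(PAGE_TEXT, "Invoice Date")
print(account_number)
print(invoice_date)
```

In MetaTool itself you configure these steps as rules in the Setup window; the code only shows the kind of matching each rule performs.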

Typically you define a set of rules for each index field that you want to extract. You can define as many rules as you want.

There are many types of rules. By combining these easy-to-configure rules, you can define even the most complex extraction processes.

Testing your rules is easy. The viewer in MetaTool setup shows the documents in your current batch, and you can press the Test button at any moment to try out your rules on any of these documents. The test result is displayed instantly in the results panel.

To learn how to set up MetaTool extraction rules, have a look at the short tutorial videos listed under 03 MetaTool Extraction Tutorial Videos.

Here is an overview of all rule types. Press the link for detailed help:

01 Extraction – Setup

To set up your Extraction rules, press the Extract tab if it is not open yet.

The Extraction Setup shows the images of your current batch in the left panel. The green zone indicates the extraction zone of the current rule. In this case, it is the full page.

The middle panel shows your extraction rules. You can define as many rules as required.

The right panel shows the index fields defined in Kofax Express. The Original Value column shows the values already extracted by Kofax Express (for example using its barcode reader). The Processed Value column shows the result of the extraction after pressing the Test button.

01 – Navigation Toolbar:

1) Document buttons: use the green buttons to navigate through the documents in the current batch.

Use the Go to document button to directly navigate to a specific document:

2) Page buttons: use the blue buttons to page through the current document if it has more than one page.

Use the Go to page button to directly navigate to a specific page in the document:

3) Test button: press the Test button to show the result in the Processed Value column.

Enable the Auto test option to run a test automatically as you go through the documents of your test batch using the green document navigation buttons.

Sometimes the value cannot be displayed completely in the Processed Value column. In that case, you can hover your cursor over the value to see the complete value in a balloon message.
Or you can click on the result and a separate window will pop up, displaying the complete value. You can also search for specific words in the text in this pop-up window using the Find feature.

Searching the full text is useful when you want to search for a value in the full text OCR result that could not be found with the extraction rules. Analyzing the OCR result is often useful to understand why extraction rules fail.

Use the Format case option to switch the text to another case. In uppercase, it is easier to detect OCR errors like l versus I (in uppercase, that would be L versus I).

For example, the Voyage Code FI735R is hard for the OCR engine to interpret. The second position could be a lowercase l, an uppercase I, or a pipe character |:

By switching the case, you can detect this kind of problem much more easily:

The OCR result showing an I in the Original format

The OCR result in Uppercase format showing that the I was actually interpreted as a lowercase l

After you’ve diagnosed the problem, you could, for example, disable the lowercase l and pipe character from your OCR extraction rule so the OCR engine can only return the uppercase I:

Original

Uppercase, still showing the correct I character
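The effect of the Format case option can be reproduced with plain string handling. The sketch below is not MetaTool code; it only shows, under the assumption that the OCR engine returned a lowercase l in the second position, why the uppercase view exposes the error.

```python
# Value printed on the document.
expected = "FI735R"
# Assumed OCR result: the second character was read as a lowercase l.
ocr_result = "Fl735R"

# In many fonts the two strings look alike, but they are not equal.
print(ocr_result == expected)

# Uppercasing turns the misread l into L, which is easy to tell apart from I.
ocr_upper = ocr_result.upper()
print(ocr_upper)

# List the positions where the uppercase OCR result differs from the expected value.
mismatches = [
    (index, got, want)
    for index, (got, want) in enumerate(zip(ocr_upper, expected.upper()))
    if got != want
]
print(mismatches)
```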

TIP: If you have to design a document and you wonder what the ideal font type is for OCR processing, have a look here.

02 – Add: press the Add button to add an Extraction rule.

03 – Duplicate: press the Duplicate button to copy the selected Extraction rule. The duplicated rule will automatically be added after the selected rule. Next, double-click the duplicated rule or press Modify to adjust it.

04 – Modify: press the Modify button to open the Setup window of the currently selected rule. You can do the same by double-clicking the rule itself.

05 – Test up to selected rule: press this button to run the test only up to the selected rule. This is useful if your rules don’t generate the desired result: you would typically test the rules step by step to find the issue and “debug” your rules.

06 – Move up / Move down: press the Move up or Move down button to change the order of the Extraction rules.

The order of the rules influences the result. For example, if you want to format a date, you first need to extract the date with Advanced OCR and Find Date rules. The Format Date rule needs to occur after the extraction of the date.
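To see why the order matters, consider a minimal sketch in which each rule is a function that receives the current value and returns a new one. The rule names, the sample text, and the date formats are assumptions for illustration and do not reflect how MetaTool is implemented internally.

```python
import re
from datetime import datetime

SAMPLE_TEXT = "Invoice Date: 14/03/2023"  # hypothetical OCR result


def find_date(value):
    """Extract the first DD/MM/YYYY date from the text, or '' if none."""
    match = re.search(r"\d{2}/\d{2}/\d{4}", value)
    return match.group(0) if match else ""


def format_date(value):
    """Reformat DD/MM/YYYY as YYYY-MM-DD; return '' if the input is not a date."""
    try:
        return datetime.strptime(value, "%d/%m/%Y").strftime("%Y-%m-%d")
    except ValueError:
        return ""


def run_rules(rules, value):
    """Apply the rules one after another, feeding each result to the next."""
    for rule in rules:
        value = rule(value)
    return value


# Extraction first, formatting second: the date is found and then reformatted.
correct_order = run_rules([find_date, format_date], SAMPLE_TEXT)
# Formatting first: the full line is not a date, so nothing is left to extract.
wrong_order = run_rules([format_date, find_date], SAMPLE_TEXT)
print(repr(correct_order))
print(repr(wrong_order))
```

Calling run_rules with only the first part of the list, for example run_rules([find_date], SAMPLE_TEXT), is comparable to using Test up to selected rule to inspect an intermediate result.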

07 – Delete: press the Delete button to remove the selected Extraction rule.

02 The Viewer – Controls & Shortcuts

The viewer control is used in many MetaTool screens. The buttons and shortcuts to zoom and pan the image work in the same way across the product.

The video below demonstrates how to zoom and pan images in the MetaTool viewer:

03 MetaTool Extraction Tutorial Videos

03-01 Fixed zone data extraction

Defining validation rules starts at minute 3:38

03-02 Floating data extraction

03-03 Extraction of a number on a noisy background using color drop out

Exporting valid & invalid documents separately (starts at minute 4:53)