
060-590 MetaTool Extraction – Find Number

MetaTool’s Find Number rule finds numbers in documents, such as a total amount or a quantity. It is frequently combined with a Remove Characters rule and a Replace Text rule.

The Find Number rule is especially useful when you need to extract a number from documents that don’t have a fixed format. A classic example is the total amount on invoices: it is not always in the same place, because its position depends on each supplier’s invoice layout.

For example, to extract the Total Amount from an invoice, you first define an OCR extraction rule that stores all or part of the text of a scanned document in an index field, which we typically call Text Block or Full Text. A Replace Text rule then detaches the amounts from currency symbols such as $ or €, and a Remove Characters rule removes any redundant spaces between digits. Both rules clean up the OCR text so that it is ready for extracting the amount.

Finally, you define a Find Number rule to extract the actual Total Amount, which is the highest amount in the cleaned-up text block.
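The clean-up-then-find approach can be illustrated with a minimal Python sketch. This is not MetaTool's implementation: the function names, the regular expressions and the sample OCR text are hypothetical, and "," is assumed to be the decimal symbol, as on French invoices.

```python
import re
from typing import Optional


def clean_ocr_text(text: str) -> str:
    """Hypothetical stand-in for the Replace Text and Remove Characters rules."""
    # Replace Text (assumed pattern): put a space between a currency symbol
    # and an adjacent digit, e.g. "200,00€" -> "200,00 €".
    text = re.sub(r"([$€])(?=\d)", r"\1 ", text)
    text = re.sub(r"(?<=\d)([$€])", r" \1", text)
    # Remove Characters (assumed pattern): drop spaces between two digits,
    # e.g. "1 200,00" -> "1200,00".
    return re.sub(r"(?<=\d) +(?=\d)", "", text)


def highest_amount(text: str) -> Optional[float]:
    """Hypothetical stand-in for Find Number with Keep highest and "," as decimal symbol."""
    amounts = [float(m.replace(",", ".")) for m in re.findall(r"\d+(?:,\d+)?", text)]
    return max(amounts) if amounts else None


ocr = "Total HT 1 000,00€ TVA 200,00€ Total TTC 1 200,00€"
cleaned = clean_ocr_text(ocr)
print(cleaned)                  # Total HT 1000,00 € TVA 200,00 € Total TTC 1200,00 €
print(highest_amount(cleaned))  # 1200.0
```

Removing every space between digits would also merge two separate numbers that happen to be separated by a single space, so in practice the clean-up rules have to be tuned to the actual OCR output.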

In the use case below, we use a set of French invoices to explain how the rules work.

01 Find Number – Add Rule

Find Number is defined in the MetaTool Extract tab.

Press the Add button and select Find – Number to add the find rule.

The Find Number Setup window opens.

02 Find Number – Setup

In our example we use the CB MetaTool Factures job, which is installed automatically when you install CaptureBites MetaTool.

From the sample invoices, we want to extract the Total Amount, which is the highest amount in the bottom-right corner of each invoice.

First we set up an Advanced OCR rule that extracts only the bottom-right corner of each document.
[Image: result of the Advanced OCR rule]

Then we use a Replace Text rule to remove the spaces around thousand separators and decimal symbols (“.” and “,”).
[Image: Replace Text rule settings]

[Image: result of the Replace Text rule]
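As a rough illustration outside MetaTool, the effect of this Replace Text step can be approximated with one regular expression. This is a minimal Python sketch, not the rule's actual configuration; the function name and the sample OCR string are hypothetical.

```python
import re


def remove_spaces_around_separators(text: str) -> str:
    """Hypothetical equivalent of the Replace Text rule in this step."""
    # Between two digits, drop any whitespace around "." or ",",
    # e.g. "1 . 250 , 00" -> "1.250,00".
    return re.sub(r"(?<=\d)\s*([.,])\s*(?=\d)", r"\1", text)


print(remove_spaces_around_separators("Total TTC 1 . 250 , 00"))  # Total TTC 1.250,00
```

The pattern also joins list-style text such as "2, 3", so it may need tightening for real documents.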

Next we remove the spaces between digits using the Remove Characters rule.
[Image: Remove Characters rule settings]

[Image: result of the Remove Characters rule]

Finally, we extract the Total Amount with the Find Number rule. Select the index field to hold the extracted data.

In this case we select the index field “Montant Total” (Total Amount in French).

Optionally enter a description.

[Image: final result of the Find Number rule]

03 – Source field: select the source index field. This is typically the index field that contains the text you want to parse for the number you are looking for. In our example, this is the Montant Total index field itself, because it already contains the cleaned-up text produced by the previous rules.

Match whole word: only returns numbers that are not attached to other characters. For example, in 500KG the number 500 will not be found. Written as 500 KG, 500 is a whole word and is returned. Disable Match whole word to find numbers that are attached to other words.

For example, with the “Match whole word” option disabled, a Find Number rule would find 500 in the word WEIGHT500KG.
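The Match whole word behavior resembles a regular-expression word boundary. The Python sketch below is a hypothetical model, not MetaTool's matching engine: the function name is made up, and it only looks for whole integers to keep the example short.

```python
import re
from typing import List


def find_numbers(text: str, match_whole_word: bool = True) -> List[str]:
    """Hypothetical model of the Match whole word option (integers only)."""
    number = r"\d+"
    if match_whole_word:
        # \b requires a word boundary on both sides, so digits glued to
        # letters (500KG, WEIGHT500KG) are rejected.
        number = rf"\b{number}\b"
    return re.findall(number, text)


print(find_numbers("500 KG"))                               # ['500']
print(find_numbers("WEIGHT500KG"))                          # []
print(find_numbers("WEIGHT500KG", match_whole_word=False))  # ['500']
```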

04 – Value: specifies which numbers to keep. There are six options:

1) Keep all matches: this will return all numbers.

[Image: example source text and the returned result]

2) Keep first match: this will return the first number and will skip all following numbers.

[Image: example source text and the returned result]

3) Keep last match: this will return the last number and will skip all other numbers.

[Image: example source text and the returned result]

4) Keep highest: this will return the highest number.

[Image: example source text and the returned result]

5) Keep lowest: this will return the lowest number and will ignore every higher number.

[Image: example source text and the returned result]

6) Keep highest positive or lowest negative: returns the number with the highest absolute value, that is, the largest number regardless of its sign. In other words, it keeps either the highest positive number or the lowest negative number, whichever is further from zero.

For example, -5 and 4 have absolute values of 5 and 4 respectively, so the rule returns -5.
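The six Value options can be modelled as a selection over the list of numbers found, in document order. This Python sketch is illustrative only; the option names are hypothetical labels, and how MetaTool breaks a tie between, say, -5 and 5 is not documented here (the sketch keeps the first one).

```python
from typing import List


def keep(values: List[float], option: str) -> List[float]:
    """Hypothetical model of the six Value options; option names are made up."""
    if not values:
        return []
    if option == "all":
        return list(values)
    if option == "first":
        return [values[0]]
    if option == "last":
        return [values[-1]]
    if option == "highest":
        return [max(values)]
    if option == "lowest":
        return [min(values)]
    if option == "highest_absolute":
        # Highest positive or lowest negative: compare by distance from zero.
        # On a tie (e.g. -5 and 5) max() keeps the first one it saw.
        return [max(values, key=abs)]
    raise ValueError(f"unknown option: {option}")


found = [12.5, -5.0, 4.0, 1250.0, -3.0]
for option in ["all", "first", "last", "highest", "lowest", "highest_absolute"]:
    print(option, keep(found, option))

print(keep([-5.0, 4.0], "highest_absolute"))  # [-5.0]
```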

05 – Append to original value: the result is appended to the value that is already in the index field. Otherwise, the result overwrites that value.

06 – Clear original value if result is blank: if the rule does not return any value, the selected index field is cleared.

07 – Delete duplicates: deletes duplicate matches so that the result contains only unique values.
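Options 05 to 07 decide what is written to the index field once the matches are known. The Python sketch below is a hypothetical model: the function name and the space used to join several matches are assumptions, and keeping the original value on a blank result when option 06 is off is inferred, not documented.

```python
from typing import List


def apply_result(original: str, matches: List[str], append: bool = False,
                 clear_if_blank: bool = False, delete_duplicates: bool = False,
                 separator: str = " ") -> str:
    """Hypothetical model of options 05-07; the separator is an assumption."""
    if delete_duplicates:
        # dict.fromkeys keeps the first occurrence of each match, in order.
        matches = list(dict.fromkeys(matches))
    result = separator.join(matches)
    if not result:
        # Blank result: clear the field only if option 06 is enabled
        # (assumed: otherwise the original value is kept).
        return "" if clear_if_blank else original
    if append and original:
        return original + separator + result
    return result


print(apply_result("", ["10", "20", "10"], delete_duplicates=True))  # 10 20
print(apply_result("old", ["5"], append=True))                       # old 5
print(apply_result("old", [], clear_if_blank=True) == "")            # True
```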

08 – Decimal symbol: enter the decimal symbol(s) of the number(s) you want to find. The most frequently used symbols are “,” in Europe and Latin America and “.” in the US, Canada, the United Kingdom, South Africa, Australia, etc. You can also specify multiple decimal symbols to locate amounts in a mix of documents from different regions.
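To see how accepting one or several decimal symbols changes what counts as a single number, here is a minimal Python sketch. It is a hypothetical model, not MetaTool's parser: the function name and the way symbols are passed as one string are assumptions.

```python
import re
from typing import List


def find_amounts(text: str, decimal_symbols: str = ",") -> List[float]:
    """Hypothetical model of option 08: numbers with any accepted decimal symbol."""
    # Character class of the accepted decimal symbols, e.g. "," or ",.".
    symbols = "".join(re.escape(s) for s in decimal_symbols)
    pattern = rf"\d+(?:[{symbols}]\d+)?"
    amounts = []
    for match in re.findall(pattern, text):
        # Normalize whichever decimal symbol was used to "." before converting.
        for s in decimal_symbols:
            match = match.replace(s, ".")
        amounts.append(float(match))
    return amounts


print(find_amounts("Total 1250,00 EUR"))      # [1250.0]
print(find_amounts("A 12,50 B 7.25", ",."))   # [12.5, 7.25]
```

The sketch does not understand thousand separators such as the “.” in 1.250,00, which is one reason to clean up the text before running Find Number.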

09 – Digits after decimal: enter a minimum and a maximum value to return only numbers with that number of digits after the decimal symbol.

Examples:

Minimum 2, maximum 2: only returns numbers with exactly 2 digits after the decimal symbol.
Minimum 2, maximum 5: returns numbers with 2 to 5 digits after the decimal symbol.
Minimum 0, maximum 0: only returns numbers with no digits after the decimal symbol.
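The minimum/maximum check for digits after the decimal symbol amounts to counting the characters after the last decimal symbol. The following Python sketch is a hypothetical model applied to a number that has already been found as text; the function name and default symbols are assumptions.

```python
def decimals_within(number: str, minimum: int, maximum: int,
                    decimal_symbols: str = ",.") -> bool:
    """Hypothetical model of option 09 for a number already found as text."""
    # Position of the last decimal symbol, or -1 if the number has none.
    position = max(number.rfind(symbol) for symbol in decimal_symbols)
    decimals = 0 if position == -1 else len(number) - position - 1
    return minimum <= decimals <= maximum


for text in ["1250,00", "1250,5", "3,14159", "42"]:
    print(text,
          decimals_within(text, 2, 2),   # exactly 2 digits
          decimals_within(text, 2, 5),   # 2 to 5 digits
          decimals_within(text, 0, 0))   # no digits
```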

10 – Range: lets you restrict the result to numbers within a certain range.

For example, if the number is never above 5000 and never negative, you can set a range from 0 to 5000.

Select All if the number is not restricted to a specific range.
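The Range option acts as a filter on the numbers found. In this hypothetical Python sketch, None stands for the All setting, and the bounds are assumed to be inclusive, which the documentation does not state.

```python
from typing import Optional


def in_range(value: float, low: Optional[float] = None,
             high: Optional[float] = None) -> bool:
    """Hypothetical model of option 10; None stands for "All" (no limit)."""
    # Bounds are assumed to be inclusive.
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


amounts = [-20.0, 0.0, 1250.0, 5000.0, 7300.5]
print([a for a in amounts if in_range(a, 0, 5000)])  # [0.0, 1250.0, 5000.0]
print([a for a in amounts if in_range(a)])           # [-20.0, 0.0, 1250.0, 5000.0, 7300.5]
```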