120-240 MetaServer Extract – Find Number
With MetaServer’s Find Number rule, you can find specific numbers in your documents, such as a total amount or a quantity. It is frequently combined with a Find Word Group with Mask / Words rule, a Replace Text rule and a Format Number rule.
NOTE: a simple trick to determine whether data is a number or text is to ask yourself: “Does it make sense to add or subtract the data?” If the answer is yes, it’s a number; otherwise, it’s text.
Numbers are things like a total amount, a number of pages, a number of days or a quantity.
Things like a customer number, patient number or invoice number are not numbers. They are text and are typically found using a Find Word or Find Word Group rule.
The Find Number rule is very useful when you need to extract a specific number that doesn’t have a fixed location in your documents. A classic example is the total amount on invoices, whose layout differs from vendor to vendor.
In our example, we will make use of the “CB – INVOICES” workflow. This workflow is automatically installed with CaptureBites MetaServer.
We want to extract the total amount from each invoice. The location varies with each vendor, but it is mostly found in the bottom right corner of the first or last page, and it is usually the highest number in that region.
You typically define a large Extract Text rule first to focus on the region where the total amount appears (all text of the bottom half of the page).
Next, you define a Find Word Group with Mask / Words rule to only keep the amounts close to labels like Total Amount, Total, Pay This Amount, etc.
Then, you add a Replace Text rule to remove spaces around separators and decimal points and to correct OCR substitution errors (for example, O read instead of 0 and B read instead of 8).
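To make the cleanup step more concrete, here is a minimal Python sketch of the kind of text normalization described above. It is only an illustration and not how MetaServer's Replace Text rule is implemented. The function name clean_amount_text and the two regular expressions are assumptions made for this example.

```python
import re

# Hypothetical helper for illustration; not MetaServer's Replace Text implementation.
OCR_FIXES = str.maketrans({"O": "0", "B": "8"})

def clean_amount_text(text: str) -> str:
    # Step 1: drop spaces around "." and "," when they sit between digit-like characters.
    text = re.sub(r"(?<=[\dOB])\s*([.,])\s*(?=[\dOB])", r"\1", text)

    # Step 2: replace O with 0 and B with 8, but only inside tokens that
    # already contain at least one real digit, so words like "BOB" survive.
    def fix_token(match: re.Match) -> str:
        token = match.group(0)
        return token.translate(OCR_FIXES) if re.search(r"\d", token) else token

    return re.sub(r"\b[\dOB][\dOB.,]*\b", fix_token, text)

print(clean_amount_text("Total 1 , 234 . 5O"))  # prints "Total 1,234.50"
```

The sketch limits the O/B correction to tokens that already contain a digit. This is a design choice for the example: it avoids changing ordinary words, but it will miss a value that OCR turned into letters only.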
After this, you add a Find Number rule.
Find Number rules are defined in a MetaServer Extract or Separate Document / Process Page action.
To add this rule, press the Add button and select Find -> Number.
First, add a description to your rule. Then, select the field that will hold the extracted data. In this case, we select the field “Total Amount”.
01 – Source field: press the drop-down arrow to select the source field. This is the field containing the text you want to parse to find your number.
Match whole word: enable this option to only return numbers that are not connected to any other character or currency symbol.
For example, with “Match whole word” enabled, values like “$50.00”, “100KG” or “220LBS” will not return any results. When it is disabled, the rule returns “50.00”, “100” and “220”.
NOTE: only disable this option if you’re sure there won’t be any conflicting number-like text in the source field, like phone numbers, invoice numbers etc.
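The effect of “Match whole word” can be approximated with a small Python sketch. This is not MetaServer's matching logic. The find_numbers function, the simplified number pattern (digits with an optional “.” fraction) and the rule that a whole word must be surrounded by whitespace are all assumptions made for this illustration.

```python
import re

# Simplified number pattern for this sketch: digits with an optional "." fraction.
NUMBER = r"\d+(?:\.\d+)?"

def find_numbers(text: str, match_whole_word: bool = True) -> list[str]:
    if match_whole_word:
        # Assumption: a "whole word" number has whitespace (or the text
        # boundary) directly before and after it.
        pattern = rf"(?<!\S){NUMBER}(?!\S)"
    else:
        pattern = NUMBER
    return re.findall(pattern, text)

sample = "$50.00 100KG 220LBS"
print(find_numbers(sample, match_whole_word=True))   # prints []
print(find_numbers(sample, match_whole_word=False))  # prints ['50.00', '100', '220']
```

With the option disabled, the same pattern would also pick up fragments of phone numbers or invoice numbers, which is why the NOTE above advises caution.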
02 – Decimal symbol: specify the decimal symbol(s) of the number(s) you want to find. The most frequently used symbols are “,” in Europe and Latin America and “.” in the US, Canada, the United Kingdom, South Africa, Australia, etc. You can also specify multiple decimal symbols to locate amounts on documents from different regions.
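As an illustration of why the decimal symbol matters, the Python sketch below converts a found number into a numeric value. It is a simplified assumption of how such text could be interpreted and not MetaServer's own parsing. The to_decimal function is hypothetical. It treats the last configured decimal symbol in the text as the decimal point and every other separator as a grouping character.

```python
from decimal import Decimal

def to_decimal(raw: str, decimal_symbols: str = ".,") -> Decimal:
    # Assumption: the LAST occurrence of any configured decimal symbol
    # is the decimal point; earlier separators are grouping characters.
    pos = max(raw.rfind(symbol) for symbol in decimal_symbols)
    if pos == -1:
        integer_part, fraction_part = raw, ""
    else:
        integer_part, fraction_part = raw[:pos], raw[pos + 1:]
    # Keep only digits and a leading minus sign in the integer part.
    integer_part = "".join(ch for ch in integer_part if ch.isdigit() or ch == "-")
    return Decimal(f"{integer_part}.{fraction_part}" if fraction_part else integer_part)

print(to_decimal("1.234,56", decimal_symbols=","))   # prints 1234.56
print(to_decimal("1,234.56", decimal_symbols="."))   # prints 1234.56
print(to_decimal("1,234.56", decimal_symbols=".,"))  # prints 1234.56
```

Note that with both symbols configured, a value like “1,234” would be read as 1.234 in this sketch. Restricting the number of digits after the decimal symbol (setting 03) is one way to avoid that kind of ambiguity.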
03 – Digits after decimal: set the minimum and maximum value to only return numbers with the specified number of digits after the decimal symbol.
Minimum 2, maximum 2 -> Only returns numbers with exactly 2 digits after the decimal symbol. This is the typical decimal format of amounts on invoices. By setting this range, you avoid mixing amounts (2 decimals) with quantities (no decimals).
Minimum 2, maximum 3 -> Returns numbers with 2 to 3 digits after the decimal symbol.
Minimum 0, maximum 0 -> Only returns numbers with no digits after the decimal symbol.
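The three settings above can be sketched in Python as a simple filter on the number of digits after the decimal symbol. The filter_by_decimals function and its parameter names are hypothetical and only illustrate the idea. They are not part of MetaServer.

```python
def filter_by_decimals(candidates, decimal_symbols=".", min_digits=2, max_digits=2):
    kept = []
    for raw in candidates:
        # Position of the last configured decimal symbol, or -1 if there is none.
        pos = max(raw.rfind(symbol) for symbol in decimal_symbols)
        digits_after = 0 if pos == -1 else len(raw) - pos - 1
        if min_digits <= digits_after <= max_digits:
            kept.append(raw)
    return kept

found = ["12", "1,234.50", "0.125", "3"]
print(filter_by_decimals(found, min_digits=2, max_digits=2))  # prints ['1,234.50']
print(filter_by_decimals(found, min_digits=2, max_digits=3))  # prints ['1,234.50', '0.125']
print(filter_by_decimals(found, min_digits=0, max_digits=0))  # prints ['12', '3']
```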
04 – Range: this option allows you to only return numbers within a certain range.
For example, if the number is never greater than 5000 and never negative, you can set the range from 0 to 5000. All numbers outside this range will be ignored.
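In Python terms, the range setting behaves like a simple filter. The sketch below assumes that both limits are inclusive, which the description above does not state explicitly. The within_range function is a hypothetical name used only for this example.

```python
def within_range(values, minimum=0, maximum=5000):
    # Assumption: both limits are inclusive.
    return [value for value in values if minimum <= value <= maximum]

print(within_range([-12.5, 0, 499.99, 5000, 7200]))  # prints [0, 499.99, 5000]
```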
05 – Value: choose between 6 options to decide which numbers you want to keep:
1) Keep all matches: return all numbers.
2) Keep first match: only return the first number.
3) Keep last match: only return the last number.
4) Keep highest: only return the highest number.
5) Keep lowest: only return the lowest number.
6) Keep highest positive or lowest negative: use this option when you need to process a mixture of documents containing positive amounts (for example, invoices) and negative amounts (for example, credit notes). It keeps the highest positive number or the lowest negative number.
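The six options can be summarized with a short Python sketch. The select_value function and its mode names are hypothetical. For option 6, the sketch assumes that the highest positive number is kept when at least one positive number is present, and the lowest number is kept otherwise. MetaServer's exact behavior on mixed input may differ.

```python
def select_value(numbers, mode):
    # 'numbers' are the matches in the order they appear in the source text.
    if not numbers:
        return []
    if mode == "all":
        return list(numbers)
    if mode == "first":
        return [numbers[0]]
    if mode == "last":
        return [numbers[-1]]
    if mode == "highest":
        return [max(numbers)]
    if mode == "lowest":
        return [min(numbers)]
    if mode == "highest_positive_or_lowest_negative":
        # Assumption: positives win when present; otherwise take the lowest value.
        positives = [n for n in numbers if n > 0]
        return [max(positives)] if positives else [min(numbers)]
    raise ValueError(f"unknown mode: {mode}")

invoice = [120.0, 24.0, 144.0]
credit_note = [-120.0, -24.0, -144.0]
print(select_value(invoice, "highest"))                                  # prints [144.0]
print(select_value(invoice, "highest_positive_or_lowest_negative"))      # prints [144.0]
print(select_value(credit_note, "highest_positive_or_lowest_negative"))  # prints [-144.0]
```

On an invoice, the total is normally the highest amount. On a credit note, it is the most negative amount, which is why a single “Keep highest” setting would not work for both document types.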
TIP: you can copy the current settings and paste them in another setup window of the same type. Do this by pressing the Settings button in the bottom left of the Setup window and by selecting Copy. Then open another setup window of the same type and select Paste.