060-640 MetaTool Extraction Edit – Remove Characters
The Remove characters rule is very useful when you need to extract a code or number from documents that contain some redundant symbols or characters. For example, a Belgian giro code on invoices is written like +++007/7163/60104+++ but we only need the numeric part for further processing of the invoice.
01 Remove characters – Add Rule
The Remove characters rule is defined in the MetaTool Extract tab.
Press the Add button and select Edit – Remove characters to add the edit rule.
The Remove characters window opens.
Note: Some characters to remove are already specified by default. These are the characters that cannot be used in a file name, also known as “illegal or reserved characters”.
Removing these characters is useful if you want to use an index field as part of a file name.
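To illustrate why these defaults matter, here is a minimal Python sketch of stripping reserved file name characters from an index value. It is not MetaTool's implementation. The character set used is the common Windows set of reserved file name characters, which is an assumption; the actual default list in the Remove characters window may differ. The function name strip_reserved is hypothetical.

```python
# Assumption: the Windows reserved file name characters \ / : * ? " < > |
# The default list shown in MetaTool may not be identical.
RESERVED_CHARACTERS = '\\/:*?"<>|'


def strip_reserved(value, reserved=RESERVED_CHARACTERS):
    """Return value without any character listed in reserved."""
    return "".join(ch for ch in value if ch not in reserved)


# Example: an index value that would not be valid inside a file name.
print(strip_reserved("Invoice: 2024/001?"))  # prints "Invoice 2024001"
```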
02 Remove characters – Setup
In our example we will make use of the CB MetaTool Giro Codes job. This job is automatically installed when you install CaptureBites MetaTool.
From the sample image below, we want to extract the Giro Code.
With an Advanced OCR Rule we extract the full text from the bottom part of the page and place the result in a field called Full Text.
The result looks like this:
Then we find the word containing +++ with a Find Word with Words / Mask rule to extract the Giro Code:
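As a rough illustration of what this step achieves, the following Python sketch picks the first whitespace-separated word that contains +++ from a block of text. It only mimics the idea; the Find Word with Words / Mask rule has its own options that are not modeled here. The function name find_word_containing and the sample text are hypothetical.

```python
def find_word_containing(full_text, marker="+++"):
    """Return the first whitespace-separated word containing marker, or None."""
    for word in full_text.split():
        if marker in word:
            return word
    return None


# Hypothetical OCR output standing in for the Full Text field.
full_text = "Payment reference +++007/7163/60104+++ due in 30 days"
print(find_word_containing(full_text))  # prints "+++007/7163/60104+++"
```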
Finally, we want to keep only the digits in the Giro Code, so we use a Remove characters rule.
01 – Select the index field to hold the extracted data.
In this case we select the index field “Giro Code”.
02 – Optionally enter a description.
03 – Remove characters: type in the characters you want the engine to remove.
The default remove characters can simply be replaced with other characters.
In our case, we only want to remove the “+” and “/” symbols.
1) Match case: makes the search case sensitive. If the characters to remove are “ABC”, for example, only A, B and C in exactly that case are removed. Disable the option to remove A, B and C as well as a, b and c.
04 – Remove spaces in numbers: will remove spaces preceding, between or following digits. You can also combine these three different options.
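The effect of the Remove spaces in numbers option can be sketched in Python with regular expressions. This is an illustration only, not MetaTool's code. It assumes that "preceding", "between" and "following" act as three independent switches, and that a space counts as "between" only when there are digits on both sides of the run of spaces. The function name and its parameters are hypothetical.

```python
import re


def remove_spaces_in_numbers(text, preceding=False, between=True, following=False):
    """Remove spaces around digits according to three independent switches."""
    if between:
        # Runs of spaces with a digit on both sides.
        text = re.sub(r"(?<=\d) +(?=\d)", "", text)
    if preceding:
        # Runs of spaces directly before a digit, not already after a digit.
        text = re.sub(r"(?<![\d ]) +(?=\d)", "", text)
    if following:
        # Runs of spaces directly after a digit, not followed by another digit.
        text = re.sub(r"(?<=\d) +(?![\d ])", "", text)
    return text


print(remove_spaces_in_numbers("007 7163 60104"))  # prints "007716360104"
print(remove_spaces_in_numbers("Ref 12 34 end", preceding=True, following=True))  # prints "Ref1234end"
```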
The result is a clean numeric Giro Code without the + or / symbols like this:
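For readers who want to reproduce the result outside MetaTool, the Remove characters step, including the Match case option, can be approximated with the short Python sketch below. It is a simplified model under the assumption that the rule deletes every occurrence of each listed character; the function name remove_characters and its parameters are hypothetical.

```python
def remove_characters(value, chars, match_case=True):
    """Delete every occurrence of the characters in chars from value."""
    if match_case:
        unwanted = set(chars)
    else:
        # Treat upper and lower case variants of each character as the same.
        unwanted = set(chars.lower()) | set(chars.upper())
    return "".join(ch for ch in value if ch not in unwanted)


# The Giro Code example: remove only "+" and "/".
print(remove_characters("+++007/7163/60104+++", "+/"))  # prints "007716360104"

# Match case enabled: only upper case A, B and C are removed.
print(remove_characters("AbCabc-1", "ABC"))  # prints "babc-1"

# Match case disabled: both cases are removed.
print(remove_characters("AbCabc-1", "ABC", match_case=False))  # prints "-1"
```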