060-630 MetaTool Extraction Edit – Keep Characters

MetaTool’s Keep characters rule keeps only a specified set of characters in a word, a line or the full text and discards everything else. It is frequently combined with a Find Word or Find Line rule.

The Keep characters rule is useful when you need to extract a code or number that contains redundant symbols or characters. For example, a Belgian giro code on an invoice is written as +++007/7163/60104+++, but only the numeric part is needed for further processing of the invoice.

With a Keep characters rule we can specify that we only want to keep the digits and convert +++007/7163/60104+++ to 007716360104.
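The effect of the rule can be illustrated outside MetaTool with a minimal Python sketch. This is not MetaTool's implementation; the function name keep_characters and its parameters are hypothetical and only mimic the behaviour described above.

```python
def keep_characters(text: str, allowed: str) -> str:
    """Return text with every character that is not listed in allowed removed."""
    allowed_set = set(allowed)
    return "".join(ch for ch in text if ch in allowed_set)


# The giro code from the example above, reduced to its digits.
giro_code = keep_characters("+++007/7163/60104+++", "0123456789")
print(giro_code)  # 007716360104
```

The order of the kept characters is preserved; only the + and / symbols are dropped.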

01 Keep characters – Add Rule

The Keep characters rule is defined in the MetaTool Extract tab.

Press the Add button and select Edit – Keep characters to add the edit rule.

The Keep characters window opens.

02 Keep characters – Setup

In our example we use the CB MetaTool Giro Codes job. This job is installed automatically when you install CaptureBites MetaTool.

From the sample image below, we want to extract the Giro Code.

With an Advanced OCR Rule we extract the full text from the bottom part of the page and place the result in a field called Full Text.

The result looks like this:

Then we find the word containing +++ with a Find Word with Words / Mask rule to extract the Giro Code:

Finally, we use a Keep characters rule to keep only the digits in the Giro Code. Select the index field that will hold the extracted data. In this case we select the index field “Giro Code”.

Optionally enter a description.

03 – Keep characters: type in the characters you want the engine to return. In our case, we only want to keep the digits.

1) Match case: makes the search case sensitive. For example, if the characters to keep are “ABC0123456789”, the rule only returns A, B and C in exactly that case, plus the digits. Disable the Match case option to keep both A, B, C and a, b, c. You can always force the case afterwards, for example to upper case, with a Format / Change case rule.
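The Match case option described above can be sketched in Python as follows. This is an illustration under assumptions, not MetaTool's code: the function keep_characters, the match_case parameter and the sample string are all hypothetical.

```python
def keep_characters(text: str, allowed: str, match_case: bool = True) -> str:
    """Keep only the characters listed in allowed.

    With match_case=False, letters are kept regardless of their case.
    """
    if match_case:
        allowed_set = set(allowed)
    else:
        # Accept both the lower-case and upper-case form of every listed character.
        allowed_set = set(allowed.lower()) | set(allowed.upper())
    return "".join(ch for ch in text if ch in allowed_set)


sample = "Ref: abC-0042"  # hypothetical input, not taken from the sample job

print(keep_characters(sample, "ABC0123456789", match_case=True))   # C0042
print(keep_characters(sample, "ABC0123456789", match_case=False))  # abC0042
```

With Match case enabled, the lower-case a and b are dropped because only the upper-case letters were listed. With the option disabled, they are kept in their original case, which is why a separate change-case step is needed if a uniform case is required.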

The result is a clean numeric Giro Code without the + or / symbols like this:
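The whole flow of this example (full text, then the word containing +++, then digits only) can be approximated in a short Python sketch. The OCR text and the helper find_word_containing are hypothetical stand-ins for the Advanced OCR and Find Word rules, not MetaTool functions; the giro code is the one used in the introduction.

```python
# Hypothetical text, standing in for the content of the Full Text field.
full_text = "Payment reference +++007/7163/60104+++ due within 30 days"


def find_word_containing(text: str, marker: str) -> str:
    """Return the first whitespace-separated word that contains marker, or ''."""
    for word in text.split():
        if marker in word:
            return word
    return ""


# Step 1: isolate the word that carries the giro code.
word = find_word_containing(full_text, "+++")

# Step 2: keep only the digits of that word.
giro_code = "".join(ch for ch in word if ch in "0123456789")

print(word)       # +++007/7163/60104+++
print(giro_code)  # 007716360104
```

Isolating the word first matters: applying the digit filter to the full text would also pick up unrelated digits, such as the 30 in this hypothetical sample.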