
060-650 MetaTool Extraction Edit – Remove Text

With the MetaTool Remove text rule, you can remove specific text. It’s frequently combined with a Find Line with Mask / Words rule.

The Remove text rule is very useful when you need to extract particular words, names, codes or numbers from documents that contain redundant text. For example, names are often written after a label, as in “Full name: Alfred Pennyworth”. For further processing of the invoice, however, we only need the name, so the label must be dropped.

You first define an OCR extraction rule to hold the full text of a scanned document in an index field, which we typically call Text Block or Full Text. Next, you define a Find Line with Mask / Words rule to filter the full text and keep only the relevant lines. Finally, you remove the unwanted text from that line with the Remove text rule.

For example, to extract the inspector’s name from a report, you can search for the line containing “Inspected” with the Find Line with Mask / Words rule and then remove the text “Inspected by: ” with the Remove text rule.
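To make this chain of rules concrete, here is a minimal Python sketch of the Find Line with Mask / Words and Remove text steps applied to the inspector example. The functions `find_line_with_words` and `remove_text` are hypothetical stand-ins for what the rules do, not MetaTool APIs (MetaTool configures these steps as rules in its user interface). The sample string stands in for the OCR output, and trimming leftover whitespace is an assumption about how the result is cleaned up.

```python
# Hypothetical helpers illustrating the rule chain described above.
# They are NOT MetaTool APIs; MetaTool configures these steps as rules.

def find_line_with_words(full_text: str, words: str) -> str:
    """Return the first line of full_text that contains `words`, or '' if none does."""
    for line in full_text.splitlines():
        if words in line:
            return line
    return ""


def remove_text(value: str, text_to_remove: str) -> str:
    """Remove every occurrence of text_to_remove, then trim surrounding whitespace.

    Assumption: the leftover whitespace is trimmed after removal.
    """
    return value.replace(text_to_remove, "").strip()


# Made-up stand-in for the OCR result stored in a full-text index field.
full_text = "Inspection Report\nInspected by: Alfred Pennyworth"

line = find_line_with_words(full_text, "Inspected")  # Find Line with Mask / Words step
inspector = remove_text(line, "Inspected by:")       # Remove text step
print(inspector)  # Alfred Pennyworth
```

Because only the label is removed, the rest of the line is kept whole, so the approach does not depend on how many words the name contains.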

01 Remove text – Add Rule

Remove text is defined in the MetaTool Extract tab.

Press the Add button and select Edit – Remove text to add the edit rule.

The Remove text window opens.

02 Remove text – Setup

In our example, we will make use of the CB MetaTool Keyword Doc Sep job. This job is automatically installed when you install CaptureBites MetaTool.

We will use these sample images, and we want to extract the inspector’s name from the bottom left corner.

With an Advanced OCR Rule, we extract the text at the bottom right of the first page and place the result in an index field called FullTextFirstPage.

The result looks like this:

Next, we use a Find Line with Mask / Words rule to extract the line containing the words “Inspected by:”.

The result after this rule looks like this:

Finally, we keep only the inspector’s name by using the Remove text rule to remove the words “Inspected by:”. Select the index field that will hold the extracted data; in this case, we select the index field “Inspector”.

Optionally enter a description.

03 – Match whole word: when enabled, the rule only removes text that matches the defined word(s) as whole words. When disabled, it also removes the specified text where it occurs as part of a word. For example, when removing the word “apple” with Match whole word disabled, the word “pineapple” becomes “pine”. With Match whole word enabled, “pineapple” remains untouched, and “apple” is removed only where it stands on its own.
04 – Match case: when enabled, the rule only removes text whose case exactly matches the defined word(s). When disabled, it removes the specified text regardless of case.

For example, when removing the word “apple” with Match case disabled, the rule removes “APPLE”, “apple” and “Apple”. With Match case enabled, only “apple” is removed, and “APPLE” and “Apple” remain untouched.
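The two options can be pictured as a pattern-based removal. The following Python sketch assumes regex-like semantics: Match whole word is approximated by requiring that no letter, digit or underscore directly precedes or follows the text, and Match case toggles case-insensitive matching. `remove_text_with_options` is a hypothetical helper for illustration only; MetaTool’s own matching may differ in edge cases.

```python
import re


def remove_text_with_options(value: str, text: str,
                             match_whole_word: bool, match_case: bool) -> str:
    """Remove every occurrence of `text` from `value` (hypothetical helper)."""
    pattern = re.escape(text)  # treat the text literally, not as a regex
    if match_whole_word:
        # Assumption: "whole word" means no letter, digit or underscore
        # directly before or after the matched text.
        pattern = rf"(?<!\w){pattern}(?!\w)"
    flags = 0 if match_case else re.IGNORECASE
    return re.sub(pattern, "", value, flags=flags)


# Match whole word disabled vs enabled
print(remove_text_with_options("pineapple", "apple", False, True))  # pine
print(remove_text_with_options("pineapple", "apple", True, True))   # pineapple

# Match case disabled vs enabled
words = ["APPLE", "apple", "Apple"]
print([remove_text_with_options(w, "apple", True, False) for w in words])  # ['', '', '']
print([remove_text_with_options(w, "apple", True, True) for w in words])   # ['APPLE', '', 'Apple']
```

Lookarounds are used instead of `\b` so that text ending in punctuation, such as “Inspected by:”, can still match as a whole word.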

05 – Remove text: enter the text you want the rule to remove.

In our case, we only want to remove the text “Inspected by:”.

06 – Setup of a Remove text entry: by clicking the Setup button, you can select system and index values to define the text to be removed.

The final result after this rule looks like this:

Thanks to this approach, it does not matter whether the name consists of two, three or more elements.

For example, assume the name looks like this:

The rules described above would still correctly extract the inspector’s complete name: “Daenerys Stormborn of the House Targaryen, Khaleesi of the Great Grass Sea”.