
060-750 MetaTool Format – Format Arabic

With the MetaTool Format Arabic rule, you can extract numeric data from Arabic text and preserve the correct sequence of number groups separated by spaces, hyphens, slashes and similar characters. Such number groups often occur in telephone numbers, for example +971-50-555555. If these number groups are mixed with right-to-left Arabic text, their sequence is reversed after extraction. The Format Arabic rule fixes this.
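The effect can be illustrated outside MetaTool with a short Python sketch. It is not MetaTool's implementation. The function names and the separator set (space, hyphen, slash) are assumptions made for illustration only. The sketch takes text in which the number groups came out in reversed sequence and puts them back in order, keeping each separator between the same pair of groups.

```python
import re

# A run of digit groups joined by single separators (space, hyphen, slash).
# The separator set is an assumption for this illustration.
NUMBER_RUN = re.compile(r"\d+(?:[ \-/]\d+)+")


def reverse_groups(run: str) -> str:
    """Reverse the order of the digit groups; separators move with them."""
    tokens = re.split(r"([ \-/])", run)  # capture group keeps the separators
    return "".join(reversed(tokens))


def fix_number_runs(text: str) -> str:
    """Apply reverse_groups to every number run found in the text."""
    return NUMBER_RUN.sub(lambda m: reverse_groups(m.group(0)), text)


# The groups of 971-50-555555 as they might appear after extraction:
print(fix_number_runs("555555-50-971"))  # 971-50-555555
```

Only the sequence of the groups changes. The digits inside each group keep their order.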

01 Format Arabic – Add Rule

Format Arabic is defined in the MetaTool Extract tab.

Press the Add button and select Format – Arabic to add the format rule.

The Format Arabic Setup window opens.

02 Format Arabic – Setup

We want to extract the telephone number from the document below.

First, we extract the full page with an OCR (extra languages) rule and place it in an index field called Full Text. With a Replace text rule, we then remove the leading +.
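For readers who want to reproduce the replace step outside MetaTool, the following Python sketch removes a + that directly precedes a digit. The function name and the regular expression are hypothetical stand-ins. The actual Replace text rule is configured in the MetaTool interface.

```python
import re


def strip_leading_plus(text: str) -> str:
    """Remove a '+' that directly precedes a digit, e.g. '+971' -> '971'."""
    return re.sub(r"\+(?=\d)", "", text)


print(strip_leading_plus("+971-50-555555"))  # 971-50-555555
```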

The result looks like this:

Visually, the result appears to be correct, but when trying to extract the telephone number, the sequence of the number groups is reversed.

To fix this, we convert the text to “left to right” with a Format Arabic rule. Select the index field that holds the Arabic text. In this case, we select the index field “Full Text”.

Optionally, we enter a description.

Finally, we add a Find Word with Mask / Words rule to extract only the telephone number. The result of this rule clearly shows what the Format Arabic rule does.
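As a rough analogy, mask-based extraction behaves like a pattern search. The Python sketch below uses a regular expression in place of a MetaTool mask. The mask syntax of MetaTool is not shown here, so the pattern (three digits, two digits, six digits, joined by hyphens) and the function name are assumptions for illustration.

```python
import re
from typing import Optional

# Hypothetical stand-in for a mask: 3 digits, 2 digits, 6 digits.
PHONE_MASK = re.compile(r"\d{3}-\d{2}-\d{6}")


def find_phone(text: str) -> Optional[str]:
    """Return the first substring that fits the mask, or None."""
    match = PHONE_MASK.search(text)
    return match.group(0) if match else None


print(find_phone("Tel 971-50-555555"))  # groups in order: 971-50-555555
print(find_phone("Tel 555555-50-971"))  # groups reversed: None
```

A pattern that expects the groups in their normal sequence does not match when the groups are reversed, which is why the order must be corrected before this step.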

Without the Format Arabic rule, the number groups of the telephone number are reversed:

With the Format Arabic rule, the number groups of the telephone number are in the correct sequence: