
060-570 MetaTool Extraction – Find Line with Line Number

MetaTool’s Find Line with Line Number rule lets you select lines by their position in a list of lines. It is often combined with a Find Line with Mask / Words rule, a Replace Text rule, or both.

The Find Line with Line Number rule is very useful when you need to extract data from documents that don’t have a fixed format. A classic example is extracting names from invoices or reports, where the data is not always in the same place because it depends on the document layout of each supplier or client.

Typically, you first define an OCR extraction rule that stores the full text of a scanned document in an index field, usually called Text Block or Full Text. Next, you define a Find Line with Mask / Words rule to filter the full text and keep only the relevant lines. Then you use a Replace Text rule to replace a character in that line with a line separator, which puts the value you are interested in on a separate line.

Finally, you define a Find Line with Line Number rule to extract the actual index value you are interested in.

In the example below, we extract the inspector’s full name from a Building Inspection Report. We search for the line containing “Inspected by:” with a Find Line with Mask / Words rule, replace the “:” character with a line separator using a Replace Text rule, and then use the Find Line with Line Number rule to extract the 2nd line, which holds the inspector’s full name.
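The chain of rules described above can be approximated in a few lines of Python to show how each rule transforms the value. This is a minimal sketch, not MetaTool’s implementation: the function names and the sample OCR text are hypothetical, and we assume plain substring matching, "\n" as the line separator, and trimming of surrounding spaces on the selected line.

```python
def find_lines_with_words(text: str, words: str) -> str:
    """Hypothetical stand-in for Find Line with Mask / Words: keep lines containing `words`."""
    return "\n".join(line for line in text.splitlines() if words in line)


def replace_with_line_separator(text: str, targets) -> str:
    """Hypothetical stand-in for Replace Text: replace each target with a line separator."""
    for target in targets:
        text = text.replace(target, "\n")
    return text


def find_line_number(text: str, number: int) -> str:
    """Hypothetical stand-in for Find Line with Line Number: return the 1-based line, trimmed."""
    lines = text.splitlines()
    return lines[number - 1].strip() if 0 < number <= len(lines) else ""


# Hypothetical OCR text of a Building Inspection Report (full-text index field).
full_text = (
    "BUILDING INSPECTION REPORT\n"
    "Address: 12 Main Street\n"
    "Inspected by: John Doe\n"
    "Date: 2023-05-04\n"
)

inspector = find_lines_with_words(full_text, "Inspected by:")  # "Inspected by: John Doe"
inspector = replace_with_line_separator(inspector, [":"])       # "Inspected by\n John Doe"
inspector = find_line_number(inspector, 2)
print(inspector)  # John Doe
```

The order of the rules matters: the “Address:” and “Date:” lines also contain “:”, so replacing “:” before filtering would no longer leave the name on line 2.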

01 Find Line with Line Number – Add Rule

Find Line with Line Number is defined in the MetaTool Extract tab.

Press the Add button and select Find – Line – with Line Number to add the find rule.

The Find Line with Line Number Setup window opens.


In practice, this rule needs to be preceded by other rules. Below, we walk through a complete example of how the Find Line with Line Number rule can be used, including all preceding rules.

02 Find Line with Line Number – Setup

In our example, we use the CB MetaTool Keyword Doc Sep job, which is installed automatically when you install CaptureBites MetaTool.

From the sample images below, we want to extract the inspector’s full name.

The inspector’s full name has a variable length and can be as simple as “John Doe” or as complex as “Daenerys Stormborn of the House Targaryen, Khaleesi of the Great Grass Sea”. The name is always preceded by the fixed label “Inspected by:”, but its vertical position on the page varies.

With an Advanced OCR rule, we extract the full text of the first page and store the result in an index field called FullTextFirstPage.

The result looks like this:

Next, we use a Find Line with Mask / Words rule to extract the line containing the words “Inspected by:”.

The rule looks like this:

The result after this rule looks like this:

Next, with a Replace Text rule, we split the line containing the inspector’s name into multiple lines by replacing both “:” and “License” with a line separator. This places the complete name by itself on the second line.

The rule looks like this:

The result after this rule looks like this:

Finally, we extract the inspector’s name with the Find Line with Line Number rule by selecting the 2nd line. Select the index field that will hold the extracted data; in this case, the index field “Inspector”.

Optionally enter a description.

The final result after this rule will look like this:

Thanks to this approach, it does not matter whether the name consists of 2, 3 or more words.

For example, assume the name looks like this:

The rules described above would still correctly extract the inspector’s complete name: “Daenerys Stormborn of the House Targaryen, Khaleesi of the Great Grass Sea”.
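The sketch below illustrates why the length of the name does not matter: once “:” and “License” are replaced with line separators, everything between them lands on line 2, commas included. The sample lines (including the license number after “License”) are hypothetical, the helper names are ours rather than MetaTool’s, and we assume "\n" as the line separator and trimming of surrounding spaces.

```python
def replace_with_line_separator(text: str, targets) -> str:
    """Hypothetical stand-in for Replace Text: each target becomes a line break."""
    for target in targets:
        text = text.replace(target, "\n")
    return text


def nth_line(text: str, number: int) -> str:
    """Hypothetical stand-in for Find Line with Line Number (1-based, trimmed)."""
    lines = text.splitlines()
    return lines[number - 1].strip() if 0 < number <= len(lines) else ""


# Hypothetical lines as returned by the Find Line with Mask / Words rule.
samples = [
    "Inspected by: John Doe License # 4711",
    "Inspected by: Daenerys Stormborn of the House Targaryen, "
    "Khaleesi of the Great Grass Sea License # 4711",
]

names = [nth_line(replace_with_line_separator(s, [":", "License"]), 2) for s in samples]
for name in names:
    print(name)
```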

03 – Source field: select the source index field. This is typically the index field containing the text you want to parse to find the line you are looking for.

In our example, this is the Inspector index field itself, because it already contains the lines produced from the FullTextFirstPage value by the two preceding rules: the Find Line with Mask / Words rule and the Replace Text rule.

04 – Line number(s): type the line numbers or ranges, separated by commas. Negative numbers count lines from the end, so -1 is the last line. In our example, we need to extract the 2nd line.

Line ranges are defined with a hyphen (-), e.g. 1-3.

You can combine line ranges and line numbers with commas, e.g. -1, 2-3.

More examples:

1-3: a range that selects the 1st through the 3rd line.

3: selects the 3rd line.

1,3 or 3,1: selects the 1st and the 3rd line. Both return the same result because the original line order is always preserved.

-1,2: selects the last line and the 2nd line.

-1-2: a range that selects the last line back to the 2nd line.

-3- -1: a range that selects the third-to-last line through the last line.
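The line-number syntax above can be modelled as a small parser. This is a hedged sketch based only on the rules and examples listed here, not on MetaTool’s code: `select_lines` is a hypothetical name, and we assume that positions outside the available lines are ignored and that overlapping selections return each line only once.

```python
import re

# One comma-separated token: a line number or a range "start-end".
# Either end may be negative (counted from the end, -1 = last line).
TOKEN = re.compile(r"^\s*(-?\d+)\s*(?:-\s*(-?\d+))?\s*$")


def to_index(number: int, count: int) -> int:
    """Convert a 1-based line number (negative = from the end) to a 0-based index."""
    return count + number if number < 0 else number - 1


def select_lines(text: str, spec: str) -> list:
    """Hypothetical model of the Line number(s) option."""
    lines = text.splitlines()
    count = len(lines)
    picked = set()  # a set, so overlapping selections return each line once
    for token in spec.split(","):
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"invalid line number or range: {token!r}")
        start = to_index(int(match.group(1)), count)
        end = start if match.group(2) is None else to_index(int(match.group(2)), count)
        low, high = min(start, end), max(start, end)
        # Assumption: positions outside the available lines are silently ignored.
        picked.update(i for i in range(low, high + 1) if 0 <= i < count)
    # Sorting restores the original line order, so "1,3" and "3,1" match.
    return [lines[i] for i in sorted(picked)]


report = "\n".join(f"Line {i}" for i in range(1, 7))  # six sample lines
for spec in ["1-3", "3", "1,3", "3,1", "-1,2", "-1-2", "-3- -1"]:
    print(f"{spec:>7} -> {select_lines(report, spec)}")
```

Resolving negative numbers against the line count first, and only then building the range, is what lets “-1-2” and “-3- -1” be treated like ordinary ranges.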

Tip: Hover over the information icon next to any of the options to read more about how to use that option.
05 – Append to original value: the result is appended to the value already in the index field. Otherwise, the result overwrites the existing value.
06 – Clear original value if result is blank: if the rule produces no value, the selected index field is cleared.
07 – Delete duplicates: deletes all duplicate matches so that the result contains only unique values.

Match case: makes the search for duplicates case-sensitive. For example, when looking for duplicates of “Wallace D Cosare”, only values with exactly the same capitalization are deleted. Disable this option to also delete “wallace d cosare” or “WALLACE D COSARE”.
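The sketch below models how options 05 to 07 and Match case might combine when the result is written to the index field. It is an illustration only: `apply_result` is a hypothetical name, and we assume that appended values are separated by a line break, that a blank result leaves the field unchanged unless option 06 is enabled, and that Delete duplicates keeps the first occurrence of each value.

```python
def apply_result(original: str, result: list, append: bool = False,
                 clear_if_blank: bool = False, delete_duplicates: bool = False,
                 match_case: bool = False) -> str:
    """Hypothetical model of options 05-07 and Match case."""
    if delete_duplicates:
        seen = set()
        unique = []
        for value in result:
            # Without Match case, differently capitalized values count as duplicates.
            key = value if match_case else value.casefold()
            if key not in seen:
                seen.add(key)
                unique.append(value)  # keep the first occurrence
        result = unique
    new_value = "\n".join(result)
    if not new_value:
        # Assumption: a blank result leaves the field untouched unless option 06 is on.
        return "" if clear_if_blank else original
    if append and original:
        # Assumption: the appended result starts on a new line.
        return original + "\n" + new_value
    return new_value  # default: the result overwrites the original value


names = ["Wallace D Cosare", "wallace d cosare", "WALLACE D COSARE", "Wallace D Cosare"]
print(apply_result("", names, delete_duplicates=True))
print(apply_result("", names, delete_duplicates=True, match_case=True))
```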