120-150 Extract – Extract Text (Azure Form Recognizer)

With MetaServer’s Extract Text (Azure Form Recognizer) rule, you can automatically extract header data, key values and line items from forms, like invoices, and store that extracted data in fields. This is done through Azure Form Recognizer’s prebuilt models. These do not require any training or configuration.

It reads machine-printed text, cursive handwriting, barcodes and CMC7 text (CMC7 is not supported by the Other Form model). It is exceptionally good at handling poor-quality images, such as documents photographed with a smartphone or smudged and damaged documents.

You can specify the pages from which you want to extract information. After the values have been extracted, you can apply other Extract rules to clean up, format or adjust the values before sending them to the next action.

You can refer to the Azure Form Recognizer’s documentation for a more detailed list of the supported languages for each model.

Some examples (results are below each sample):

US invoice (EN)

German invoice

Spanish invoice

French invoice

Belgian invoice (NL)

UK invoice (EN)

US invoice (EN)

French invoice

US invoice (EN)

US invoice (EN)

NOTE: To validate line items as a table with different columns, you need to merge and format all line items in 1 CSV field using a Format CSV rule. The CSV field would then contain all line items and / or header data in a CSV format.

For example:

"5138489C","TRAXION MENACE CREW","9.00","EACH","10","","90.00"
"5138489D","TRAXION MENACE CREW ","9.00"," EACH ","2","","18.00"
"5144035","STADIUM II BACKPACK","30.00"," EACH ","1","3","90.00"
"5142723","STRIKER II TEAM BACKPACK","22.50"," EACH ","1","","22.50"
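
As a rough illustration of the CSV layout shown above, the following Python sketch writes line items as fully quoted CSV rows. The `line_items_to_csv` helper is hypothetical and only mimics the resulting format; in MetaServer itself the merging is done by the Format CSV rule, not by code.

```python
import csv
import io


def line_items_to_csv(rows):
    """Return the rows as CSV text in which every value is quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Values taken from the example above; empty strings stay as empty quoted cells.
line_items = [
    ["5138489C", "TRAXION MENACE CREW", "9.00", "EACH", "10", "", "90.00"],
    ["5144035", "STADIUM II BACKPACK", "30.00", "EACH", "1", "3", "90.00"],
]

csv_text = line_items_to_csv(line_items)
print(csv_text)
```

Quoting every value keeps descriptions that contain commas in a single column.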

Using a Validate CSV rule, it would look like this in validation:

NOTE: You need to sign up for the Azure Form Recognizer service. Paid plans for the prebuilt models are available for $10 per 1,000 pages (S0 plan for prebuilt document types).

If you only use the Read model, paid plans are available for $1.50 per 1,000 pages.

There is also a free 12-month plan (F0) that lets you test the engine with prebuilt models for up to 500 pages per month.

IMPORTANT: The processing speed in the free plan is limited to 1 call per 2 seconds, and only 2 pages of each invoice are read. With the paid plan (S0), the processing speed is 15 calls per second, which is 30 times faster than the free plan, and all pages of the invoice are read.

More detailed information about the pricing can be found here:
https://azure.microsoft.com/en-us/pricing/details/form-recognizer/
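
To get a feel for these numbers, here is a minimal Python sketch that estimates the cost and the minimum processing time for a batch. The prices and rate limits are copied from the text above and may be outdated; the function names are made up for this example.

```python
# Figures taken from the text above; check the pricing page for current values.
PRICE_PER_1000_PAGES = {"prebuilt": 10.00, "read": 1.50}  # USD, S0 plan
CALLS_PER_SECOND = {"F0": 0.5, "S0": 15.0}  # 1 call per 2 seconds vs 15 calls per second


def estimate_cost(pages, model="prebuilt"):
    """Estimated S0 cost in USD for the given number of pages."""
    return pages / 1000 * PRICE_PER_1000_PAGES[model]


def min_seconds_for_calls(calls, plan):
    """Lower bound on the time needed to issue the calls under the plan's rate limit."""
    return calls / CALLS_PER_SECOND[plan]


print(estimate_cost(2500))              # cost of 2,500 pages with a prebuilt model
print(min_seconds_for_calls(30, "F0"))  # 30 calls on the free plan
print(min_seconds_for_calls(30, "S0"))  # 30 calls on the paid plan
```

The time estimate only reflects the rate limit; actual processing time also depends on document size and network latency.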

For more information on how to apply for a key, please refer to the instructions below.

NOTE: For more technical information about how Microsoft’s Azure Form Recognizer engine works (API, OCR, etc.) and how Microsoft handles data privacy and security, please refer to the Microsoft Azure Form Recognizer documentation.

Extract Text rules are defined in a MetaServer Extract or Separate Document / Process Page action.

To add this rule, press the Add button and select Extract -> Text (Azure Form Recognizer).

How to sign up for a key

1) Log in to the Azure portal: https://portal.azure.com/#home

NOTE: If you don’t have a Microsoft or Azure account yet, you can sign up for free. You can find more info here:
https://azure.microsoft.com/en-us/free/

2) Create a resource:

3) In the “AI + Machine Learning” section, create a “Form Recognizer” Resource:

NOTE: If you don’t have an Azure account, you can start one for free for 12 months. Just select “Start Free Trial” and follow the instructions.

4) Choose one of the 2 pricing tiers for the Azure Form Recognizer:

F0 plan (Free): 12 months Free (500 pages/month)

– If you use the invoice model with the free plan, maximum 2 pages for each invoice will be processed.

– The page file size must be less than 4 MB.

– With the 12-month free plan, the Azure server can only handle 1 call every 2 seconds for the recognizer API. Because of this speed limit, we recommend running extraction and separation on only one core with the free plan.

– On the 28th of each month, the counter is reset to 500 pages.

– When you run out of free calls before the 28th of the month, MetaServer moves documents to the Error tab and reports that you need to either wait until the 28th of the month to continue processing documents or switch to a paid plan.

– Documents that ended up in the Error tab can be reprocessed with a paid plan or retried when the free counter is reset on the 28th of the month.

– One month before the 12-month period expires, you will receive an email from Microsoft stating that the free service is about to expire and will stop working.

You then have 30 days to switch from your free plan to a “pay-as-you-go” S0 plan (see below) or to stop using the service.

More detailed information about the pricing can be found here:
https://azure.microsoft.com/en-us/pricing/details/form-recognizer/

S0 plan: Pay-as-you-go ($10 per 1000 pages for prebuilt models or $1.50 per 1000 pages for the Read model)

– The page file size can be up to 500 MB.

– The Azure server can handle 15 calls per second. You can run Extraction and Separation on multiple cores with a paid plan.

– Microsoft only charges for READ calls. GET calls are free.

– For high volumes, starting from more than 20,000 pages per month, you can find special pricing here.

– You can pay the subscription with a credit card or request to pay by check or wire transfer here:
https://docs.microsoft.com/en-us/azure/cost-management-billing/manage/pay-by-invoice

5) Press “Review + create” to check your Resource details. If they are correct, press the “Create” button.

6) You can find your resource’s Keys and Endpoint in your Microsoft Azure Dashboard. The “Go to Azure Portal” button will open the portal in your default browser.

The example below shows a resource called “MetaServerAFR-DEV”.

First, add a description to your rule. Then, press the Setup button in the upper toolbar to set up the connection to your Azure Form Recognizer resource.

01 – Key, Endpoint, Location: enter your resource key and endpoint, and select your location using the drop-down arrow. You can find this information in your Microsoft Azure Dashboard. The “Go to Azure Portal” button will open the portal in your default browser.

The example below shows a resource called “MetaServerAFR-DEV”.

In your portal, you can also check your remaining calls. This is useful for making sure you are not exceeding the limits of your current Microsoft Azure Form Recognizer pricing tier.

If you haven’t signed up for a key yet, please refer to the instructions above.

02 – Proxy: if you want to connect to a proxy server, press the Proxy button to open the setup window.

1) Type, Host, User name, Password: press the drop-down arrow to choose your proxy protocol and enter the connection settings to your proxy server. When in doubt, contact your IT department.

2) Port: enter the specified port of your Proxy server. When in doubt, contact your IT department.

03 – API version: here you can specify the API version for the Azure Form Recognizer.

Microsoft regularly releases new API versions. New versions add new functionality, such as additional supported languages or improved recognition.

You can find more details about what is new in Azure Form Recognizer’s current API here:
https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/whats-new?view=form-recog-3.0.0&tabs=csharp

The “Invoice” model automates processing of invoices to extract header data like vendor name, vendor tax id, invoice number, invoice date, due date, payment terms, total amount, etc.

The model also extracts line items like article codes, unit price, quantity, etc.

This model reads machine printed text, cursive handwriting, barcodes and CMC7 text.

You can find more details regarding pricing in the pricing explanation above.

More detailed information about this model can be found here:
https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-invoice?view=form-recog-3.0.0

The “Other Forms” model automatically detects key value pairs for fields, tables and check boxes on any form type. You set one sample of your forms as a Master Form to detect the data elements on the form which can then be mapped with MetaServer fields. This reduces the time to configure the extraction of a form considerably.

The “Other Form” model reads machine printed text, cursive handwriting and barcodes. Unlike the other models, it currently does not read CMC7 text.

You can find more details regarding pricing in the pricing explanation above.

Setup

Step 1) After you have selected the “Other Forms” prebuilt model and finished setting your extract settings, press OK to go back to the mapping setup screen.

You now need to select a sample of your form as a Master Form. After you have added a Master Form, your sample form will be analyzed and all data elements are detected and exposed as field labels and table headers.

Step 2) You then map the detected field labels and table headers that you are interested in with MetaServer fields.

NOTE: Table headers are prefixed with “Line Item” to distinguish them from regular, single value fields.

With your new Master Form, you can now test other forms of the same type to check if all the data is detected.

Step 3) If you come across any variations of the same form where different labels are used for the same field, you need to define alternate labels for this field or column.

For example, on the standard sample forms, a column label “NAME” was detected. But on another version of the form, the label is called “NAME (Please Print)”.  You can define “NAME (Please Print)” as an alternate label for NAME.

First, press the “Form Fields” button.

This panel shows all your test form’s labels and values on the left side. The right side shows the Master Form’s labels and values. Select the alternate column label, in this case “Line Item NAME (Please Print)”, on the left side and copy it.

Step 4) To add the alternate field name, press the setup button (…) next to the Form Field name, in this case “Line Item NAME”. Please note that the alternate labels are evaluated in the order in which they appear in the list.

Paste the alternate field name in the list and press OK.

When you test the form now, the alternate label “NAME (Please Print)” is also used to detect the “NAME” column and it is successfully extracted.
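
The alternate-label lookup described in these steps can be pictured with the small Python sketch below. It assumes exact, case-sensitive matching and a hypothetical `find_label` helper; how MetaServer matches labels internally is not documented here.

```python
def find_label(detected_labels, primary, alternates=()):
    """Return the first label (primary first, then alternates in order) found on the form."""
    for candidate in (primary, *alternates):
        if candidate in detected_labels:
            return candidate
    return None


# Labels detected on a variation of the form (illustrative values).
detected = ["Line Item DATE", "Line Item NAME (Please Print)"]

match = find_label(
    detected,
    primary="Line Item NAME",
    alternates=["Line Item NAME (Please Print)"],
)
print(match)
```

Because the candidates are tried in order, a label placed earlier in the list of alternates takes precedence when a form contains more than one of them.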

The “Receipt” model automatically extracts merchant name, dates, line items, quantities, and totals from printed and handwritten receipts. Version v3.0 also supports single-page hotel receipt processing. The “Receipt” prebuilt model is more limited than the “Invoice” model. In our tests, we see better results with the Invoice model for European tax receipts.

This model reads machine printed text, cursive handwriting, barcodes and CMC7 text.

You can find more details regarding pricing in the pricing explanation above.

The preview model is documented here:

Receipt model:
https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-receipt?view=form-recog-3.0.0

The “ID Document” model automatically extracts information from ID cards, passports, driver licenses, residence permits and US social security cards. It can also automatically classify the ID document, which is shown in the “Document Type” field.

You can find more details regarding pricing in the pricing explanation above.

More detailed information about this model can be found here:
https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-id-document?view=doc-intel-3.1.0#supported-document-types

The “Read” model automatically extracts the full text. This model limits itself to only extracting the full text, but it often returns better results than the Extract Text (Azure Computer Vision) rule’s engine.

Since it does not include any special extraction logic like the prebuilt models, it is also a cheaper pricing tier. You can find more details regarding pricing in the pricing explanation above.

This model reads machine printed text, cursive handwriting, barcodes and CMC7 text.

The preview model is documented here:

Read model:
https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-read?view=form-recog-3.0.0

05 – Get result after [x] seconds: by default, this time is set to 6 seconds.

We recommend only changing this to a lower number if you have signed up for Microsoft Azure Form Recognizer’s paid plan. Because the Free plan is limited to 1 call every 2 seconds, we recommend running extraction with a Free plan on a single core and keeping the “Get result after [x] seconds” setting at 6 seconds.
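
The idea of waiting before asking for the result can be sketched in Python as follows. This is not MetaServer's implementation: `submit` and `fetch` are placeholder callables standing in for the real service calls, and the retry behaviour after the initial wait is an assumption made for the example.

```python
import time


def get_result_after(submit, fetch, delay_seconds=6.0, retry_seconds=2.0,
                     max_attempts=5, sleep=time.sleep):
    """Submit a document, wait, then ask for the result a limited number of times."""
    operation_id = submit()
    sleep(delay_seconds)  # corresponds to the "Get result after [x] seconds" wait
    for _ in range(max_attempts):
        result = fetch(operation_id)
        if result is not None:
            return result
        sleep(retry_seconds)  # assumed pause between retries
    raise TimeoutError(f"No result for {operation_id} after {max_attempts} attempts")


# Demo with stand-in functions: the first fetch is still pending, the second succeeds.
waits = []
pending = [None, {"status": "succeeded"}]
demo_result = get_result_after(
    submit=lambda: "op-1",
    fetch=lambda operation_id: pending.pop(0),
    sleep=waits.append,  # record the waits instead of actually sleeping
)
print(demo_result, waits)
```

A shorter initial delay returns results sooner but causes more calls when the service has not finished yet, which matters on the rate-limited free plan.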

06 – Log: enable this option to create a log file each time the Microsoft Azure Form Recognizer engine is called. This option is typically used when diagnosing issues with Microsoft Azure.

On the client side, you can find the log information after running a Test in Extraction in the following folder:
C:\ProgramData\CaptureBites\Programs\Admin\Data\Log

On the server side, after processing some documents, you can find the log information in the following folder:
C:\ProgramData\CaptureBites\Programs\MetaServer\Data\Log

07 – Apply: choose when to apply the rule. The default option is “Always”, which means that the rule is always applied. Press the drop-down arrow to see all other available conditions.

A good example of conditional extraction is to first try to extract a value using the Extract Text rule (= standard OCR engine) or the Extract Text (Azure Computer Vision) rule. Only if that does not return a valid result do you let the Extract Text (Azure Form Recognizer) rule try to extract the value.

This speeds up the extraction process and only uses calls to your Microsoft Azure Form Recognizer resource when your first search didn’t return a good result.

After selecting your condition, for example, “If field value is blank”, press the “…” button next to the drop-down arrow to open that condition’s setup window.

1) If value of field: press the drop-down arrow to select the field value that needs to be evaluated.

2) is equal to / is not equal to / is greater than /…: enter the other value your field value needs to be compared with. You can also press the drop-down button to select different system and index values to compose your value.
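
The “If field value is blank” condition amounts to a simple fallback, sketched below in Python. The two extractor functions are stand-ins invented for this example; they do not call any real OCR engine or Azure service.

```python
def extract_with_fallback(document, primary, fallback):
    """Use the primary extractor and only call the fallback when the value is blank."""
    value = primary(document)
    if value is not None and value.strip():
        return value
    return fallback(document)


calls = []


def local_ocr(document):
    # Stand-in for the standard OCR engine.
    calls.append("local")
    return document.get("ocr_text", "")


def azure_form_recognizer(document):
    # Stand-in for the (billable) Azure Form Recognizer call.
    calls.append("azure")
    return document.get("azure_text", "")


first = extract_with_fallback({"ocr_text": "INV-100"}, local_ocr, azure_form_recognizer)
second = extract_with_fallback({"ocr_text": " ", "azure_text": "INV-200"}, local_ocr, azure_form_recognizer)
print(first, second, calls)
```

Only the second document triggers the fallback, so only that document would consume a call on the Azure resource.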

08 – Page: set the page number(s) where the information is located. The default is all pages (= blank).

For example:
– Enter 1 for the 1st page
– Enter -1 for the last page
– Enter 1-3 to extract from page 1 to page 3
– Leave this empty to extract all pages (same as 1--1, i.e. from page 1 to the last page)
– Etc.

You can also press the drop-down arrow to use a field value containing a page number value to switch the page number(s) dynamically.

Example use-case: A form contains 10 pages. You are interested in extracting the information on the page about the “Bank Details” with a big heading “SECTION 5: BANK DETAILS”. The form’s pages are not always in the correct sequence, meaning that the bank details can be on any of the 10 pages. You first use a Find Word with Mask / Words rule to find the words “SECTION 5: BANK DETAILS” and put the found words in a field called, for example, “KEYWORD”. If the keyword is found on page 7, then the variable { Page Number, KEYWORD } would return the value “7”.

You can then use the field “KEYWORD” as your Page in your Extract Text (Azure Form Recognizer) rule.

NOTE: If you use a variable as your Page value, you can specify a Test page, solely for testing purposes.

NOTE: If a document does not contain a specific page, it is ignored. For example, extracting page “2,3” on a 2-page document will only extract page 2.

The free F0 plan in combination with the invoice model will only process a maximum of 2 pages for each invoice. With the page range, you can determine which pages those are. For example, 1,-1 would send the first and last page of each invoice.
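
The page notation described above can be illustrated with a short Python sketch that turns a Page value into concrete page numbers. The parsing rules are inferred from the examples in this section (negative numbers count from the end, ranges use a hyphen, lists use commas, missing pages are ignored); the `resolve_pages` function is hypothetical and not MetaServer's actual parser.

```python
import re

# One list entry: a single page ("1", "-1") or a range ("1-3", "1--1").
_PART = re.compile(r"^(-?\d+)(?:-(-?\d+))?$")


def resolve_pages(spec, page_count):
    """Translate a Page value into a list of 1-based page numbers."""
    if not spec.strip():
        return list(range(1, page_count + 1))  # blank means all pages

    def absolute(number):
        # Negative numbers count from the end: -1 is the last page.
        return page_count + number + 1 if number < 0 else number

    pages = []
    for part in spec.split(","):
        match = _PART.match(part.strip())
        if match is None:
            raise ValueError(f"Unrecognized page specification: {part!r}")
        first = absolute(int(match.group(1)))
        last = absolute(int(match.group(2))) if match.group(2) is not None else first
        for page in range(first, last + 1):
            # Pages that do not exist in the document are ignored.
            if 1 <= page <= page_count and page not in pages:
                pages.append(page)
    return pages


print(resolve_pages("1,-1", 5))
print(resolve_pages("2,3", 2))
print(resolve_pages("", 3))
```

With the free F0 plan and the invoice model, a value such as 1,-1 keeps the selection within the 2-page limit regardless of the document length.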

01 – Deskew & Rotate: if your documents are skewed or rotated incorrectly, you can enable the Deskew and/or Rotate option to optimize Text extraction. It will also result in a corrected version of the page(s).

02 – Color Dropout: if your documents contain a lot of colored tables, lines or stamps overlapping the values you want to extract, you can enable the color dropout feature, which allows you to select up to 3 dropout colors.

Press the setup button to specify which colors to drop out.

Each selected color has its own tolerance. With the Test button, you can see the effect of dropping out the selected colors in the preview window on the right.

To reset all dropout colors to white (off), you can use the “Reset All Colors” button.

NOTE: The filtered image is only used temporarily to improve text extraction. The processed image keeps all the original colors.

03 – Confidence: characters with a confidence level lower than the set confidence level will be ignored and not returned in the result. If set to 0, all characters are accepted.

This setting is retained for legacy reasons. We recommend using the new “Check if confidence is lower than” option in the Validate rules instead.

To help you in defining the correct confidence level, you can check the confidence level of each field in the “Confidence” column of your test results.

NOTE: The “Confidence” value shown beneath “OCR” refers to the confidence level of the Azure Form Recognizer’s “Document Content” field value, which is currently mapped to the “Full Text” value in our example.
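
The effect of the Confidence setting can be pictured with the Python sketch below. It assumes a confidence scale of 0 to 100 and per-character confidence values; both are assumptions for illustration, and the `filter_by_confidence` helper is not part of MetaServer.

```python
def filter_by_confidence(characters, minimum):
    """Keep only the characters whose confidence is at least the minimum."""
    return "".join(char for char, confidence in characters if confidence >= minimum)


# Illustrative (character, confidence) pairs.
recognized = [("I", 98), ("N", 95), ("V", 40), ("1", 99)]

print(filter_by_confidence(recognized, 50))
print(filter_by_confidence(recognized, 0))
```

A minimum of 0 accepts every character, which matches the behaviour described for this setting.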

04 – Tabs: by default, lines are segmented in multiple word groups that are separated by tabs. If you want no tabs at all, press the drop-down button to select the “Remove” option and all the words will be grouped as 1 single word group for each line.

05 – Convert page(s) to searchable PDF: enable this option if you want to save the extracted text as a searchable text layer in the processed PDF. It will only do this for the page(s) set in the Page(s) field. Leaving the Page(s) field empty will convert all pages of the documents.

As a result, you will be able to search handwritten, Arabic, Cyrillic or low-quality text in your exported PDF:

This high-quality text layer can also be used during Validation with the Select text tool. To do this, please make sure you also enable the “Use searchable text layer if present” option in the Select text tool setup:

06 – Searchable barcodes: enable this option when you also want to be able to search for barcode values when you convert page(s) to searchable PDF.

01 – Field: The “Field” column shows your MetaServer fields. You can map them to the corresponding Form Recognizer field values using the drop-down arrow in the “Form Field” column (see below).

02 – Form Field: based on the selected Azure Form Recognizer Model (Invoice, Receipt, ID Document, Other Form, Read), you can map your MetaServer fields to the model’s extracted field values.

NOTE: As you will see in the output, some field values, like dates and amounts, are automatically reformatted by the Azure Form Recognizer engine. This is to output consistent, standardized formats, regardless of the input.

Dates are formatted to YYYY-MM-DD format.

For example:

12/06/2023

becomes:

2023-06-12 (on a European invoice)

2023-12-06 (on a US invoice)

Amounts are formatted without thousands separators and with a period (.) as the decimal character. The number of digits after the decimal point matches the digits available in the source, except for the Total amounts of the header data, which are rounded to 2 digits.

For general amounts:

*.00 will become *.0

*.0000 will become *.0

*.20 will become *.2

*.1254 will remain *.1254

For Total amounts of header data:

*.00 will become *.0

*.0000 will become *.00

*.20 will become *.2

*.1254 will become *.13

 

The currency is also removed from all types of amounts.

For example:

£20,432.625

becomes:

20432.625

 

Time values are formatted to an HH:mm:ss format.

For example:

07:45 PM

becomes:

19:45:00

If the format needs to be different for your output, you can change it for each field by using the Extract action’s Format and Edit rules.
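
To make the standardized formats concrete, the following Python sketch reproduces the documented output for dates, times and general amounts. It only mimics the results described above; it is not how the Azure engine works internally, and the function names and the `day_first` flag are assumptions for this example. Amounts are assumed to use a comma as thousands separator and a period as decimal character.

```python
from datetime import datetime


def normalize_date(text, day_first):
    """Convert a slash-separated date to YYYY-MM-DD."""
    source_format = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, source_format).strftime("%Y-%m-%d")


def normalize_time(text):
    """Convert a 12-hour time such as '07:45 PM' to HH:mm:ss."""
    return datetime.strptime(text, "%I:%M %p").strftime("%H:%M:%S")


def normalize_amount(text):
    """Drop currency symbols and thousands separators and trim trailing zeros."""
    cleaned = "".join(char for char in text if char.isdigit() or char == ".")
    whole, _, fraction = cleaned.partition(".")
    fraction = fraction.rstrip("0") or "0"  # always keep at least one decimal digit
    return f"{whole}.{fraction}"


print(normalize_date("12/06/2023", day_first=True))   # European invoice
print(normalize_date("12/06/2023", day_first=False))  # US invoice
print(normalize_time("07:45 PM"))
print(normalize_amount("$1,250.00"))
```

The rounding to 2 digits that applies to header Total amounts is left out of this sketch.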

All models (= prebuilt and Read models) can return the following form field values:

 

Name | Type | Description
Document Content | string | The complete extracted text, including printed and handwritten text, and barcode values (if present), without any formatting. This can be useful if you want to extract any other values that the model was not able to find. You would typically map it to your “Full Text” field which you can then use as a source in your Extract rules.
Document Content Printed | string | Only returns printed text from your document. If there is handwritten text present, it will filter this out of the result.
Document Content Handwritten | string | Only returns handwritten text from your document. If there is printed text present, it will filter this out of the result.
Document Content Printed and Handwritten | string | The complete extracted text, filtering out any barcodes (if present), without any formatting. This can be useful if you want to extract any other values that the model was not able to find. You would typically map it to your “Full Text” field which you can then use as a source in your Extract rules.
Dominant Language | string | Returns the ISO code of the document’s dominant language.
Barcodes (All / Type) | string | Returns all barcode values on the document OR the values of the specified barcode type. Currently, the engine supports the following barcode types: Codabar, Code 39, Code 93, Datamatrix, EAN13, EAN8, QR Code, Code128, Interleaved 2 of 5, UPC-A, UPC-E, PDF 417.

Example processed values:

Document Content:
AB/7R
BV
1540030393597
1540030393597
INVOICE
CONTOSO LTD.
Contoso Headquarters→INVOICE: INV-100
123 456th St→INVOICE DATE: 11/15/2019
New York, NY, 10001→DUE DATE: 12/15/2019
CUSTOMER NAME: MICROSOFT CORPORATION
Signed off on 18/06/18
(…)

Document Content Printed:
1540030393597
INVOICE
CONTOSO LTD.
Contoso Headquarters→INVOICE: INV-100
123 456th St→INVOICE DATE: 11/15/2019
New York, NY, 10001→DUE DATE: 12/15/2019
CUSTOMER NAME: MICROSOFT CORPORATION
(…)

Document Content Handwritten:
AB/7R
BV
Signed off on 18/06/18

Document Content Printed and Handwritten:
AB/7R
BV
1540030393597
INVOICE
CONTOSO LTD.
Contoso Headquarters→INVOICE: INV-100
123 456th St→INVOICE DATE: 11/15/2019
New York, NY, 10001→DUE DATE: 12/15/2019
CUSTOMER NAME: MICROSOFT CORPORATION
Signed off on 18/06/18
(…)

Dominant Language:
en

Barcodes (All / Type):
1540030393597

The Invoice model can return the following form field values:

Single Value Fields (see example invoice on the right):

Name | Type | Description | Standardized Output Format | Example Processed Value
Customer Name | String | Invoiced customer | | MICROSOFT CORPORATION
Customer Id | String | Customer reference ID | | CID-12345
Purchase Order | String | Purchase order reference number | | PO-3333
Invoice Id | String | ID for this specific invoice (often “Invoice Number”) | | INV-100
Invoice Date | Date | Date the invoice was issued | YYYY-MM-DD | 2019-11-15
Due Date | Date | Date payment for this invoice is due | YYYY-MM-DD | 2019-12-15
Vendor Name | String | Vendor name | | CONTOSO LTD.
Vendor Tax Id | String | The taxpayer number associated with the vendor | |
Vendor Address | String | Vendor mailing address | | 123 456th St New York, NY, 10001
Vendor Address Recipient | String | Name associated with the Vendor Address | | Contoso Headquarters
Customer Address | String | Mailing address for the Customer | | 123 Other St, Redmond WA, 98052
Customer Tax Id | String | The taxpayer number associated with the customer | |
Customer Address Recipient | String | Name associated with the Customer Address | | Microsoft Corp
Billing Address | String | Explicit billing address for the customer | | 123 Bill St, Redmond WA, 98052
Billing Address Recipient | String | Name associated with the Billing Address | | Microsoft Finance
Shipping Address | String | Explicit shipping address for the customer | | 123 Ship St, Redmond WA, 98052
Shipping Address Recipient | String | Name associated with the Shipping Address | | Microsoft Delivery
Payment Term | String | The terms of payment for the invoice | | 30 NET
Subtotal | Number | Subtotal field identified on this invoice | Integer | 100.0
Subtotal Currency Code | String | The currency code associated with the extracted subtotal amount | | USD
Subtotal Currency Symbol | String | The currency symbol associated with the extracted subtotal amount | | $
Total Tax | Number | Total tax field identified on this invoice | Integer | 10.0
Total Tax Currency Code | String | The currency code associated with the extracted total tax amount | | USD
Total Tax Currency Symbol | String | The currency symbol associated with the extracted total tax amount | | $
Invoice Total | Number (USD) | Total new charges associated with this invoice | Integer | 110.0
Invoice Total Currency Code | String | The currency code associated with the extracted invoice total amount | | USD
Invoice Total Currency Symbol | String | The currency symbol associated with the extracted invoice total amount | | $
Amount Due | Number (USD) | Total Amount Due to the vendor | Integer | 610.0
Amount Due Currency Code | String | The currency code associated with the extracted amount due | | USD
Amount Due Currency Symbol | String | The currency symbol associated with the extracted amount due | | $
Service Address | String | Explicit service address or property address for the customer | | 123 Service St, Redmond WA, 98052
Service Address Recipient | String | Name associated with the Service Address | | Microsoft Services
Remittance Address | String | Explicit remittance or payment address for the customer | | 123 Remit St New York, NY, 10001
Remittance Address Recipient | String | Name associated with the Remittance Address | | Contoso Billing
Service Start Date | Date | First date for the service period (for example, a utility bill service period) | YYYY-MM-DD | 2019-10-14
Service End Date | Date | End date for the service period (for example, a utility bill service period) | YYYY-MM-DD | 2019-11-14
Previous Unpaid Balance | Number | Explicit previously unpaid balance | Integer | 500.0
Previous Unpaid Balance Currency Code | String | The currency code associated with the extracted previous unpaid balance | | USD
Previous Unpaid Balance Currency Symbol | String | The currency symbol associated with the extracted previous unpaid balance | | $
Payment Details IBAN | String | Holds the IBAN Payment Option details | |
Payment Details SWIFT | String | Holds the SWIFT Payment Option details | |
Total Discount | Number | The total discount applied to an invoice | Integer |
Total Discount Currency Code | String | The currency code associated with the extracted total discount amount | | USD
Total Discount Balance Currency Symbol | String | The currency symbol associated with the extracted total discount amount | | $

Line items (see example invoice on the right):

Name | Type | Description | Example Text | Example Processed Value
Line Item Amount | Number | The amount of the line item | $60.00; $30.00; $10.00 | 60.0; 30.0; 10.0
Line Item Currency Code | String | The currency code associated with the extracted line item amount | | USD; USD; USD
Line Item Currency Symbol | String | The currency symbol associated with the extracted line item amount | | $; $; $
Line Item Description | String | The text description for the invoice line item | Consulting Services; Document Fee; Printing Fee | Consulting Services; Document Fee; Printing Fee
Line Item Quantity | Number | The quantity for this invoice line item | 2; 3; 10 | 2; 3; 10
Line Item Unit | String | The unit of the line item, e.g. kg, lb, etc. | hours; pages | hours; pages
Line Item Unit Price | Number | The net or gross price (depending on the gross invoice setting of the invoice) of one unit of this item | $30.00; $10.00; $1.00 | 30.0; 10.0; 1.0
Line Item Unit Price Currency Code | String | The currency code associated with the extracted line item unit price | | USD; USD; USD
Line Item Unit Price Currency Symbol | String | The currency symbol associated with the extracted line item unit price | | $; $; $
Line Item Product Code | String | Product code, product number, or SKU associated with the specific line item | A123; B456; C789 | A123; B456; C789
Line Item Date | Date | Date corresponding to each line item. Often it’s a date the line item was shipped | 3/4/2021; 3/5/2021; 3/6/2021 | 2021-04-03; 2021-05-03; 2021-06-03
Line Item Tax | Number | Tax associated with each line item. Possible values include tax amount and tax Y/N | |
Line Item Tax Rate | Number | Tax Rate associated with each line item. | 10%; 5%; 20% | 10%; 5%; 20%

The Receipt model can return the following form field values:

Thermal receipts (General, Meal, Credit Card, Gas, Parking):

Field | Type | Description | Example Value | Example Processed Value
Merchant Name | string | Name of the merchant issuing the receipt | Contoso | Contoso
Merchant Phone Number | phoneNumber | Listed phone number of merchant | 987-654-3210 | 987-654-3210
Merchant Address | address | Listed address of merchant | 123 Main St. Redmond WA 98052 | 123 Main St. Redmond WA 98052
Total | number | Full transaction total of receipt | $14.34 | 14.34
Transaction Date | date | Date the receipt was issued | June 06, 2019 | 2019-06-06
Transaction Time | time | Time the receipt was issued | 4:49 PM | 16:49:00
Subtotal | number | Subtotal of receipt, often before taxes are applied | $12.34 | 12.34
Total Tax | number | Tax on receipt, often sales tax or equivalent | $2.00 | 2.0
Tip | number | Tip included by buyer | $1.00 | 1.0
Line Item Total Price | number | Total price of line item | 7.20 €; 7.80 €; 26.50 €; 23.90 € | 7.2; 7.8; 26.5; 23.9
Line Item Description | string | Item description | Surface Pro 6; Wireless Mouse Model 2 | Surface Pro 6; Wireless Mouse Model 2
Line Item Quantity | number | Quantity of each item | 1; 2; 1; 1 | 1; 2; 1; 1
Line Item Price | number | Individual price of each item unit | $1.00; $0.56; $3.99 | 1.0; 0.56; 3.99

Hotel receipts:

Field | Type | Description | Example Value | Example Processed Value
Merchant Name | string | Name of the merchant issuing the receipt | Contoso | Contoso
Merchant Phone Number | phoneNumber | Listed phone number of merchant | 987-654-3210 | 987-654-3210
Merchant Address | address | Listed address of merchant | 123 Main St. Redmond WA 98052 | 123 Main St. Redmond WA 98052
Total | number | Full transaction total of receipt | $14.34 | 14.34
Arrival Date | date | Date of arrival | 27Mar21 | 2021-03-27
Departure Date | date | Date of departure | 28Mar21 | 2021-03-28
Currency | string | Currency unit of receipt amounts (ISO 4217), or ‘MIXED’ if multiple values are found | USD; EUR; MIXED |
Merchant Aliases | string | Alternative name of merchant | Contoso (R) | Contoso
Line Item Total Price | number | Total price of line item | 7.20 €; 7.80 €; 26.50 €; 23.90 € | 7.2; 7.8; 26.5; 23.9
Line Item Date | date | Item date | 27Mar21 | 2021-03-27
Line Item Description | string | Item description | Room Charge; BBQ Hamburger; Salted Almonds | Room Charge; BBQ Hamburger; Salted Almonds
Line Item Category | string | Item category | Room; Room Service; Mini Bar | Room; Room Service; Mini Bar

The form field values for the Other Form model vary depending on the form. Please refer to the setup instructions for more details.

The ID Document model can return the following form field values:

Name | Type | Description | Example Processed Value
Document Type | string | The type of ID document. | driverLicense; passport; nationalIdentityCard; residencePermit; usSocialSecurityCard

National Identity Card:

Field | Type | Description | Example Value | Example Processed Value
Country Region | countryRegion | Country or region code | USA | USA
Region | string | State or province | Washington | Washington
Document Number | string | National identity card number | WDLABCD456DG | WDLABCD456DG
Document Discriminator | string | National identity card document discriminator | 12645646464554646456464544 | 12645646464554646456464544
First Name | string | Given name and middle initial, if applicable | LIAM R. | LIAM R.
Last Name | string | Surname | TALBOT | TALBOT
Address | address | Address | 123 STREET ADDRESS YOUR CITY WA 99999-1234 | 123 STREET ADDRESS YOUR CITY WA 99999-1234
Date of Birth | date | Date of birth | 01/06/1958 | 01/06/1958
Date of Expiration | date | Date of expiration | 08/12/2020 | 2020-12-08
Date of Issue | date | Date of issue | 08/12/2012 | 2012-12-08
Eye Color | string | Eye color | BLU | BLU
Hair Color | string | Hair color | BRO | BRO
Height | string | Height | 5’11” | 5’11”
Weight | string | Weight | 185LB | 185LB
Sex | string | Sex | M | M

Passport:

Field Type Description Example Value Example Processed Value
Document Number string Passport number WDLABCD456DG WDLABCD456DG
First Name string Given name and middle initial, if applicable JENNIFER JENNIFER
Middle Name string Name between given name and surname REYES REYES
Last Name string Surname BROOKS BROOKS
Aliases string Also known as MAY LIN MAY LIN
Date of Birth date Date of birth 1980-01-01 1980-01-01
Date of Expiration date Date of expiration 2019-05-05 2019-05-05
Date of Issue date Date of issue 2014-05-06 2014-05-06
Sex string Sex F F
Country Region countryRegion Issuing country or organization USA USA
Nationality countryRegion Nationality USA USA
Place of Birth string Place of birth MASSACHUSETTS, U.S.A. MASSACHUSETTS, U.S.A.
Place of Issue string Place of issue LA PAZ LA PAZ
Issueing Authority string Issuing authority United States Department of State United States Department of State
Personal Number string Personal ID Number A234567893 A234567893
Machine Readable Zone string The complete value of the machine-readable zone at the bottom of a passport. It holds all the passport’s ID information.

Example Value:
P<USABROOKS<<JENNIFER
<<<<<<<<<<<<<<<<<<<<<<
<3400200135USA8001014
F1905054710000307<715816

Example Processed Value:
P<USABROOKS<<JENNIFER
<<<<<<<<<<<<<<<<<<<<<<
<3400200135USA8001014
F1905054710000307<715816
Machine Readable Zone Country Region string Country region derived from the Machine Readable Zone USA USA
Machine Readable Zone Date of Birth date Date of birth derived from the Machine Readable Zone 800101 1980-01-01
Machine Readable Zone Date of Expiration date Date of expiration derived from the Machine Readable Zone 190505 2019-05-05
Machine Readable Zone Document Number string Passport number derived from the Machine Readable Zone 340020013 340020013
Machine Readable Zone First Name string First name derived from the Machine Readable Zone JENNIFER JENNIFER
Machine Readable Zone Last Name string Surname derived from the Machine Readable Zone BROOKS BROOKS
Machine Readable Zone Nationality countryRegion Nationality derived from the Machine Readable Zone USA USA
Machine Readable Zone Sex string Sex derived from the Machine Readable Zone F F
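
The Machine Readable Zone stores dates as six digits (YYMMDD), so a century has to be chosen when converting them to a full date. The sketch below shows one possible rule: a date of birth that would fall in the future is moved back 100 years. This rule and the function name are assumptions for illustration; the rule actually applied by Azure Form Recognizer is not described in this document.

```python
from datetime import date
from typing import Optional


def parse_mrz_date(yymmdd: str, *, is_birth_date: bool,
                   today: Optional[date] = None) -> str:
    """Convert an MRZ date (YYMMDD) to ISO 8601 (YYYY-MM-DD).

    Assumed century rule: start in the 2000s; a birth date that
    lies in the future is shifted to the 1900s.
    """
    if len(yymmdd) != 6 or not yymmdd.isdigit():
        raise ValueError(f"expected six digits, got {yymmdd!r}")
    today = today or date.today()
    yy, mm, dd = int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:])
    year = 2000 + yy
    if is_birth_date and date(year, mm, dd) > today:
        year -= 100
    return date(year, mm, dd).isoformat()


print(parse_mrz_date("800101", is_birth_date=True, today=date(2024, 1, 1)))  # 1980-01-01
print(parse_mrz_date("190505", is_birth_date=False))                        # 2019-05-05
```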

Residence Permit:

Field Type Description Example Value Example Processed Value
Country Region countryRegion Country or region code USA USA
Document Number string Residence permit number WDLABCD456DG WDLABCD456DG
First Name string Given name and middle initial, if applicable LIAM R. LIAM R.
Last Name string Surname TALBOT TALBOT
Date of Birth date Date of birth 01/06/1958 1958-06-01
Date of Expiration date Date of expiration 08/12/2020 2020-12-08
Date of Issue date Date of issue 08/12/2012 2012-12-08
Sex string Sex M M
Place of Birth string Place of birth Germany Germany
Category string Permit category DV2 DV2
Address address Address 123 STREET ADDRESS YOUR CITY WA 99999-1234 123 STREET ADDRESS YOUR CITY WA 99999-1234

US Social Security Card:

Field Type Description Example Value Example Processed Value
Document Number string Social security card number WDLABCD456DG WDLABCD456DG
First Name string Given name and middle initial, if applicable LIAM R. LIAM R.
Last Name string Surname TALBOT TALBOT
Date of Issue date Date of issue 08/12/2012 2012-12-08

01 – Test button / Auto Test Mode: press the Test button to show the extracted values of your invoice in the Result window. The result also shows additional information: whether OCR was applied, the confidence level of each field (in the “Confidence” column) and, above the result table, the confidence level of the Azure Form Recognizer’s “Document Content” field value.

Auto test: press the drop-down arrow next to the Test button to enable Auto Testing. Each document is then tested automatically as you browse through the documents using the blue document navigation buttons.

02 – OCR (Yes/No): there are many types of PDFs. The most common PDF types used with MetaServer are text-based PDFs and image-based PDFs.

Electronic / text-based PDFs are generated by computer programs such as MS Word or invoice / report creation software. They already contain computer text represented by fonts, so this text can be extracted directly without any OCR processing.

Scanned / Image-based PDFs contain an image of each of the pages of the document and require OCR (Optical Character Recognition) to convert the images to computer text.

The Azure Form Recognizer automatically switches between electronic text extraction for text-based PDFs and OCR extraction for scanned images.

Because electronic text does not need to be recognized, your Microsoft Azure Form Recognizer resource returns it with 100% confidence.

If OCR is applied, the OCR value will indicate Yes.

If the PDF is 100% electronic, then the OCR value will indicate No.

If the PDF is partially electronic but also contains some image information (like logos), then the OCR value will indicate Mixed.
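
The three OCR values above can be summarized in a small decision function. This is only a sketch of the logic as described here: the function name and the per-page input (whether a page has a text layer and whether it contains images) are hypothetical, and the way MetaServer actually inspects a PDF may differ.

```python
from typing import Iterable, Tuple


def ocr_value(pages: Iterable[Tuple[bool, bool]]) -> str:
    """Return 'Yes', 'No' or 'Mixed' for a document.

    Each page is a (has_text_layer, has_images) pair (hypothetical input).
    """
    pages = list(pages)
    if not pages:
        raise ValueError("document has no pages")
    has_text = any(text for text, _ in pages)
    has_images = any(image for _, image in pages)
    if not has_text:
        return "Yes"    # image only: OCR is applied
    if not has_images:
        return "No"     # 100% electronic text
    return "Mixed"      # electronic text plus image information


print(ocr_value([(True, False)]))  # No
print(ocr_value([(True, True)]))   # Mixed
```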

The example below shows a 100% electronic document, as indicated by the OCR value “No”.

03 – Confidence: this signifies the confidence level of the Azure Form Recognizer’s “Document Content” field value, which is currently mapped to the “Full Text” value in our example.

NOTE: The confidence level of each individual field can be found in the “Confidence” column of your test results.
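
Per-field confidence levels are typically used to decide which values need a manual check. The following sketch lists the fields that fall below a threshold. The data structure, the 0 to 1 confidence scale, the sample scores and the 0.8 threshold are all assumptions for illustration, not values returned by MetaServer.

```python
from typing import Dict, List, Tuple


def fields_to_review(results: Dict[str, Tuple[str, float]],
                     threshold: float = 0.8) -> List[str]:
    """Return the names of fields whose confidence is below the threshold."""
    return [name for name, (_, confidence) in results.items()
            if confidence < threshold]


# Hypothetical test results: field name -> (value, confidence)
sample = {
    "Merchant Phone Number": ("987-654-3210", 0.99),
    "Total": ("14.34", 0.62),
    "Currency": ("USD", 0.95),
}

print(fields_to_review(sample))  # ['Total']
```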

TIP: you can copy the current settings and paste them into another setup window of the same type. To do this, press the Settings button in the bottom left of the Setup window and select Copy. Then open another setup window of the same type and select Paste.

You can also run the Azure Form Recognizer engine on-premises in containers, using the Docker engine.

Running the engine on-premises can help you meet security and data governance requirements.

You can find a detailed guide discussing the prerequisites and how to set up your AFR container here.
