
120-150 Extract – Extract Text (Azure AI Document Intelligence)

With MetaServer’s Extract Text (Azure AI Document Intelligence) rule (formerly known as “Azure Form Recognizer”), you can automatically extract header data, key values and line items from forms such as invoices, and store the extracted data in fields. This is done through Azure AI Document Intelligence’s prebuilt models, which do not require any training or configuration.

It reads machine printed text, cursive handwriting, barcodes and CMC7 text (except for the Other Form model). It also handles low-quality images well, such as smartphone photos and smudged or damaged documents.

You can specify the pages from which you want to extract information. After the values have been extracted, you can apply other Extract rules to clean up, format or adjust the values before sending them to the next action.

You also have the option to build your own custom models using the Azure Document Intelligence Studio, like:

– A Custom Classification model, to automatically classify and separate documents in a single step.

– A Custom Extraction model, to automatically extract fields from any type of document that is not available as a prebuilt model (e.g. shipping documents, litigation documents, HR documents, contracts, etc.) in a single step.

You can refer to the Azure AI Document Intelligence’s documentation for a more detailed list of the supported languages for each model.

Some examples (results are below each sample):

US invoice (EN)

German invoice

Spanish invoice

French invoice

Belgian invoice (NL)

UK invoice (EN)

US invoice (EN)

French invoice

US invoice (EN)

US invoice (EN)

NOTE: To validate line items as a table with different columns, you need to merge and format all line items in 1 CSV field using a Format CSV rule. The CSV field would then contain all line items and / or header data in a CSV format.

For example:

"5138489C","TRAXION MENACE CREW","9.00","EACH","10","","90.00"
"5138489D","TRAXION MENACE CREW ","9.00"," EACH ","2","","18.00"
"5144035","STADIUM II BACKPACK","30.00"," EACH ","1","3","90.00"
"5142723","STRIKER II TEAM BACKPACK","22.50"," EACH ","1","","22.50"
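The layout above, with every value wrapped in double quotes and empty cells kept as "", can be reproduced outside MetaServer for testing purposes. The following is a minimal Python sketch, assuming each line item is available as a list of column values. The function name line_items_to_csv is hypothetical and not part of MetaServer.

```python
import csv
import io


def line_items_to_csv(rows):
    """Serialize line items so that every value is double-quoted, one item per line."""
    buffer = io.StringIO()
    # QUOTE_ALL quotes every cell, including empty ones, like the example above.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Two of the line items from the example (values copied from the text above).
items = [
    ["5138489C", "TRAXION MENACE CREW", "9.00", "EACH", "10", "", "90.00"],
    ["5144035", "STADIUM II BACKPACK", "30.00", "EACH", "1", "3", "90.00"],
]
csv_text = line_items_to_csv(items)
print(csv_text)
```

Quoting every cell keeps empty columns in place, so each column keeps its position when the field is validated as a table.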

Using a Validate CSV rule, it would look like this in validation:

NOTE: You need to sign up for the Azure AI Document Intelligence service. Paid plans for the prebuilt models are available for $10 per 1,000 pages (S0 plan for prebuilt document types).

If you only use the Read model, paid plans are available for $1.50 per 1,000 pages.

There is also a free 1-year plan (F0) that lets you test the engine with prebuilt models for up to 500 pages per month.

IMPORTANT: The free plan is limited to 1 call per 2 seconds and reads only 2 pages of each invoice. The paid plan (S0 plan) allows 15 calls per second, which is 30 times faster than the free plan, and reads all the pages of the invoice.

If you want to use your own Custom Classification model, the plan starts from $3 per 1,000 pages. Building the model costs $3 per hour. With well-prepared samples, you will need to train your model a few times at most, and each build takes minutes, not hours.
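To get a feeling for these numbers, the arithmetic can be sketched in a few lines of Python. The prices and rate limits are the ones quoted above and may change, so always check the pricing page. The monthly volume, the number of calls and the function names are hypothetical.

```python
def estimate_cost(pages, price_per_1000_pages):
    """Return the cost in USD for a number of pages at a per-1,000-page price."""
    return pages / 1000 * price_per_1000_pages


def minimum_seconds(calls, calls_per_second):
    """Return the shortest time in seconds needed to issue the calls at a rate limit."""
    return calls / calls_per_second


pages_per_month = 25_000  # hypothetical volume

prebuilt_cost = estimate_cost(pages_per_month, 10.00)  # S0 plan, prebuilt models
read_cost = estimate_cost(pages_per_month, 1.50)       # Read model only

free_plan_seconds = minimum_seconds(600, 0.5)  # free plan: 1 call per 2 seconds
paid_plan_seconds = minimum_seconds(600, 15)   # S0 plan: 15 calls per second

print(prebuilt_cost, read_cost)              # 250.0 37.5
print(free_plan_seconds, paid_plan_seconds)  # 1200.0 40.0
```

The last line shows the factor of 30 between the two plans: 600 calls need at least 20 minutes on the free plan and 40 seconds on the S0 plan.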

More detailed information about the pricing can be found here:
https://azure.microsoft.com/en-us/pricing/details/ai-document-intelligence/

For more information on how to apply for a key, please refer to the instructions below.

NOTE: For more technical information about how Microsoft’s Azure AI Document Intelligence engine works (API, OCR, etc.) and how it handles data privacy and security, please refer to the Microsoft Azure AI Document Intelligence documentation.

Extract Text rules are defined in a MetaServer Extract or Separate Document / Process Page action.

To add this rule, press the Add button and select Extract -> Text (Azure AI Document Intelligence).


If you haven’t done so already, you need to first add your “Azure AI Document Intelligence” resource in the Azure Resources setup, which can be found in the Admin Client’s Server tab.

You can find more detailed information regarding pricing + instructions on how to apply for an “Azure AI Document Intelligence” resource here:
https://www.capturebites.com/metaserver/help/server/#10-01

First, add a description to your rule. Then, press the Setup button in the upper toolbar to set up the connection to your Azure AI Document Intelligence resource.

01 – Resource and Endpoint: select your Azure AI Document Intelligence resource using the drop-down arrow. If you don’t see your resource listed, please make sure your resource was added in the Azure Resources setup, which can be found in the Admin Client’s Server tab.

Your endpoint will be automatically populated when you have selected your resource.

In the screenshot above, the resource is called “metaserver-ai-di”.

The “Invoice” model automates processing of invoices to extract header data like vendor name, vendor tax id, invoice number, invoice date, due date, payment terms, total amount, etc.

The model also extracts line items like article codes, unit price, quantity, etc.

This model reads machine printed text, cursive handwriting, barcodes and CMC7 text.

You can find more details regarding pricing in the pricing explanation.

More detailed information about this model can be found here:
https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-invoice

The “Other Form” model automatically detects key value pairs for fields, tables and check boxes on any form type. You set one sample of your forms as a Master Form to detect the data elements on the form which can then be mapped with MetaServer fields. This reduces the time to configure the extraction of a form considerably.

The “Other Form” model reads machine printed text, cursive handwriting and barcodes. Unlike the other models, it currently does not read CMC7 text.

You can find more details regarding pricing in the pricing explanation.

Setup

Step 1) After you have selected the “Other Form” prebuilt model and finished setting your extract settings, press OK to go back to the mapping setup screen.

You now need to select a sample of your form as a Master Form. After you have added a Master Form, your sample form will be analyzed and all data elements are detected and exposed as field labels and table headers.

Step 2) You then map the detected field labels and table headers that you are interested in with MetaServer fields.

NOTE: Table headers are prefixed with “Line Item” to distinguish them from regular, single value fields.

With your new Master Form, you can now test other forms of the same type to check if all the data is detected.

Step 3) If you come across any variations of the same form where different labels are used for the same field, you need to define alternate labels for this field or column.

For example, on the standard sample forms, a column label “NAME” was detected. But on another version of the form, the label is called “NAME (Please Print)”.  You can define “NAME (Please Print)” as an alternate label for NAME.

First, press the “Form Fields” button.

This panel shows all your test form’s labels and values on the left side. The right side shows the Master Form’s labels and values. Select the alternate column label, in this case “Line Item NAME (Please Print)”, on the left side and copy it.

Step 4) To add the alternate field name, press the setup button (…) next to the Form Field name, in this case “Line Item NAME”. Please note that the alternate labels are evaluated in the order in which they appear in the list.

Paste the alternate field name in the list and press OK.

When you test the form now, the alternate label “NAME (Please Print)” is also used to detect the “NAME” column and it is successfully extracted.
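The effect of alternate labels can be pictured with a small Python sketch. It assumes exact label matching and that the Master Form label is tried before the alternates. Both are assumptions for illustration, and find_label is a hypothetical helper, not a MetaServer function.

```python
def find_label(detected_labels, primary_label, alternate_labels=()):
    """Return the first candidate label that was detected on the form, or None.

    The primary label is tried first, then the alternates in list order.
    """
    for candidate in [primary_label, *alternate_labels]:
        if candidate in detected_labels:
            return candidate
    return None


# Labels detected on a variant of the form (hypothetical data).
detected = {"Line Item NAME (Please Print)", "Line Item DATE"}

match = find_label(
    detected,
    primary_label="Line Item NAME",
    alternate_labels=["Line Item NAME (Please Print)"],
)
print(match)  # Line Item NAME (Please Print)
```

Because the candidates are checked in list order, the most common label variant is best placed first.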

The “Receipt” model automatically extracts merchant name, dates, line items, quantities, and totals from printed and handwritten receipts. Version v3.0 also supports single-page hotel receipt processing. The “Receipt” prebuilt model is more limited than the “Invoice” model. In our tests with European tax receipts, we see better results with the Invoice model.

This model reads machine printed text, cursive handwriting, barcodes and CMC7 text.

You can find more details regarding pricing in the pricing explanation.

The preview model is documented here:

Receipt model:
https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-receipt

The “ID Document” model automatically extracts information from ID cards, passports, driver licenses, residence permits and US social security cards. It can also automatically classify the ID document, which is shown in the “Document Type” field.

You can find more details regarding pricing in the pricing explanation.

More detailed information about this model can be found here:
https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-id-document

The “Check” model automatically extracts information from (US) Checks.

It can also be used with non-US checks but, with non-US checks, the MICR line is not detected and needs to be found in the full text using conventional MetaServer rules.

You can find more details regarding pricing in the pricing explanation.

More detailed information about this model can be found here:
https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/prebuilt/bank-check

The “Read” model automatically extracts the full text. This model limits itself to only extracting the full text, but it often returns better results than the Extract Text (Azure AI Vision) rule’s engine.

Since it does not include any special extraction logic like the prebuilt models, it is also a cheaper pricing tier. You can find more details regarding pricing in the pricing explanation.

This model reads machine printed text, cursive handwriting, barcodes and CMC7 text.

The preview model is documented here:

Read model:
https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-receipt

With a “Custom Classification” model, you can detect your document type in a single step. You can then also use the result to automatically separate document sets based on the document type.

You build your classification model using the Azure Document Intelligence Studio (DI Studio).

You can find more details regarding pricing in the pricing explanation.

Creating a new Custom Classification Model

Step 1) In the DI Studio, create a new “Custom classification model”.

Step 2) Create a new project and follow the wizard’s steps.

Step 3) Upload a minimum of 5 training samples and label them. This will assign each of them to a document class (= document type).

Step 4) Once you’re finished, you can go ahead and “Train” your model. Depending on the complexity of your model, training should finish within a few minutes.

Step 5) As soon as your model’s finished, you can test your classification model on other samples in the “Test” tab.

In case there are any mistakes, you can go back to your “Label data” tab and adjust your labels or add more samples where necessary. You then retrain your model and test again.

Step 6) When you are satisfied with your final classification model, you can select it from the “Model” list in your Extract Text (Azure AI Document Intelligence) rule.

Any future changes that happen to your custom classification model will be automatically applied.

Step 7) Map the Document Type field with the corresponding MetaServer field.

You can also test other samples within the setup to check if the document type was detected correctly.

If you have documents for which there is no prebuilt model (e.g. shipping documents, litigation documents, HR documents, contracts, etc.), you can now create your own custom extraction model.

You build your extraction model using the Azure Document Intelligence Studio (DI Studio).

You can find more details regarding pricing in the pricing explanation.

Creating a new Custom Extraction Model

Step 1) In the DI Studio, create a new “Custom extraction model”.

Step 2) Create a new project and follow the wizard’s steps.

Step 3) Upload different training samples of your documents and add your fields that you need to extract from your documents. After that, just select and click to link each of your fields with the correct data on the samples.

NOTE: We recommend training your model with at least 5 different samples for a more reliable result, but you can already test using 1 sample.

Step 4) Once you’re finished, you can go ahead and “Train” your model. Depending on the complexity of your model, training should finish within a few minutes.

Step 5) As soon as your model’s finished, you can test your extraction model on other samples in the “Test” tab.

In case there are any mistakes, you can go back to your “Label data” tab and adjust your fields or add more samples where necessary. You then retrain your model and test again.

Step 6) When you are satisfied with your final extraction model, you can select it from the “Model” list in your Extract Text (Azure AI Document Intelligence) rule.

Any future changes that happen to your custom extraction model will be automatically applied.

Step 7) Map your custom extraction model fields with the corresponding MetaServer fields.

You can also test other samples within the setup to check if each field is extracted correctly.

08 – Apply: choose when to apply the rule. The default option is “Always”, which means that the rule is always applied. Press the drop-down arrow to see all other available conditions.

A good example of conditional extraction: you first try to extract a value using the Extract Text rule (= standard OCR engine) or the Extract Text (Azure Computer Vision) rule. Only if that does not return a valid result do you let the Extract Text (Azure AI Document Intelligence) rule try to extract the value.

This speeds up the extraction process and only uses calls to your Microsoft Azure AI Document Intelligence resource when your first search didn’t return a good result.
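The idea behind this conditional setup can be sketched in Python as a simple fallback. The two extractor functions below are stand-ins with hypothetical names and data. In MetaServer itself, this behaviour is configured with rules and the “If field value is blank” condition, not written as code.

```python
def extract_with_fallback(page, primary_extract, fallback_extract):
    """Run the cheap extractor first; call the fallback only if the result is blank."""
    value = primary_extract(page)
    if value and value.strip():
        return value, "primary"
    return fallback_extract(page), "fallback"


# Stand-ins for the two engines (hypothetical; real rules are configured in MetaServer).
def standard_ocr(page):
    return page.get("ocr_text", "")


def document_intelligence(page):
    return page.get("azure_text", "")


clean_page = {"ocr_text": "INV-100", "azure_text": "INV-100"}
poor_page = {"ocr_text": "   ", "azure_text": "INV-200"}

clean_result = extract_with_fallback(clean_page, standard_ocr, document_intelligence)
poor_result = extract_with_fallback(poor_page, standard_ocr, document_intelligence)
print(clean_result)  # ('INV-100', 'primary')
print(poor_result)   # ('INV-200', 'fallback')
```

Only the second page triggers a call to the paid engine, which is where the savings in time and cost come from.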

After selecting your condition, for example, “If field value is blank”, press the “…” button next to the drop-down arrow to open that condition’s setup window.

1) If value of field: press the drop-down arrow to select the field value that needs to be evaluated.

2) is equal to / is not equal to / is greater than /…: enter the other value your field value needs to be compared with. You can also press the drop-down button to select different system and index values to compose your value.

09 – Page: set the page number(s) where the information is located. The default is all pages (= blank).

For example:
– Enter 1 for the 1st page
– Enter -1 for the last page
– Enter 1-3 to extract from page 1 to page 3.
– Leave this empty in case you want to extract all pages (same as 1--1, i.e. from page 1 to the last page)
– Etc.

You can also press the drop-down arrow to use a field value containing a page number value to switch the page number(s) dynamically.

Example use-case: A form contains 10 pages. You are interested in extracting the information on the page about the “Bank Details” with a big heading “SECTION 5: BANK DETAILS”. The form’s pages are not always in the correct sequence, meaning that the bank details can be on any of the 10 pages. You first use a Find Word with Mask / Words rule to find the words “SECTION 5: BANK DETAILS” and put the found words in a field called, for example, “KEYWORD”. If the keyword is found on page 7, then the variable { Page Number, KEYWORD } would return the value “7”.

You can then use the field “KEYWORD” as your Page in your Extract Text (Azure AI Document Intelligence) rule.

NOTE: If you use a variable as your Page value, you can specify a Test page, solely for testing purposes.

NOTE: If a document does not contain a specific page, it is ignored. For example, extracting page “2,3” on a 2-page document will only extract page 2.

The free F0 plan, in combination with the Invoice model, processes a maximum of 2 pages per invoice. With the page range, you can determine which pages those are. For example, 1,-1 sends the first and last page of each invoice.
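The page notation described above can be illustrated with a short Python sketch that resolves a page value against a document’s page count. It is not MetaServer’s own parser. The function name resolve_pages and the exact token grammar (comma-separated numbers or ranges, negative numbers counting from the end) are assumptions based on the examples in this section.

```python
import re

# A token is a single number ("1", "-1") or a range ("1-3", "1--1").
_TOKEN = re.compile(r"^(-?\d+)(?:-(-?\d+))?$")


def resolve_pages(spec, page_count):
    """Translate a page specification into a list of 1-based page numbers.

    Negative numbers count from the end (-1 is the last page). Pages that do
    not exist in the document are ignored.
    """
    if not spec.strip():
        return list(range(1, page_count + 1))  # blank means all pages

    def absolute(number):
        # Convert a negative index such as -1 into a 1-based page number.
        return page_count + 1 + number if number < 0 else number

    pages = []
    for token in spec.split(","):
        match = _TOKEN.match(token.strip())
        if not match:
            raise ValueError(f"Unsupported page token: {token!r}")
        start = absolute(int(match.group(1)))
        end = absolute(int(match.group(2))) if match.group(2) is not None else start
        for page in range(start, end + 1):
            if 1 <= page <= page_count and page not in pages:
                pages.append(page)
    return pages


print(resolve_pages("1,-1", 5))  # [1, 5]
print(resolve_pages("2,3", 2))   # [2]
print(resolve_pages("", 3))      # [1, 2, 3]
```

The second call mirrors the note above: page 3 does not exist in a 2-page document, so only page 2 is returned.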

01 – Deskew & Rotate: if your documents are skewed or rotated incorrectly, you can enable the Deskew and/or Rotate option to optimize Text extraction. It will also result in a corrected version of the page(s).

02 – Color Dropout: if your documents contain a lot of colored tables, lines or stamps running through the values you want to extract, you can enable the color dropout feature, which allows you to select up to 3 dropout colors.

Press the setup button to specify which colors to drop out.

Each selected color has its own tolerance. With the Test button, you can see the effect of dropping out the selected colors in the preview window on the right.

To reset all dropout colors to white (off), you can use the “Reset All Colors” button.

NOTE: The filtered image is only used temporarily to improve text extraction. The processed image keeps all the original colors.

03 – Confidence: characters with a confidence level lower than the set confidence level, will be ignored and not returned in the result. If set to 0, all characters are accepted.

This setting is retained for legacy reasons. We recommend using the newer “Check if confidence is lower than” option in the Validate rules instead.

To help you in defining the correct confidence level, you can check the confidence level of each field in the “Confidence” column of your test results.
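What the Confidence setting does to a result can be pictured with the following Python sketch. The per-character data structure and the 0 to 100 scale are assumptions made for illustration only, and filter_by_confidence is a hypothetical name.

```python
def filter_by_confidence(characters, minimum_confidence):
    """Keep only characters whose confidence is at least the minimum.

    `characters` is a list of (character, confidence) pairs. A minimum of 0
    accepts every character.
    """
    return "".join(
        char for char, confidence in characters if confidence >= minimum_confidence
    )


# Hypothetical recognition result with one doubtful character ("O").
recognized = [("I", 99), ("N", 98), ("V", 97), ("-", 95), ("1", 96), ("O", 41), ("0", 93)]

accept_all = filter_by_confidence(recognized, 0)
filtered = filter_by_confidence(recognized, 60)
print(accept_all)  # INV-1O0
print(filtered)    # INV-10
```

Note that dropping characters silently changes the value, which is one reason a validation check on confidence can be the safer choice.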

NOTE: The “Confidence” value shown beneath “OCR” refers to the confidence level of the Azure AI Document Intelligence’s “Document Content” field value, which is currently mapped to the “Full Text” value in our example.

04 – Tabs: by default, lines are segmented in multiple word groups that are separated by tabs. If you want no tabs at all, press the drop-down button to select the “Remove” option and all the words will be grouped as 1 single word group for each line.

05 – Convert page(s) to searchable PDF: enable this option if you want to save the extracted text as a searchable text layer in the processed PDF. It will only do this for the page(s) set in the Page(s) field. Leaving the Page(s) field empty will convert all pages of the documents.

As a result, you will be able to search handwritten, Arabic, Cyrillic or low-quality text in your exported PDF:

This high-quality text layer can also be used during Validation with the Select text tool. To do this, please make sure you also enable the “Use searchable text layer if present” option in the Select text tool setup:

06 – Searchable barcodes: enable this option when you also want to be able to search for barcode values when you convert page(s) to searchable PDF.

01 – Field: The “Field” column shows your MetaServer fields. You can map them to the corresponding AI Document Intelligence field values using the drop-down arrow in the “Form Field” column (see below).

02 – Form Field: based on the selected Azure AI Document Intelligence Model (Invoice, Receipt, Other Form, Read), you can map your MetaServer fields to the model’s extracted field values.

NOTE: As you will see in the output, some field values, like dates and amounts, are automatically reformatted by the Azure AI Document Intelligence engine. This is to output consistent, standardized formats, regardless of the input.

Dates are formatted to YYYY-MM-DD format.

For example:

12/06/2023

becomes:

2023-06-12 (on a European invoice)

2023-12-06 (on a US invoice)
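The two outcomes come from reading the same text with a different field order. The Python sketch below only demonstrates that ambiguity. How the engine decides which order applies to a given invoice is not covered here, and the day_first flag is a hypothetical parameter.

```python
from datetime import datetime


def to_iso_date(text, day_first):
    """Convert a slash-separated date to YYYY-MM-DD using the given field order."""
    pattern = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, pattern).strftime("%Y-%m-%d")


european = to_iso_date("12/06/2023", day_first=True)
american = to_iso_date("12/06/2023", day_first=False)
print(european)  # 2023-06-12
print(american)  # 2023-12-06
```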

Amounts are formatted without thousand separators and use a period (.) as the decimal character. The number of digits after the decimal character follows the digits available in the source, except for the Total amounts of the header data, which are rounded to 2 digits.

For general amounts:

*.00 will become *.0

*.0000 will become *.0

*.20 will become *.2

*.1254 will remain *.1254

 

For Total amounts of header data:

*.00 will become *.0

*.0000 will become *.00

*.20 will become *.2

*.1254 will become *.13

 

The currency is also removed from all types of amounts.

For example:

£20,432.625

becomes:

20432.625
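If you need to reproduce the output shape of general amounts in a test script, the rules can be approximated as follows. This is a sketch, not the engine’s implementation. It assumes a comma as thousand separator and a period as decimal character in the input, and it does not cover the rounding of header Total amounts. The name normalize_amount is hypothetical.

```python
import re


def normalize_amount(text):
    """Strip currency symbols and thousand separators and trim trailing zeros.

    Assumes a comma as thousand separator and a period as decimal character.
    """
    digits = re.sub(r"[^\d.,-]", "", text).replace(",", "")
    # str() of a float keeps the significant decimals and at least one digit.
    return str(float(digits))


print(normalize_amount("$1,250.00"))  # 1250.0
print(normalize_amount("7.0000"))     # 7.0
print(normalize_amount("€ 7.20"))     # 7.2
print(normalize_amount("0.1254"))     # 0.1254
```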

 

Time values are formatted to a HH:mm:ss format.

For example:

07:45 PM

becomes:

19:45:00
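The same conversion can be reproduced in Python, for example to prepare expected values for a test set. This is a minimal sketch with a hypothetical function name. The AM/PM parsing relies on an English (or default C) locale.

```python
from datetime import datetime


def to_24_hour(text):
    """Convert a 12-hour time such as '07:45 PM' to HH:mm:ss."""
    return datetime.strptime(text, "%I:%M %p").strftime("%H:%M:%S")


evening = to_24_hour("07:45 PM")
morning = to_24_hour("07:45 AM")
print(evening)  # 19:45:00
print(morning)  # 07:45:00
```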

If the format needs to be different for your output, you can change it for each field by using the Extract action’s Format and Edit rules.

All models (= prebuilt and Read models) can return the following form field values:

 

Name Type Description Example Processed Value
Document Content string

The complete extracted text, including printed and handwritten text, and barcode values (if present), without any formatting. This can be useful if you want to extract any other values that the model was not able to find.

You would typically map it to your “Full Text” field which you can then use as a source in your Extract rules.

AB/7R
BV
1540030393597
1540030393597
INVOICE
CONTOSO LTD.
Contoso Headquarters→INVOICE: INV-100
123 456th St→INVOICE DATE: 11/15/2019
New York, NY, 10001→DUE DATE: 12/15/2019
CUSTOMER NAME: MICROSOFT CORPORATION
Signed off on 18/06/18
(…)
Document Content Printed string Only returns printed text from your document. If there is handwritten text present, it will filter this out of the result. 1540030393597
INVOICE
CONTOSO LTD.
Contoso Headquarters→INVOICE: INV-100
123 456th St→INVOICE DATE: 11/15/2019
New York, NY, 10001→DUE DATE: 12/15/2019
CUSTOMER NAME: MICROSOFT CORPORATION
(…)
Document Content Handwritten string Only returns handwritten text from your document. If there is printed text present, it will filter this out of the result. AB/7R
BV
Signed off on 18/06/18
Document Content Printed and Handwritten string

The complete extracted text, filtering out any barcodes (if present), without any formatting. This can be useful if you want to extract any other values that the model was not able to find.

You would typically map it to your “Full Text” field which you can then use as a source in your Extract rules.

AB/7R
BV
1540030393597
INVOICE
CONTOSO LTD.
Contoso Headquarters→INVOICE: INV-100
123 456th St→INVOICE DATE: 11/15/2019
New York, NY, 10001→DUE DATE: 12/15/2019
CUSTOMER NAME: MICROSOFT CORPORATION
Signed off on 18/06/18
(…)
Dominant Language string Returns the ISO code of the document’s dominant language. en
Barcodes (All / Type) string

Returns all barcode values on the document OR the values of the specified barcode type.

Currently, the engine supports the following barcode types:

– Codabar
– Code 39
– Code 93
– Datamatrix
– EAN13
– EAN8
– QR Code
– Code128
– Interleaved 2 of 5
– UPC-A
– UPC-E
– PDF 417

1540030393597

The Invoice model can return the following form field values:

Single Value Fields (see example invoice on the right):

Name Type Description Standardized Output Format Example Processed Value
Customer Name String Invoiced customer MICROSOFT CORPORATION
Customer Id String Customer reference ID CID-12345
Purchase Order String Purchase order reference number PO-3333
Invoice Id String ID for this specific invoice (often “Invoice Number”) INV-100
Invoice Date Date Date the invoice was issued YYYY-MM-DD 2019-11-15
Due Date Date Date payment for this invoice is due YYYY-MM-DD 2019-12-15
Vendor Name String Vendor name CONTOSO LTD.
Vendor Tax Id String The taxpayer number associated with the vendor
Vendor Address String Vendor mailing address 123 456th St New York, NY, 10001
Vendor Address Recipient String Name associated with the Vendor Address Contoso Headquarters
Customer Address String Mailing address for the Customer 123 Other St, Redmond WA, 98052
Customer Tax Id String The taxpayer number associated with the customer
Customer Address Recipient String Name associated with the Customer Address Microsoft Corp
Billing Address String Explicit billing address for the customer 123 Bill St, Redmond WA, 98052
Billing Address Recipient String Name associated with the BillingAddress Microsoft Finance
Shipping Address String Explicit shipping address for the customer 123 Ship St, Redmond WA, 98052
Shipping Address Recipient String Name associated with the ShippingAddress Microsoft Delivery
Payment Term String The terms of payment for the invoice 30 NET
Subtotal Number Subtotal field identified on this invoice Integer 100.0
Subtotal Currency Code String The currency code associated with the extracted subtotal amount USD
Subtotal Currency Symbol String The currency symbol associated with the extracted subtotal amount $
Total Tax Number Total tax field identified on this invoice Integer 10.0
Total Tax Currency Code String The currency code associated with the extracted total tax amount USD
Total Tax Currency Symbol String The currency symbol associated with the extracted total tax amount $
Invoice Total Number (USD) Total new charges associated with this invoice Integer 110.0
Invoice Total Currency Code String The currency code associated with the extracted invoice total amount USD
Invoice Total Currency Symbol String The currency symbol associated with the extracted invoice total amount $
Amount Due Number (USD) Total Amount Due to the vendor Integer 610.0
Amount Due Currency Code String The currency code associated with the extracted amount due USD
Amount Due Currency Symbol String The currency symbol associated with the extracted amount due $
Service Address String Explicit service address or property address for the customer 123 Service St, Redmond WA, 98052
Service Address Recipient String Name associated with the Service Address Microsoft Services
Remittance Address String Explicit remittance or payment address for the customer 123 Remit St New York, NY, 10001
Remittance Address Recipient String Name associated with the Remittance Address Contoso Billing
Service Start Date Date First date for the service period (for example, a utility bill service period) YYYY-MM-DD 2019-10-14
Service End Date Date End date for the service period (for example, a utility bill service period) YYYY-MM-DD 2019-11-14
Previous Unpaid Balance Number Explicit previously unpaid balance Integer 500.0
Previous Unpaid Balance Currency Code String The currency code associated with the extracted previous unpaid balance USD
Previous Unpaid Balance Currency Symbol String The currency symbol associated with the extracted previous unpaid balance $
Payment Details IBAN String Holds the IBAN Payment Option details
Payment Details SWIFT String Holds the SWIFT Payment Option details
Total Discount Number The total discount applied to an invoice Integer
Total Discount Currency Code String The currency code associated with the extracted total discount amount USD
Total Discount Balance Currency Symbol String The currency symbol associated with the extracted total discount amount $

Line items (see example invoice on the right):

 

Name Type Description Example Text Example Processed Value
Line Item Amount Number The amount of the line item $60.00
$30.00
$10.00
60.0
30.0
10.0
Line Item Currency Code String The currency code associated with the extracted line item amount   USD
USD
USD
Line Item Currency Symbol String The currency symbol associated with the extracted line item amount   $
$
$
Line Item Description String The text description for the invoice line item Consulting Services
Document Fee
Printing Fee
Consulting Services
Document Fee
Printing Fee
Line Item Quantity Number The quantity for this invoice line item 2
3
10
2
3
10
Line Item Unit String The unit of the line item, e.g, kg, lb etc.

hours

pages

hours

pages

Line Item Unit Price Number The net or gross price (depending on the gross invoice setting of the invoice) of one unit of this item $30.00
$10.00
$1.00
30.0
10.0
1.0
Line Item Unit Price Currency Code String The currency code associated with the extracted line item unit price   USD
USD
USD
Line Item Unit Price Currency Symbol String The currency symbol associated with the extracted line item unit price   $
$
$
Line Item Product Code String Product code, product number, or SKU associated with the specific line item A123
B456
C789
A123
B456
C789
Line Item Date Date Date corresponding to each line item. Often it’s a date the line item was shipped 3/4/2021
3/5/2021
2/6/2021
2021-04-03
2021-05-03
2021-06-02
Line Item Tax Number Tax associated with each line item. Possible values include tax amount and tax Y/N    
Line Item Tax Rate Number Tax Rate associated with each line item. 10%
5%
20%
10%
5%
20%

The Receipt model can return the following form field values:

Thermal receipts (General, Meal, Credit Card, Gas, Parking):

Field Type Description Example Value Example Processed Value
Merchant Name string Name of the merchant issuing the receipt Contoso Contoso
Merchant Phone Number phoneNumber Listed phone number of merchant 987-654-3210 987-654-3210
Merchant Address address Listed address of merchant 123 Main St. Redmond WA 98052 123 Main St. Redmond WA 98052
Total number Full transaction total of receipt $14.34 14.34
Transaction Date date Date the receipt was issued June 06, 2019 2019-06-06
Transaction Time time Time the receipt was issued 4:49 PM 16:49:00
Subtotal number Subtotal of receipt, often before taxes are applied $12.34 12.34
Total Tax number Tax on receipt, often sales tax or equivalent $2.00 2.0
Tip number Tip included by buyer $1.00 1.0
Line Item Total Price number Total price of line item 7.20 €
7.80 €
26.50 €
23.90 €
7.2
7.8
26.5
23.9
Line Item Description string Item description Surface Pro 6
Wireless Mouse Model 2
Surface Pro 6
Wireless Mouse Model 2
Line Item Quantity number Quantity of each item 1
2
1
1
1
2
1
1
Line Item Price number Individual price of each item unit $1.00
$0.56
$3.99
1.0
0.56
3.99

Hotel receipts:

Field Type Description Example Value Example Processed Value
Merchant Name string Name of the merchant issuing the receipt Contoso Contoso
Merchant Phone Number phoneNumber Listed phone number of merchant 987-654-3210 987-654-3210
Merchant Address address Listed address of merchant 123 Main St. Redmond WA 98052 123 Main St. Redmond WA 98052
Total number Full transaction total of receipt $14.34 14.34
Arrival Date date Date of arrival 27Mar21 2021-03-27
Departure Date date Date of departure 28Mar21 2021-03-28
Currency string Currency unit of receipt amounts (ISO 4217), or ‘MIXED’ if multiple values are found USD
EUR
MIXED
Merchant Aliases string Alternative name of merchant Contoso (R) Contoso
Line Item Total Price number Total price of line item 7.20 €
7.80 €
26.50 €
23.90 €
7.2
7.8
26.5
23.9
Line Item Date date Item date 27Mar21 2021-03-27
Line Item Description string Item description Room Charge
BBQ Hamburger
Salted Almonds
Room Charge
BBQ Hamburger
Salted Almonds
Line Item Category string Item category Room
Room Service
Mini Bar
Room
Room Service
Mini Bar

The form field values for the Other Form model vary depending on the form. Please refer to the setup instructions for more details.

The ID Document model can return the following form field values:

Name Type Description Example Processed Value
Document Type string The type of ID document.
  • driverLicense
  • passport
  • nationalIdentityCard
  • residencePermit
  • usSocialSecurityCard

National Identity Card:

Field Type Description Example Value Example Processed Value
Country Region Country Region Country or region code USA USA
Region string State or province Washington Washington
Document Number string National identity card number WDLABCD456DG WDLABCD456DG
Document Discriminator string National identity card document discriminator 12645646464554646456464544 12645646464554646456464544
First Name string Given name and middle initial, if applicable LIAM R. LIAM R.
Last Name string Surname TALBOT TALBOT
Address address Address 123 STREET ADDRESS YOUR CITY WA 99999-1234 123 STREET ADDRESS YOUR CITY WA 99999-1234
Date of Birth date Date of birth 01/06/1958 1958-06-01
Date of Expiration date Date of expiration 08/12/2020 2020-12-08
Date of Issue date Date of issue 08/12/2012 2012-12-08
Eye Color string Eye color BLU BLU
Hair Color string Hair color BRO BRO
Height string Height 5’11” 5’11”
Weight string Weight 185LB 185LB
Sex string Sex M M

Passport:

Field Type Description Example Value Example Processed Value
Document Number string Passport number WDLABCD456DG WDLABCD456DG
First Name string Given name and middle initial, if applicable JENNIFER JENNIFER
Middle Name string Name between given name and surname REYES REYES
Last Name string Surname BROOKS BROOKS
Aliases string Also known as MAY LIN MAY LIN
Date of Birth date Date of birth 1980-01-01 1980-01-01
Date of Expiration date Date of expiration 2019-05-05 2019-05-05
Date of Issue date Date of issue 2014-05-06 2014-05-06
Sex string Sex M M
Country Region countryRegion Issuing country or organization USA USA
Nationality countryRegion Nationality USA USA
Place of Birth string Place of birth MASSACHUSETTS, U.S.A. MASSACHUSETTS, U.S.A.
Place of Issue string Place of issue LA PAZ LA PAZ
Issuing Authority string Issuing authority United States Department of State United States Department of State
Personal Number string Personal ID Number A234567893 A234567893
Machine Readable Zone string The complete value of the machine readable zone at the bottom of a passport. It holds all the passport’s ID information.

P<USABROOKS<<JENNIFER
<<<<<<<<<<<<<<<<<<<<<<
<3400200135USA8001014
F1905054710000307<715816

P<USABROOKS<<JENNIFER
<<<<<<<<<<<<<<<<<<<<<<
<3400200135USA8001014
F1905054710000307<715816
Machine Readable Zone Country Region string Country region derived from the Machine Readable Zone USA USA
Machine Readable Zone Date of Birth date Date of birth derived from the Machine Readable Zone 800101 1980-01-01
Machine Readable Zone Date of Expiration date Date of expiration derived from the Machine Readable Zone 190505 2019-05-05
Machine Readable Zone Document Number string Passport number derived from the Machine Readable Zone 340020013 340020013
Machine Readable Zone First Name string First name derived from the Machine Readable Zone JENNIFER JENNIFER
Machine Readable Zone Last Name string Surname derived from the Machine Readable Zone BROOKS BROOKS
Machine Readable Zone Nationality country region Nationality derived from the Machine Readable Zone USA USA
Machine Readable Zone Sex string Sex derived from the Machine Readable Zone F F
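
The Machine Readable Zone stores dates as six digits (YYMMDD), which is why 800101 is shown as 1980-01-01 in the table above. The following Python sketch shows one way such a value could be expanded. It is an illustration only, not the logic used by Azure AI Document Intelligence. The function name and the century rule (birth dates are never in the future, expiration dates fall in the 2000s) are assumptions.

```python
from datetime import date
from typing import Optional


def expand_mrz_date(raw: str, kind: str, today: Optional[date] = None) -> str:
    """Expand a six-digit MRZ date (YYMMDD) to ISO format (YYYY-MM-DD).

    kind is 'birth' or 'expiry'. The MRZ holds only two year digits, so the
    century is a guess: a birth date that would lie in the future is moved
    back 100 years, and an expiry date is assumed to be in the 2000s.
    """
    today = today or date.today()
    year = 2000 + int(raw[0:2])
    month = int(raw[2:4])
    day = int(raw[4:6])
    if kind == "birth" and date(year, month, day) > today:
        year -= 100
    return date(year, month, day).isoformat()


print(expand_mrz_date("800101", "birth"))
print(expand_mrz_date("190505", "expiry"))
```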

Residence Permit:

Field Type Description Example Value Example Processed Value
Country Region countryRegion Country or region code USA USA
Document Number string Residence permit number WDLABCD456DG WDLABCD456DG
First Name string Given name and middle initial, if applicable LIAM R. LIAM R.
Last Name string Surname TALBOT TALBOT
Date of Birth date Date of birth 01/06/1958 1958-06-01
Date of Expiration date Date of expiration 08/12/2020 2020-12-08
Date of Issue date Date of issue 08/12/2012 2012-12-08
Sex string Sex M M
Place of Birth string Place of birth Germany Germany
Category string Permit category DV2 DV2
Address address Address 123 STREET ADDRESS YOUR CITY WA 99999-1234 123 STREET ADDRESS YOUR CITY WA 99999-1234

US Social Security Card:

Field Type Description Example Value Example Processed Value
Document Number string Social security number WDLABCD456DG WDLABCD456DG
First Name string Given name and middle initial, if applicable LIAM R. LIAM R.
Last Name string Surname TALBOT TALBOT
Date of Issue date Date of issue 08/12/2012 2012-12-08

The Check model can return the following form field values:

Field Type Description Example Value Example Processed Value
Payer Name string Name of the payer (drawer) Jane Doe Jane Doe
Payer Address address Address of the payer (drawer) 123 Main St. Redmond WA 98052 123 Main St. Redmond WA 98052
Pay To string Name of the payee John Smith John Smith
Check Date date Date when the check was written 04-01-2023 2023-04-01
Number Amount number Amount of the check in numeric form 150.00 150.00
Word Amount string Amount of the check in fully written form one hundred fifty and 00/100 one hundred fifty and 00/100
Bank Name string Name of the bank Contoso Bank Contoso Bank
Memo string Short note describing the payment April Rent Payment April Rent Payment
MICR object Full MICR code ⑈0740⑈ ⑆123456789⑆ 1001001234⑈ ⑈0740⑈ ⑆123456789⑆ 1001001234⑈
MICR Routing Number string Routing number portion of the MICR code ⑈0740⑈ ⑆123456789⑆ 1001001234⑈ ⑆123456789⑆
MICR Account Number string Account number portion of the MICR code ⑈0740⑈ ⑆123456789⑆ 1001001234⑈ 1001001234⑈
MICR Check Number string Check number portion of the MICR code ⑈0740⑈ ⑆123456789⑆ 1001001234⑈ ⑈0740⑈
Payer Signatures array Number of payer signatures [ 2 signatures on the check ] signed
signed
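
The Check model already returns the routing number, account number and check number as separate fields. To illustrate how these relate to the full MICR line in the example above, here is a minimal Python sketch. The pattern and the function name are assumptions based only on the sample value in this table. Real checks use different layouts, so this is not a general MICR parser.

```python
import re

# Assumed layout, taken from the sample above:
#   ⑈check⑈ ⑆routing⑆ account⑈
MICR_PATTERN = re.compile(
    r"(?P<check_number>⑈\d+⑈)\s*"
    r"(?P<routing_number>⑆\d+⑆)\s*"
    r"(?P<account_number>\d+⑈)"
)


def split_micr(micr: str) -> dict:
    """Split a MICR line into its three parts, keeping the MICR symbols."""
    match = MICR_PATTERN.search(micr)
    if match is None:
        raise ValueError("MICR line does not match the assumed layout")
    return match.groupdict()


parts = split_micr("⑈0740⑈ ⑆123456789⑆ 1001001234⑈")
print(parts["routing_number"])
```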

 

A Custom Classification model can return the following form field values:

Name Type Description
Document Type string The type of document. The value depends on how you have trained your classification model. Please refer to the setup instructions for more details.

01 – Test button / Auto Test Mode: press the Test button to show the extracted values of your document in the Result window, together with additional information about the result: whether OCR was applied, the confidence level of each field (in the “Confidence” column), and the confidence level of Azure AI Document Intelligence’s “Document Content” field value, shown above the result table.

Auto test: press the drop-down arrow next to the Test button to enable Auto Testing. This automatically tests each document as you browse through them using the blue document navigation buttons.

02 – OCR (Yes/No): there are many types of PDFs. The most common PDF types used with MetaServer are text-based PDFs and image-based PDFs.

Electronic / text-based PDFs are generated by a computer program such as MS Word or invoice / report creation software. Text-based PDFs already contain computer text represented by fonts. This text can be extracted directly, without any OCR processing.

Scanned / Image-based PDFs contain an image of each of the pages of the document and require OCR (Optical Character Recognition) to convert the images to computer text.

Azure AI Document Intelligence automatically switches between electronic text extraction for text-based PDFs and OCR extraction for scanned images.

As a result, for electronic text your Microsoft Azure AI Document Intelligence resource returns a result with 100% confidence.

If OCR is applied, the OCR value will indicate Yes.

If the PDF is 100% electronic, then the OCR value will indicate No.

If the PDF is partially electronic but also contains some image information (like logos), then the OCR value will indicate Mixed.
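
The three OCR values described above can be restated as a small Python sketch. The function and its two inputs are hypothetical and only mirror the rules in the text. They do not show how MetaServer or Azure AI Document Intelligence inspects a PDF.

```python
def ocr_value(has_text_layer: bool, has_image_content: bool) -> str:
    """Return the OCR indicator for a document, following the rules above.

    has_text_layer:    the PDF contains electronic (font-based) text.
    has_image_content: the PDF contains image information, such as a scan
                       or a logo.
    """
    if has_text_layer and has_image_content:
        return "Mixed"  # partially electronic, partially image
    if has_text_layer:
        return "No"     # 100% electronic, no OCR needed
    return "Yes"        # image only, OCR is applied


print(ocr_value(has_text_layer=True, has_image_content=False))
```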

The example below shows a 100% electronic document, as indicated by the OCR value “No”.

03 – Confidence: this signifies the confidence level of the Azure AI Document Intelligence’s “Document Content” field value, which is currently mapped to the “Full Text” value in our example.

NOTE: The confidence level of each individual field can be found in the “Confidence” column of your test results.
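
One common use of the per-field confidence is to flag low-confidence values for manual review. The Python sketch below illustrates the idea. The data structure, the function name, the 80% threshold and the sample confidence values are all assumptions made for illustration. They are not MetaServer output.

```python
from typing import Dict, List, Tuple


def fields_below_threshold(
    results: Dict[str, Tuple[str, float]], threshold: float = 80.0
) -> List[str]:
    """Return the names of fields whose confidence (in %) is below threshold."""
    return [
        name
        for name, (_value, confidence) in results.items()
        if confidence < threshold
    ]


# Hypothetical test results: field name -> (value, confidence in %)
sample = {
    "Merchant Name": ("Contoso", 98.0),
    "Total": ("14.34", 62.5),
}
print(fields_below_threshold(sample))
```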

TIP: you can copy the current settings and paste them in another setup window of the same type. Do this by pressing the Settings button in the bottom left of the Setup window and by selecting Copy. Then open another setup window of the same type and select Paste.

You can also run the Azure AI Document Intelligence engine on-premises in containers, using the Docker engine.

Running the engine on-premises can help you meet security and data governance requirements.

You can find a detailed guide discussing the prerequisites and how to set up your AZDI container here.
