5.3 Azure AI Document Intelligence

Key Takeaways

Azure AI Document Intelligence (formerly Form Recognizer) extracts text, key-value pairs, tables, selection marks, and typed fields; the v4.0 / 2024-11-30 API and Document Intelligence Studio are the current tooling.
Read and Layout require no training; Layout adds tables, selection marks, and structure, while prebuilt models (invoice, receipt, ID, W-2, 1099, business card, health-insurance, contract) return typed fields with per-field confidence.
Custom models come in two build modes: template (consistent fixed layouts) and neural (varying layouts); both need a minimum of FIVE labeled training documents in a Blob container reached via SAS.
In v4.0 you build a custom classifier and combine it with extraction models so a single call classifies a document then routes it to the right model (the modern replacement for v3.0 composed-model auto-routing).
Every returned field carries a confidence score; production pipelines branch on a confidence threshold and route low-confidence extractions to human review.

Last updated: June 2026

Quick Answer: Document Intelligence (formerly Form Recognizer) extracts structured data from documents. Use Read for text, Layout for tables and structure, prebuilt models for common documents (invoices, receipts, IDs), and custom models for your own forms. Custom models are template (fixed layout) or neural (varying layout) and need at least five labeled samples. Combine a custom classifier with extraction models to auto-route mixed document types. Every field returns a confidence score.

Current Edition and Tooling

The current release is v4.0 (the 2024-11-30 GA REST API and matching SDKs). Document Intelligence Studio is the no-code portal for labeling, training, and testing models. In the Python SDK the client is DocumentIntelligenceClient, and analysis is asynchronous: begin_analyze_document returns a poller, and poller.result() blocks until the result is ready. The old name, Form Recognizer, still appears in legacy docs and may show up as a distractor on the exam.

Choosing the Right Model

Model	Training?	Returns
prebuilt-read	No	Text, language, handwriting detection
prebuilt-layout	No	Text + tables + selection marks + structure
prebuilt-invoice	No	Vendor, customer, totals, tax, line items
prebuilt-receipt	No	Merchant, date, items, subtotal, tax, tip, total
prebuilt-idDocument	No	Name, DOB, address, doc number, expiry
prebuilt-tax.us.w2 / 1099 / 1098	No	Typed US tax-form fields
prebuilt-businessCard	No	Name, title, company, phone, email
prebuilt-healthInsuranceCard.us	No	Insurer, member ID, group, plan
Custom (template/neural)	Yes	Your labeled fields

Decision shortcuts the exam rewards: only need raw text -> Read; need tables and checkboxes from any document -> Layout; the document is a standard invoice/receipt/ID -> the matching prebuilt; it is your company's unique form -> custom.
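
The shortcuts above can be written down as a small lookup. This is a study aid only, not part of any SDK: the function name choose_model, its flags, and the "custom" placeholder are made up for illustration, while the prebuilt model IDs come from the table above.

```python
def choose_model(needs_tables_or_checkboxes=False, standard_type=None,
                 company_specific=False):
    """Return a model ID following the decision shortcuts (illustrative only)."""
    # Prebuilt IDs from the table above, keyed by a hypothetical short name.
    prebuilt = {
        "invoice": "prebuilt-invoice",
        "receipt": "prebuilt-receipt",
        "id": "prebuilt-idDocument",
    }
    if company_specific:
        return "custom"                 # placeholder for your own model ID
    if standard_type in prebuilt:
        return prebuilt[standard_type]  # matching prebuilt model
    if needs_tables_or_checkboxes:
        return "prebuilt-layout"        # tables and selection marks
    return "prebuilt-read"              # raw text only


print(choose_model(needs_tables_or_checkboxes=True))  # prebuilt-layout
```

The order of the checks mirrors the priority in the shortcuts: a unique company form beats a prebuilt match, and Read is the fallback when nothing more than text is needed.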

Read vs Layout vs General

Read returns text only; Layout returns the same text plus tables, selection marks, and structure. The General Document model from v3.x is not a separate choice in v4.0, where key-value pair extraction is requested from Layout as an optional feature. The call below runs Layout and prints table dimensions and checkbox states.

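# Assumes client is a DocumentIntelligenceClient; form.pdf is a placeholder file name.
with open("form.pdf", "rb") as f:
    poller = client.begin_analyze_document(model_id="prebuilt-layout", body=f)
    result = poller.result()
for table in result.tables or []:            # tables may be None when none are found
    print(table.row_count, "x", table.column_count)
for page in result.pages:
    for mark in page.selection_marks or []:  # may be None on pages without marks
        print(mark.state)   # "selected" or "unselected"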

Layout is the only no-training model that returns tables and selection marks (checkboxes), so any scenario mentioning checkbox state or table cells points to Layout, not Read.
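
Each table in a Layout result carries a flat list of cells, each with a row index, a column index, and its text. A minimal sketch of rebuilding a two-dimensional grid from such cells follows. The Cell class is a stand-in defined here so the example runs without the SDK, the sample values are made up, and merged cells (row or column spans) are ignored.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Stand-in for a table cell; only the attributes used below.
    row_index: int
    column_index: int
    content: str


def table_to_grid(row_count, column_count, cells):
    """Place each cell's text at its (row, column) position."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


cells = [Cell(0, 0, "Item"), Cell(0, 1, "Price"),
         Cell(1, 0, "Widget"), Cell(1, 1, "9.99")]
grid = table_to_grid(2, 2, cells)
print(grid)  # [['Item', 'Price'], ['Widget', '9.99']]
```

Positions with no cell stay as empty strings, which keeps every row the same length for later export to CSV or a DataFrame.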

Prebuilt Models with Confidence

Prebuilt models return strongly-typed fields, each with a confidence score between 0 and 1. Production code branches on a threshold and sends low-confidence results to human review.

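# post_to_erp and route_to_human_review are placeholders for your own code;
# invoice.pdf is a placeholder file name.
with open("invoice.pdf", "rb") as f:
    poller = client.begin_analyze_document(model_id="prebuilt-invoice", body=f)
    result = poller.result()
for doc in result.documents or []:
    total = doc.fields.get("InvoiceTotal")
    if total and total.confidence is not None and total.confidence >= 0.80:
        post_to_erp(total.content)        # text as printed on the invoice
    else:
        route_to_human_review(doc)        # missing field or low confidence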

Trap: the confidence value is a per-field reliability measure, not an accuracy guarantee. Exam scenarios that say "automatically post invoices over a reliability bar, otherwise queue for a person" are testing whether you branch on field.confidence against a threshold.
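
The same branching usually applies to every field, not only the total. The sketch below splits a set of extracted fields into auto-accepted and needs-review groups. It works on plain (value, confidence) tuples rather than SDK objects, the function name split_by_confidence and the sample values are made up, and the 0.80 threshold is simply the one used in the snippet above.

```python
def split_by_confidence(fields, threshold=0.80):
    """Split {name: (value, confidence)} into accepted and needs-review dicts."""
    accepted, review = {}, {}
    for name, (value, confidence) in fields.items():
        # A missing confidence is treated as unreliable, never as a pass.
        if confidence is not None and confidence >= threshold:
            accepted[name] = value
        else:
            review[name] = value
    return accepted, review


fields = {
    "InvoiceTotal": ("110.00", 0.97),
    "VendorName": ("Contoso", 0.62),
    "InvoiceDate": ("2024-11-30", None),
}
accepted, review = split_by_confidence(fields)
print(accepted)        # {'InvoiceTotal': '110.00'}
print(sorted(review))  # ['InvoiceDate', 'VendorName']
```

The right threshold is a business decision: a higher bar sends more documents to people, a lower bar lets more errors through.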

Custom Models: Template vs Neural

Both build modes need a minimum of five labeled training documents in a Blob container referenced by a Shared Access Signature (SAS) URL.

Build mode	Layout assumption	Best for	Notes
template	Consistent, fixed positions	Standardized forms, applications	Faster, cheaper, single language per model
neural	Varying layouts	Contracts, letters, multi-vendor forms	Deep learning, better generalization, longer training

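from azure.ai.documentintelligence.models import (
    AzureBlobContentSource, BuildDocumentModelRequest)

# admin_client is a DocumentIntelligenceAdministrationClient (v4.0 SDK).
poller = admin_client.begin_build_document_model(BuildDocumentModelRequest(
    model_id="po-neural-v1",
    build_mode="neural",   # or "template"
    azure_blob_source=AzureBlobContentSource(
        container_url="https://acct.blob.core.windows.net/training?<SAS>")))
model = poller.result()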

If documents arrive from many vendors with shifting layouts, the answer is neural; if they are one rigid in-house form, template is cheaper and faster.
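
Before starting a build, it can help to confirm that the container really holds at least five labeled documents. The sketch below works on a plain list of blob names and assumes that each labeled document has a companion file named "<document name>.labels.json", as produced when labeling in the Studio. That naming convention, the function names, and the sample blob names are assumptions for illustration.

```python
MIN_LABELED_DOCS = 5  # minimum stated in this section


def count_labeled(blob_names):
    """Count documents that have a companion '<name>.labels.json' file."""
    names = set(blob_names)
    suffix = ".labels.json"
    return sum(
        1 for n in names
        if n.endswith(suffix) and n[: -len(suffix)] in names
    )


def ready_to_train(blob_names):
    """True when the labeled set meets the five-document minimum."""
    return count_labeled(blob_names) >= MIN_LABELED_DOCS


blobs = []
for i in range(1, 6):
    blobs += [f"po{i}.pdf", f"po{i}.pdf.labels.json"]

print(count_labeled(blobs), ready_to_train(blobs))  # 5 True
print(ready_to_train(blobs[:-1]))                   # False
```

Five is the floor, not a target; more varied samples generally help, especially for neural models trained on shifting layouts.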

Classifiers and Routing Mixed Documents

When a single pipeline receives mixed document types, build a custom classifier (begin_build_classifier on the administration client) and call begin_classify_document to label each incoming document with a type and a confidence. Classification alone does not extract fields. To sort and extract through one entry point, combine the classifier with extraction models so that each document is classified first and then routed to the correct extraction model. In v3.0, composed models combined multiple custom extraction models behind one model ID and selected among them implicitly; v4.0 makes the classification step explicit.

# document_bytes holds the raw file content, e.g. open(path, "rb").read()
poller = client.begin_classify_document(
    classifier_id="mailroom-classifier", body=document_bytes)
for doc in poller.result().documents or []:
    print(doc.doc_type, doc.confidence)   # e.g. "invoice" 0.97

Need	Use
Sort/triage documents into categories only	Custom classifier
Extract typed fields from one known type	Prebuilt or custom extraction model
One endpoint that sorts THEN extracts mixed types	Classifier + extraction models (formerly composed model)
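
If the routing step is done in your own code, it reduces to a lookup from the classifier's document type to an extraction model ID, with a fallback when the classifier is unsure. A minimal sketch follows. The ROUTES mapping, the function name route, and the 0.80 cutoff are hypothetical, and the document type labels are whatever you chose when training the classifier.

```python
# Hypothetical mapping from classifier doc types to extraction model IDs.
ROUTES = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "contract": "prebuilt-contract",
}


def route(doc_type, confidence, min_confidence=0.80):
    """Return the extraction model ID, or None to send the document to review."""
    if confidence < min_confidence:
        return None              # classifier unsure: do not guess
    return ROUTES.get(doc_type)  # unknown types also yield None


print(route("invoice", 0.97))  # prebuilt-invoice
print(route("invoice", 0.41))  # None
print(route("memo", 0.99))     # None
```

Returning None for both low confidence and unknown types keeps the caller simple: anything without a model ID goes to a person.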

Input Constraints and Security

Know the practical limits the exam likes to probe. Supported inputs include PDF, JPEG/JPG, PNG, BMP, TIFF, HEIF, and Office formats for some models. The free (F0) tier caps analysis at the first 2 pages of a document, while paid (S0) processes the whole file up to service limits. Training data must sit in Azure Blob Storage reached through a SAS URL (or a managed identity); the model itself never stores your source documents after training.
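
These limits are easy to check before a file is ever uploaded. The sketch below is a pre-flight helper under stated assumptions: the extension set covers only the image and PDF formats listed above (Office formats are left out because only some models accept them), the function names are made up, and paid-tier service limits are not modeled.

```python
import os

# Extensions for the formats listed above; Office types intentionally omitted.
SUPPORTED = {".pdf", ".jpeg", ".jpg", ".png", ".bmp", ".tiff", ".heif"}
F0_PAGE_CAP = 2  # free tier analyzes only the first two pages


def is_supported(filename):
    """True when the file extension is in the supported set (case-insensitive)."""
    return os.path.splitext(filename)[1].lower() in SUPPORTED


def pages_analyzed(page_count, tier):
    """Pages that will actually be processed for tier 'F0' or 'S0'."""
    return min(page_count, F0_PAGE_CAP) if tier == "F0" else page_count


print(is_supported("scan.PNG"), is_supported("notes.txt"))  # True False
print(pages_analyzed(10, "F0"), pages_analyzed(10, "S0"))   # 2 10
```

A check like this makes the F0 behavior visible in testing, where a ten-page file that silently returns two pages is otherwise easy to mistake for an extraction failure.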

For security, prefer Microsoft Entra ID (managed identity) over the resource key so credentials are not embedded in code, and scope storage access with role-based access control. Use a customer-managed key when regulatory requirements demand control of encryption at rest.

On the Exam: A scenario describing a mailroom that receives invoices, receipts, and contracts and must process each correctly through one entry point is the classic classifier-plus-routing (composed-model) question; no manual pre-sorting step is needed. "Five documents" is the memorized minimum for custom training, and "varying layouts" always signals the neural build mode over template.

Test Your Knowledge

A scanned form contains checkboxes and a multi-column pricing table. You need to extract both the checkbox states and the table cells without training a model. Which model should you call?

prebuilt-read

prebuilt-layout

prebuilt-invoice

A custom neural model

Test Your Knowledge

You must train a custom model to read your company's purchase orders, which arrive in several different layouts from different suppliers. What should you choose and what is the minimum training set?

A template model with at least 50 documents

A neural model with at least 5 labeled documents

A composed model with exactly 100 documents

A prebuilt-invoice model with no training

Test Your Knowledge

An automated accounts-payable pipeline should post invoice totals straight to the ERP only when the extraction is reliable, and otherwise send the document to a human. Which property drives that branching?

The field's bounding polygon coordinates

The document's page count

Each extracted field's confidence score

The model's training document count

Test Your Knowledge

A mailroom solution receives invoices, receipts, and contracts mixed together and must extract the right fields from each through a single entry point, with no manual pre-sorting. What should you implement?

Run prebuilt-read on everything and parse the text yourself

A custom classifier combined with extraction models so documents are classified then routed

Three separate indexers, one per document type

A single template model trained on all three types together
