5.3 Azure AI Document Intelligence

Key Takeaways

  • Azure AI Document Intelligence (formerly Form Recognizer) extracts text, key-value pairs, tables, selection marks, and typed fields; the v4.0 / 2024-11-30 API and Document Intelligence Studio are the current tooling.
  • Read and Layout require no training; Layout adds tables, selection marks, and structure, while prebuilt models (invoice, receipt, ID, W-2, 1099, business card, health-insurance, contract) return typed fields with per-field confidence.
  • Custom models come in two build modes: template (consistent fixed layouts) and neural (varying layouts); both need a minimum of FIVE labeled training documents in a Blob container reached via SAS.
  • In v4.0 you build a custom classifier and combine it with extraction models so a single call classifies a document then routes it to the right model (the modern replacement for v3.0 composed-model auto-routing).
  • Every returned field carries a confidence score; production pipelines branch on a confidence threshold and route low-confidence extractions to human review.

Last updated: June 2026

Quick Answer: Document Intelligence (formerly Form Recognizer) extracts structured data from documents. Use Read for text, Layout for tables and structure, prebuilt models for common documents (invoices, receipts, IDs), and custom models for your own forms. Custom models are template (fixed layout) or neural (varying layout) and need at least five labeled samples. Combine a custom classifier with extraction models to auto-route mixed document types. Every field returns a confidence score.

Current Edition and Tooling

The current release is the v4.0 (2024-11-30 GA) REST API and SDKs, with labeling, training, and testing available in Document Intelligence Studio (a no-code portal). The Python client is DocumentIntelligenceClient. Analysis is asynchronous: begin_analyze_document returns a poller, and poller.result() blocks until the result is ready. The old name, Form Recognizer, still appears in legacy docs and may show up as a distractor on the exam.
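The asynchronous pattern described above can be pictured as a polling loop. The following is a minimal sketch of that idea, not the SDK's implementation: fetch_status is a hypothetical callable standing in for one GET on the operation URL, and the status strings are assumed to follow the REST API's operation states.

```python
import time
from typing import Callable, Dict


def wait_for_result(fetch_status: Callable[[], Dict],
                    interval: float = 1.0,
                    timeout: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the operation finishes, roughly what poller.result() hides."""
    waited = 0.0
    while True:
        op = fetch_status()  # hypothetical: one status check on the operation
        status = op.get("status")
        if status == "succeeded":
            return op.get("analyzeResult", {})
        if status == "failed":
            raise RuntimeError(op.get("error", "analysis failed"))
        if waited >= timeout:
            raise TimeoutError("analysis did not finish in time")
        sleep(interval)
        waited += interval


# Usage with a fake operation that finishes on the third poll.
_states = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"content": "hello"}},
])
demo = wait_for_result(lambda: next(_states), sleep=lambda seconds: None)
```

In real code the SDK poller handles the interval and retries, so a hand-written loop like this is only needed when calling the REST API directly.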

Choosing the Right Model

Model                               Training?  Returns
prebuilt-read                       No         Text, language, handwriting detection
prebuilt-layout                     No         Text + tables + selection marks + structure
prebuilt-invoice                    No         Vendor, customer, totals, tax, line items
prebuilt-receipt                    No         Merchant, date, items, subtotal, tax, tip, total
prebuilt-idDocument                 No         Name, DOB, address, doc number, expiry
prebuilt-tax.us.w2 / 1099 / 1098    No         Typed US tax-form fields
prebuilt-businessCard               No         Name, title, company, phone, email
prebuilt-healthInsuranceCard.us     No         Insurer, member ID, group, plan
Custom (template/neural)            Yes        Your labeled fields

Decision shortcuts the exam rewards: only need raw text -> Read; need tables and checkboxes from any document -> Layout; the document is a standard invoice/receipt/ID -> the matching prebuilt; it is your company's unique form -> custom.
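The shortcuts above can be written down as a small lookup. This is a study aid under stated assumptions: choose_model and its arguments are hypothetical names, and the returned string "custom" is only a placeholder for the ID of a model you trained yourself.

```python
# Standard document kinds that have a matching prebuilt model.
PREBUILT = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
}


def choose_model(doc_kind: str = "", needs_structure: bool = False) -> str:
    """Return the model a scenario points to, following the exam shortcuts."""
    if doc_kind == "company_form":
        return "custom"              # placeholder for your own trained model ID
    if doc_kind in PREBUILT:
        return PREBUILT[doc_kind]    # standard invoice / receipt / ID
    if needs_structure:
        return "prebuilt-layout"     # tables and checkboxes from any document
    return "prebuilt-read"           # raw text only
```

For example, choose_model(needs_structure=True) yields "prebuilt-layout", while choose_model("invoice") yields "prebuilt-invoice" regardless of the structure flag.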

Read vs Layout vs General

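from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

# endpoint and key are placeholders for your resource's values
client = DocumentIntelligenceClient(endpoint, AzureKeyCredential(key))
with open("form.pdf", "rb") as f:   # placeholder file name
    poller = client.begin_analyze_document(model_id="prebuilt-layout", body=f)
result = poller.result()
for table in result.tables or []:            # tables may be None
    print(table.row_count, "x", table.column_count)
for page in result.pages:
    for mark in page.selection_marks or []:  # may be None on pages without marks
        print(mark.state)   # "selected" or "unselected"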

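Of the two general-purpose models, only Layout returns tables and selection marks (checkboxes), so any scenario mentioning checkbox state or table cells points to Layout, not Read. The v3.x general document model (prebuilt-document) is not carried into v4.0; its key-value pair extraction is offered as an optional add-on feature of Layout.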
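Layout reports each table as a flat list of cells carrying a row index, a column index, and the cell text. The sketch below shows one way to turn such a list into a row-by-column grid. The Cell dataclass is a stand-in that mirrors those attribute names so the example runs without the SDK, and table_to_grid is a hypothetical helper; spanning cells are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Cell:
    """Stand-in for a Layout table cell (same attribute names)."""
    row_index: int
    column_index: int
    content: str


def table_to_grid(row_count: int, column_count: int,
                  cells: Iterable[Cell]) -> List[List[str]]:
    """Place each cell's text at its (row, column) position."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


# Usage: a 2 x 2 pricing table with one empty cell.
grid = table_to_grid(2, 2, [
    Cell(0, 0, "Item"), Cell(0, 1, "Price"),
    Cell(1, 0, "Widget"),
])
```

Cells that the service did not report stay as empty strings, which keeps every row the same length for export to CSV or a data frame.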

Prebuilt Models with Confidence

Prebuilt models return strongly typed fields, each with a confidence score between 0 and 1. Production code branches on a threshold and sends low-confidence results to human review.

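with open("invoice.pdf", "rb") as f:   # placeholder file name
    poller = client.begin_analyze_document(model_id="prebuilt-invoice", body=f)
for doc in poller.result().documents or []:
    total = (doc.fields or {}).get("InvoiceTotal")
    # confidence can be None, so treat a missing score as 0
    if total and (total.confidence or 0) >= 0.80:
        post_to_erp(total.content)        # hypothetical helper; content is the field's raw text
    else:
        route_to_human_review(doc)        # hypothetical helper; low confidence or field missing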

Trap: the confidence value is a per-field reliability measure, not an accuracy guarantee. Exam scenarios that say "automatically post invoices over a reliability bar, otherwise queue for a person" are testing whether you branch on field.confidence against a threshold.
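Real pipelines usually gate on several fields at once rather than on the total alone. Here is a minimal sketch of that branching, assuming the per-field confidences have already been pulled into a plain dictionary; triage, the 0.80 bar, and the "auto_post" / "human_review" labels are illustrative choices, not service values.

```python
from typing import Dict, List, Optional, Tuple


def triage(confidences: Dict[str, Optional[float]],
           required: List[str],
           threshold: float = 0.80) -> Tuple[str, List[str]]:
    """Auto-post only when every required field clears the threshold."""
    weak = [
        name for name in required
        # a missing field or a missing score counts as 0.0
        if (confidences.get(name) or 0.0) < threshold
    ]
    decision = "human_review" if weak else "auto_post"
    return decision, weak


# Usage: one weak field and one missing field send the invoice to a person.
decision, weak_fields = triage(
    {"InvoiceTotal": 0.95, "VendorName": 0.62},
    required=["InvoiceTotal", "VendorName", "InvoiceDate"],
)
```

Returning the list of weak fields lets the review screen highlight exactly what the reviewer needs to check.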

Custom Models: Template vs Neural

Both build modes need a minimum of five labeled training documents in a Blob container referenced by a Shared Access Signature (SAS) URL.

Build mode  Layout assumption           Best for                                Notes
template    Consistent, fixed positions Standardized forms, applications        Faster, cheaper, single language per model
neural      Varying layouts             Contracts, letters, multi-vendor forms  Deep learning, better generalization, longer training

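from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import AzureBlobContentSource, BuildDocumentModelRequest
# In the v4.0 SDK, builds run on the administration client (same endpoint and credential).
admin = DocumentIntelligenceAdministrationClient(endpoint, credential)
poller = admin.begin_build_document_model(BuildDocumentModelRequest(
    model_id="po-neural-v1",
    build_mode="neural",   # or "template"
    azure_blob_source=AzureBlobContentSource(
        container_url="https://acct.blob.core.windows.net/training?<SAS>")))
model = poller.result()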

If documents arrive from many vendors with shifting layouts, the answer is neural; if they are one rigid in-house form, template is cheaper and faster.
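The build-mode rule and the five-document minimum can be checked before a build is ever submitted. The sketch below encodes both; plan_build is a hypothetical helper, and the minimum comes from the figure stated in this section.

```python
MIN_LABELED_DOCS = 5  # minimum training set stated in this section


def plan_build(layouts_vary: bool, labeled_docs: int) -> str:
    """Pick the build mode, refusing training sets that are too small."""
    if labeled_docs < MIN_LABELED_DOCS:
        raise ValueError(
            f"need at least {MIN_LABELED_DOCS} labeled documents, "
            f"got {labeled_docs}"
        )
    # shifting multi-vendor layouts -> neural; one rigid form -> template
    return "neural" if layouts_vary else "template"
```

For instance, plan_build(layouts_vary=True, labeled_docs=12) returns "neural", while a call with only three labeled documents raises ValueError before any training cost is incurred.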

Classifiers and Routing Mixed Documents

When a single pipeline receives mixed document types, build a custom classifier (begin_build_classifier) and pair it with extraction models. A call to begin_classify_document labels each document, and a composed model that references the classifier goes one step further: a single analyze call classifies first, then routes each document to the correct extraction model. This explicit classifier replaces the implicit selection in v3.0 composed models, which combined multiple custom extraction models behind one model ID and auto-selected among them.

poller = client.begin_classify_document(
    classifier_id="mailroom-classifier", body=document_bytes)  # document_bytes: raw file content
for doc in poller.result().documents or []:   # documents may be None
    print(doc.doc_type, doc.confidence)   # e.g. "invoice" 0.97

Need                                                Use
Sort/triage documents into categories only          Custom classifier
Extract typed fields from one known type            Prebuilt or custom extraction model
One endpoint that sorts THEN extracts mixed types   Classifier + extraction models (formerly composed model)
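When the classify and extract steps are wired together by hand, the routing step is a lookup from the classifier's doc type to an extraction model ID. This is a sketch under assumptions: the doc type names depend on how the classifier was labeled, "contract-neural-v1" is a hypothetical custom model ID, and the 0.80 bar is an arbitrary example.

```python
from typing import Optional

# Doc type (as labeled in the classifier) -> extraction model ID.
ROUTES = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "contract": "contract-neural-v1",   # hypothetical custom model ID
}


def route(doc_type: str, confidence: Optional[float],
          min_confidence: float = 0.80) -> Optional[str]:
    """Return the model to call next, or None to hold for manual triage."""
    model_id = ROUTES.get(doc_type)
    if model_id is None or confidence is None or confidence < min_confidence:
        return None
    return model_id
```

A result of None covers both unknown doc types and uncertain classifications, so the caller has a single branch for the manual queue.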

Input Constraints and Security

Know the practical limits the exam likes to probe. Supported inputs include PDF, JPEG/JPG, PNG, BMP, TIFF, HEIF, and Office formats for some models. The free (F0) tier caps analysis at the first 2 pages of a document, while paid (S0) processes the whole file up to service limits. Training data must sit in Azure Blob Storage reached through a SAS URL (or a managed identity); the model itself never stores your source documents after training.
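A pre-flight check can surface these limits before a request is sent. The sketch below is illustrative only: preflight is a hypothetical helper, the extension set covers the image and PDF formats named above (Office formats are left out because support depends on the model), and the page cap is the F0 figure from this section.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".jpeg", ".jpg", ".png", ".bmp", ".tiff", ".heif"}
F0_PAGE_CAP = 2  # free tier analyzes only the first 2 pages


def preflight(filename: str, page_count: int, tier: str = "S0") -> int:
    """Validate the file type and return how many pages will be analyzed."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported input format: {extension or 'none'}")
    if tier == "F0":
        return min(page_count, F0_PAGE_CAP)
    return page_count
```

Comparing the returned value with page_count gives an early warning that an F0 resource would silently skip the rest of a long document.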

For security, prefer Microsoft Entra ID (managed identity) over the resource key so credentials are not embedded in code, and scope storage access with role-based access control. Use a customer-managed key when regulatory requirements demand control of encryption at rest.

On the Exam: A scenario describing a mailroom that receives invoices, receipts, and contracts and must process each correctly through one entry point is the classic classifier-plus-routing (composed-model) question — no manual pre-sorting step is needed. "Five documents" is the memorized minimum for custom training, and "varying layouts" always signals the neural build mode over template.

Test Your Knowledge

A scanned form contains checkboxes and a multi-column pricing table. You need to extract both the checkbox states and the table cells without training a model. Which model should you call?

You must train a custom model to read your company's purchase orders, which arrive in several different layouts from different suppliers. What should you choose and what is the minimum training set?

An automated accounts-payable pipeline should post invoice totals straight to the ERP only when the extraction is reliable, and otherwise send the document to a human. Which property drives that branching?

A mailroom solution receives invoices, receipts, and contracts mixed together and must extract the right fields from each through a single entry point, with no manual pre-sorting. What should you implement?
