5.3 Azure AI Document Intelligence
Key Takeaways
- Azure AI Document Intelligence (formerly Form Recognizer) extracts text, key-value pairs, tables, selection marks, and typed fields; the v4.0 / 2024-11-30 API and Document Intelligence Studio are the current tooling.
- Read and Layout require no training; Layout adds tables, selection marks, and structure, while prebuilt models (invoice, receipt, ID, W-2, 1099, health-insurance card, contract) return typed fields with per-field confidence. The business card prebuilt is deprecated in v4.0 and is served only by v3.1 and earlier API versions.
- Custom models come in two build modes: template (consistent fixed layouts) and neural (varying layouts); both need a minimum of FIVE labeled training documents in a Blob container reached via SAS.
- In v4.0 you build a custom classifier and combine it with extraction models so a single call classifies a document then routes it to the right model (the modern replacement for v3.0 composed-model auto-routing).
- Every returned field carries a confidence score; production pipelines branch on a confidence threshold and route low-confidence extractions to human review.
Quick Answer: Document Intelligence (formerly Form Recognizer) extracts structured data from documents. Use Read for text, Layout for tables and structure, prebuilt models for common documents (invoices, receipts, IDs), and custom models for your own forms. Custom models are template (fixed layout) or neural (varying layout) and need at least five labeled samples. Combine a custom classifier with extraction models to auto-route mixed document types. Every field returns a confidence score.
Current Edition and Tooling
The service uses the v4.0 (2024-11-30 GA) REST API and SDKs, with Document Intelligence Studio as the no-code portal for labeling, training, and testing models. The Python analysis client is DocumentIntelligenceClient (model and classifier building goes through DocumentIntelligenceAdministrationClient), and analysis is asynchronous: begin_analyze_document returns a poller, and poller.result() blocks until the result is ready. The old name Form Recognizer still appears in legacy docs and may show up as a distractor on the exam.
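The poller pattern described above can be sketched as follows. This is a minimal illustration, assuming the azure-ai-documentintelligence 1.x package; `make_client` and `analyze_file` are hypothetical helper names, and the endpoint and key are values you supply.

```python
from typing import Any


def make_client(endpoint: str, key: str) -> Any:
    """Build an analysis client with key-based auth.

    Imports are deferred so the rest of this sketch still loads on a
    machine where the Azure SDK is not installed.
    """
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    return DocumentIntelligenceClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )


def analyze_file(client: Any, path: str, model_id: str = "prebuilt-read") -> Any:
    """Submit a local file and block until the long-running operation ends."""
    with open(path, "rb") as f:
        # begin_* returns right away with a poller; no result exists yet.
        poller = client.begin_analyze_document(model_id, body=f)
        # result() polls the service until the analysis succeeds or fails.
        return poller.result()
```

Taking the client as a parameter keeps `analyze_file` easy to exercise with a stub in unit tests, without network access.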
Choosing the Right Model
| Model | Training? | Returns |
|---|---|---|
| prebuilt-read | No | Text, language, handwriting detection |
| prebuilt-layout | No | Text + tables + selection marks + structure |
| prebuilt-invoice | No | Vendor, customer, totals, tax, line items |
| prebuilt-receipt | No | Merchant, date, items, subtotal, tax, tip, total |
| prebuilt-idDocument | No | Name, DOB, address, doc number, expiry |
| prebuilt-tax.us.w2 / 1099 / 1098 | No | Typed US tax-form fields |
| prebuilt-businessCard (v3.1 and earlier; deprecated in v4.0) | No | Name, title, company, phone, email |
| prebuilt-healthInsuranceCard.us | No | Insurer, member ID, group, plan |
| Custom (template/neural) | Yes | Your labeled fields |
Decision shortcuts the exam rewards: only need raw text -> Read; need tables and checkboxes from any document -> Layout; the document is a standard invoice/receipt/ID -> the matching prebuilt; it is your company's unique form -> custom.
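The shortcuts above amount to a small decision function. The sketch below is only a study aid: `choose_model`, its parameters, and the dictionary keys are illustrative names, not part of any SDK; the model IDs come from the table.

```python
from typing import Optional

# Prebuilt model IDs from the table above; the keys are illustrative labels.
PREBUILT = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
}


def choose_model(
    standard_type: Optional[str] = None,
    needs_structure: bool = False,
    custom_model_id: Optional[str] = None,
) -> str:
    """Mirror the exam shortcuts: custom > matching prebuilt > Layout > Read."""
    if custom_model_id:              # your company's unique form
        return custom_model_id
    if standard_type in PREBUILT:    # standard invoice / receipt / ID
        return PREBUILT[standard_type]
    if needs_structure:              # tables or checkboxes from any document
        return "prebuilt-layout"
    return "prebuilt-read"           # raw text only
```

For example, `choose_model(needs_structure=True)` returns `"prebuilt-layout"`, while a call with no arguments falls through to `"prebuilt-read"`.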
Read vs Layout vs General
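# Assumes client is a DocumentIntelligenceClient and f is a file opened in "rb" mode.
poller = client.begin_analyze_document(model_id="prebuilt-layout", body=f)
result = poller.result()
for table in result.tables or []:  # tables is None when the document has none
    print(table.row_count, "x", table.column_count)
for page in result.pages:
    for mark in page.selection_marks or []:  # None when the page has no marks
        print(mark.state)  # "selected" or "unselected"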
Layout is the only no-training model that returns tables and selection marks (checkboxes), so any scenario mentioning checkbox state or table cells points to Layout, not Read.
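To make the table-cell point concrete, the sketch below turns a Layout table into a plain list of rows. It relies only on attributes the result objects expose (`row_count`, `column_count`, `cells`, `row_index`, `column_index`, `content`, `selection_marks`, `state`); the helper names `table_to_grid` and `selection_states` are hypothetical, and cells that span several rows or columns are written only at their top-left position.

```python
from typing import Any, Iterable, List


def table_to_grid(table: Any) -> List[List[str]]:
    """Place each cell's text at its (row_index, column_index) position."""
    grid = [["" for _ in range(table.column_count)]
            for _ in range(table.row_count)]
    for cell in table.cells:
        # Spanning cells are recorded once, at their top-left coordinate.
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


def selection_states(pages: Iterable[Any]) -> List[str]:
    """Collect checkbox states page by page, skipping pages without marks."""
    return [
        mark.state
        for page in pages
        for mark in (page.selection_marks or [])
    ]
```

Given `result = poller.result()` from a prebuilt-layout call, `table_to_grid(result.tables[0])` yields rows ready for a CSV writer, and `selection_states(result.pages)` lists the checkbox states in reading order.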
Prebuilt Models with Confidence
Prebuilt models return strongly-typed fields, each with a confidence score between 0 and 1. Production code branches on a threshold and sends low-confidence results to human review.
# post_to_erp and route_to_human_review are placeholders for your own integration code.
poller = client.begin_analyze_document(model_id="prebuilt-invoice", body=f)
for doc in poller.result().documents or []:
    total = doc.fields.get("InvoiceTotal")
    if total and (total.confidence or 0) >= 0.80:
        post_to_erp(total.content)
    else:
        route_to_human_review(doc)  # field missing or low confidence
Trap: the confidence value is a per-field reliability measure, not an accuracy guarantee. Exam scenarios that say "automatically post invoices over a reliability bar, otherwise queue for a person" are testing whether you branch on field.confidence against a threshold.
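The same branching can be applied to every field in a document rather than to one total. The sketch below assumes each field object exposes `confidence` and `content`, as the prebuilt results do; `triage_fields`, the 0.80 default, and the `required` parameter are illustrative choices, not service defaults.

```python
from typing import Any, Dict, Iterable, List, Tuple


def triage_fields(
    fields: Dict[str, Any],
    threshold: float = 0.80,
    required: Iterable[str] = (),
) -> Tuple[Dict[str, str], List[str]]:
    """Split fields into auto-accepted values and names needing human review."""
    accepted: Dict[str, str] = {}
    review: List[str] = []
    for name, field in fields.items():
        confidence = getattr(field, "confidence", None)
        if confidence is not None and confidence >= threshold:
            accepted[name] = field.content
        else:
            review.append(name)  # low or missing confidence
    # A required field the model did not return at all also needs a person.
    review.extend(name for name in required if name not in fields)
    return accepted, review
```

A pipeline would typically post the document automatically only when the review list comes back empty; stricter thresholds for monetary fields are a common variation.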
Custom Models: Template vs Neural
Both build modes need a minimum of five labeled training documents in a Blob container referenced by a Shared Access Signature (SAS) URL.
| Build mode | Layout assumption | Best for | Notes |
|---|---|---|---|
| template | Consistent, fixed positions | Standardized forms, applications | Faster, cheaper, single language per model |
| neural | Varying layouts | Contracts, letters, multi-vendor forms | Deep learning, better generalization, longer training |
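from azure.ai.documentintelligence.models import AzureBlobContentSource, BuildDocumentModelRequest
# Assumes admin_client is a DocumentIntelligenceAdministrationClient (model building is not on the analysis client).
poller = admin_client.begin_build_document_model(
    BuildDocumentModelRequest(
        model_id="po-neural-v1",
        build_mode="neural",  # or "template"
        azure_blob_source=AzureBlobContentSource(
            container_url="https://acct.blob.core.windows.net/training?<SAS>"),
    ))
model = poller.result()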
If documents arrive from many vendors with shifting layouts, the answer is neural; if they are one rigid in-house form, template is cheaper and faster.
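Before starting a build, a pipeline can check the two rules above: at least five labeled documents, and a build mode that matches layout variability. The sketch assumes the Document Intelligence Studio convention of writing a `<document>.labels.json` file next to each labeled document; the helper names and the blob listing passed in are hypothetical.

```python
from typing import Iterable, List, Tuple

MIN_LABELED_DOCS = 5          # minimum stated in this section
LABEL_SUFFIX = ".labels.json"  # assumed Studio naming convention


def labeled_documents(blob_names: Iterable[str]) -> List[str]:
    """Return documents that have a matching label file in the container."""
    names = set(blob_names)
    return sorted(
        name[: -len(LABEL_SUFFIX)]
        for name in names
        if name.endswith(LABEL_SUFFIX) and name[: -len(LABEL_SUFFIX)] in names
    )


def pick_build_mode(layouts_vary: bool) -> str:
    """Varying layouts -> neural; one rigid form -> template."""
    return "neural" if layouts_vary else "template"


def check_training_set(
    blob_names: Iterable[str], layouts_vary: bool
) -> Tuple[str, List[str]]:
    """Validate the labeled-document count and suggest a build mode."""
    docs = labeled_documents(blob_names)
    if len(docs) < MIN_LABELED_DOCS:
        raise ValueError(
            f"need at least {MIN_LABELED_DOCS} labeled documents, found {len(docs)}"
        )
    return pick_build_mode(layouts_vary), docs
```

The blob names could come from any container listing; documents without a label file are ignored rather than counted toward the minimum.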
Classifiers and Routing Mixed Documents
When a single pipeline receives mixed document types, build a custom classifier with begin_build_classifier (an administration-client operation) and pair it with extraction models. A begin_classify_document call (or a classifier-driven analyze) identifies each document's type first, and each document is then routed to the matching extraction model. This v4.0 classifier approach is the modern successor to v3.0 composed models, which combined multiple custom extraction models behind one model ID and auto-selected among them.
poller = client.begin_classify_document(
    classifier_id="mailroom-classifier", body=document_bytes)
for doc in poller.result().documents:
    print(doc.doc_type, doc.confidence)  # e.g. "invoice" 0.97
| Need | Use |
|---|---|
| Sort/triage documents into categories only | Custom classifier |
| Extract typed fields from one known type | Prebuilt or custom extraction model |
| One endpoint that sorts THEN extracts mixed types | Classifier + extraction models (formerly composed model) |
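The "sort THEN extract" row can be sketched as a two-step call in client code. This is one possible arrangement, not the only one: the `ROUTES` mapping, the `contracts-neural-v1` model ID, the 0.70 classification floor, and both helper names are hypothetical, and the sketch assumes each submitted file contains a single document.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical mapping from classifier doc types to extraction model IDs.
ROUTES: Dict[str, str] = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "contract": "contracts-neural-v1",
}


def pick_extraction_model(
    doc_type: str,
    confidence: Optional[float],
    routes: Optional[Dict[str, str]] = None,
    min_confidence: float = 0.70,
) -> Optional[str]:
    """Return the model ID for a doc type, or None to send it to review."""
    if confidence is None or confidence < min_confidence:
        return None
    return (ROUTES if routes is None else routes).get(doc_type)


def classify_then_extract(
    client: Any, classifier_id: str, document_bytes: bytes
) -> List[Tuple[str, Any]]:
    """Classify first, then analyze with the routed extraction model."""
    classified = client.begin_classify_document(
        classifier_id, body=document_bytes
    ).result()
    outcomes: List[Tuple[str, Any]] = []
    for doc in classified.documents or []:
        model_id = pick_extraction_model(doc.doc_type, doc.confidence)
        if model_id is None:
            outcomes.append((doc.doc_type, None))  # unknown type or unsure
            continue
        extracted = client.begin_analyze_document(
            model_id, body=document_bytes
        ).result()
        outcomes.append((doc.doc_type, extracted))
    return outcomes
```

A `None` result marks a document the caller should queue for manual triage, either because the classifier was unsure or because no extraction model is registered for that type.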
Input Constraints and Security
Know the practical limits the exam likes to probe. Supported inputs include PDF, JPEG/JPG, PNG, BMP, TIFF, HEIF, and Office formats for some models. The free (F0) tier caps analysis at the first 2 pages of a document, while paid (S0) processes the whole file up to service limits. Training data must sit in Azure Blob Storage reached through a SAS URL (or a managed identity); the model itself never stores your source documents after training.
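A pre-flight check can catch the two limits above before a document is submitted. The sketch below encodes only what this section states (the listed image and PDF formats, and the 2-page cap on F0); the `.tif` spelling is an added assumption, Office formats are left out because their support depends on the model, and the function names are hypothetical.

```python
import os

# Formats listed in this section; ".tif" is an assumed alternate spelling of TIFF.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".jpeg", ".jpg", ".png", ".bmp", ".tiff", ".tif", ".heif",
}
FREE_TIER_PAGE_CAP = 2  # F0 analyzes only the first 2 pages


def is_supported(filename: str) -> bool:
    """Check the file extension against the formats listed above."""
    return os.path.splitext(filename)[1].lower() in SUPPORTED_EXTENSIONS


def pages_analyzed(page_count: int, tier: str) -> int:
    """Return how many pages the service will actually process."""
    if tier.upper() == "F0":
        return min(page_count, FREE_TIER_PAGE_CAP)
    return page_count
```

A pipeline on the free tier could compare `pages_analyzed(n, "F0")` with `n` and warn when pages would be silently skipped.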
For security, prefer Microsoft Entra ID (managed identity) over the resource key so credentials are not embedded in code, and scope storage access with role-based access control. Use a customer-managed key when regulatory requirements demand control of encryption at rest.
On the Exam: A scenario describing a mailroom that receives invoices, receipts, and contracts and must process each correctly through one entry point is the classic classifier-plus-routing (composed-model) question — no manual pre-sorting step is needed. "Five documents" is the memorized minimum for custom training, and "varying layouts" always signals the neural build mode over template.
A scanned form contains checkboxes and a multi-column pricing table. You need to extract both the checkbox states and the table cells without training a model. Which model should you call?
You must train a custom model to read your company's purchase orders, which arrive in several different layouts from different suppliers. What should you choose and what is the minimum training set?
An automated accounts-payable pipeline should post invoice totals straight to the ERP only when the extraction is reliable, and otherwise send the document to a human. Which property drives that branching?
A mailroom solution receives invoices, receipts, and contracts mixed together and must extract the right fields from each through a single entry point, with no manual pre-sorting. What should you implement?