5.3 Azure AI Document Intelligence
Key Takeaways
- Azure AI Document Intelligence (formerly Form Recognizer) extracts structured data from documents including text, key-value pairs, tables, and specific fields.
- Prebuilt models handle common document types: invoices, receipts, business cards, ID documents, W-2 forms, health insurance cards, and more.
- Custom models can be trained for domain-specific documents — either template-based (fixed layout) or neural (varying layouts).
- Composed models combine multiple custom models into a single endpoint that automatically classifies incoming documents and routes them to the appropriate model.
- The Layout model extracts text, tables, selection marks, and document structure (headings, paragraphs, sections) without any training.
Quick Answer: Document Intelligence (formerly Form Recognizer) extracts structured data from documents. Use prebuilt models for common documents (invoices, receipts, IDs), custom models for domain-specific documents, and composed models to automatically route different document types. The Layout API extracts text, tables, and structure without training.
Document Intelligence Models
Prebuilt Models
| Model | Document Type | Extracted Fields |
|---|---|---|
| Invoice | Invoices | Vendor, customer, amounts, line items, tax, total |
| Receipt | Receipts | Merchant, date, items, subtotal, tax, total, tip |
| ID Document | IDs, passports, driver licenses | Name, DOB, address, document number, expiration |
| Business Card | Business cards (v3.1 and earlier API versions) | Name, title, company, phone, email, address |
| W-2 | US tax form W-2 | Employee info, employer info, wages, taxes |
| Health Insurance | Health insurance cards | Insurer, member ID, group number, plan |
| Contract | Contracts | Parties, dates, terms |
| US Tax Forms | 1040, 1098, 1099 variants | All relevant tax fields |
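In code, each row of this table becomes a `model_id` string passed to the analyze call. The sketch below is a hypothetical lookup (`PREBUILT_MODEL_IDS` and `choose_prebuilt_model` are illustrative names, not part of the SDK). The ID strings follow the service's prebuilt naming, but model availability differs between API versions, so check them against the model list for the version you target.

```python
# Hypothetical lookup from a document category to a prebuilt model ID.
# Verify the IDs against the model list for your API version.
PREBUILT_MODEL_IDS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "business_card": "prebuilt-businessCard",
    "w2": "prebuilt-tax.us.w2",
    "health_insurance": "prebuilt-healthInsuranceCard.us",
    "contract": "prebuilt-contract",
}


def choose_prebuilt_model(category: str) -> str:
    """Return the prebuilt model ID for a category, or raise if there is none."""
    try:
        return PREBUILT_MODEL_IDS[category.lower()]
    except KeyError:
        # No prebuilt model covers this category: a custom model is needed
        raise ValueError(f"No prebuilt model for {category!r}; consider a custom model") from None


print(choose_prebuilt_model("Invoice"))  # prebuilt-invoice
```

Raising for unknown categories makes the "prebuilt vs. custom" decision explicit instead of silently falling back to a generic model.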
General Models
| Model | Purpose | Training Required |
|---|---|---|
| Read | Extract text and language from documents | No |
| Layout | Extract text, tables, selection marks, structure | No |
| General Document | Extract key-value pairs from any document (v3.x API; in v4.0, use Layout with the key-value pairs add-on feature) | No |
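The three general models differ only in how much they return, so the choice reduces to a short decision rule: text only means Read, structure means Layout, key-value pairs means General Document. A minimal sketch of that rule as a hypothetical helper (`choose_general_model` is an illustrative name); `prebuilt-document` is the General Document model ID in the v3.x API.

```python
def choose_general_model(need_structure: bool, need_key_value_pairs: bool) -> str:
    """Hypothetical helper: pick the general model from the table that covers the need."""
    if need_key_value_pairs:
        # General Document model ID in the v3.x API
        return "prebuilt-document"
    if need_structure:
        # tables, selection marks, headings, paragraphs
        return "prebuilt-layout"
    # plain text and detected languages only
    return "prebuilt-read"


print(choose_general_model(need_structure=True, need_key_value_pairs=False))  # prebuilt-layout
```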
Using Prebuilt Models
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential
client = DocumentIntelligenceClient(
    endpoint="https://my-doc-intel.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)
# Analyze an invoice with the prebuilt invoice model
with open("invoice.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        model_id="prebuilt-invoice",
        body=f
    )
result = poller.result()
for document in result.documents or []:
    vendor = document.fields.get("VendorName")
    if vendor:
        print(f"Vendor: {vendor.content} "
              f"(confidence: {vendor.confidence:.2f})")
    invoice_total = document.fields.get("InvoiceTotal")
    if invoice_total:
        print(f"Total: {invoice_total.content} "
              f"(confidence: {invoice_total.confidence:.2f})")
    # Access line items: "Items" is an array field whose elements are object fields
    # (value_array / value_object are the accessors in the azure-ai-documentintelligence SDK)
    items = document.fields.get("Items")
    if items:
        for item in items.value_array or []:
            description = item.value_object.get("Description")
            amount = item.value_object.get("Amount")
            print(f"  Item: {description.content if description else '?'} = "
                  f"{amount.content if amount else '?'}")
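The confidence scores printed above are usually what decides whether an extracted value can be used automatically. A minimal sketch of that pattern, assuming a made-up threshold of 0.8 and using a small `ExtractedField` dataclass as a hypothetical stand-in for the SDK's field objects (only `content` and `confidence` are modeled), so it runs without calling the service. The sample values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ExtractedField:
    """Hypothetical stand-in for an SDK field: only the attributes used above."""
    content: Optional[str]
    confidence: float


def split_by_confidence(fields: Dict[str, Optional[ExtractedField]],
                        threshold: float = 0.8) -> Tuple[Dict[str, str], List[str]]:
    """Accept fields at or above the threshold; flag the rest for human review."""
    accepted: Dict[str, str] = {}
    needs_review: List[str] = []
    for name, field in fields.items():
        if field is None or field.content is None or field.confidence < threshold:
            needs_review.append(name)  # missing or low-confidence value
        else:
            accepted[name] = field.content
    return accepted, needs_review


fields = {
    "VendorName": ExtractedField("Contoso Ltd.", 0.97),
    "InvoiceTotal": ExtractedField("$1,250.00", 0.62),
    "CustomerName": None,  # field not found in the document
}
accepted, needs_review = split_by_confidence(fields)
print(accepted)      # {'VendorName': 'Contoso Ltd.'}
print(needs_review)  # ['InvoiceTotal', 'CustomerName']
```

The right threshold depends on how costly a wrong value is for your process; 0.8 here is only a placeholder.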
Layout Model
The Layout model extracts document structure without any training:
# Extract layout (text, tables, structure)
with open("document.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        model_id="prebuilt-layout",
        body=f
    )
result = poller.result()
# Extract text by page
for page in result.pages:
    print(f"Page {page.page_number}:")
    for line in page.lines or []:
        print(f"  Line: {line.content}")
# Extract tables (the result has no tables list when none are found)
for table in result.tables or []:
    print(f"Table ({table.row_count} rows x {table.column_count} columns):")
    for cell in table.cells:
        print(f"  [{cell.row_index},{cell.column_index}]: {cell.content}")
# Extract selection marks (checkboxes)
for page in result.pages:
    for mark in page.selection_marks or []:
        # state is "selected" or "unselected"; polygon holds the mark's outline coordinates
        print(f"  Checkbox at {mark.polygon}: {mark.state}")
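Because table cells come back as a flat list with row and column indexes, a common next step is rebuilding the grid. A minimal sketch, using a hypothetical `Cell` dataclass that mirrors only the three cell attributes used above so it runs offline. The SDK's cells also carry span information for merged cells, which this sketch ignores: a spanning cell is written only at its top-left position.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    """Hypothetical stand-in with the cell attributes used above."""
    row_index: int
    column_index: int
    content: str


def cells_to_grid(row_count: int, column_count: int, cells: List[Cell]) -> List[List[str]]:
    """Place each cell's text at its (row, column) position; empty positions stay ''."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


cells = [Cell(0, 0, "Item"), Cell(0, 1, "Qty"), Cell(1, 0, "Widget"), Cell(1, 1, "4")]
for row in cells_to_grid(2, 2, cells):
    print(" | ".join(row))
# Item | Qty
# Widget | 4
```

With real results, pass `table.row_count`, `table.column_count`, and `table.cells` directly.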
Custom Models
Template Models (Fixed Layout)
- Train on documents with a consistent layout
- Best for: Forms, applications, structured questionnaires
- Minimum: 5 training documents
Neural Models (Varying Layout)
- Handle documents with varying layouts and formats
- Best for: Contracts, letters, documents with unpredictable structures
- Minimum: 5 training documents (recommended: 15+)
- Better generalization than template models
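The template-versus-neural guidance above amounts to a small decision rule. A sketch of it as a hypothetical helper (`recommend_build_mode` is an illustrative name); the thresholds are the minimum of 5 and the recommended 15+ documents from the lists above, and the returned strings match the `build_mode` values used when training.

```python
import warnings

MIN_TRAINING_DOCS = 5         # minimum stated above for both build modes
RECOMMENDED_NEURAL_DOCS = 15  # recommendation stated above for neural models


def recommend_build_mode(fixed_layout: bool, num_labeled_docs: int) -> str:
    """Hypothetical helper: turn the template-vs-neural guidance into a build mode."""
    if num_labeled_docs < MIN_TRAINING_DOCS:
        raise ValueError(
            f"Need at least {MIN_TRAINING_DOCS} labeled documents, got {num_labeled_docs}"
        )
    if fixed_layout:
        return "template"  # consistent layout: forms, applications, questionnaires
    if num_labeled_docs < RECOMMENDED_NEURAL_DOCS:
        warnings.warn("15+ labeled documents are recommended for neural models")
    return "neural"  # varying layout: contracts, letters


print(recommend_build_mode(fixed_layout=True, num_labeled_docs=8))    # template
print(recommend_build_mode(fixed_layout=False, num_labeled_docs=20))  # neural
```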
Training a Custom Model
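# Model building uses an administration client (signatures from the azure-ai-formrecognizer 3.x SDK)
from azure.ai.formrecognizer import DocumentModelAdministrationClient
from azure.core.credentials import AzureKeyCredential
admin_client = DocumentModelAdministrationClient(
    endpoint="https://my-doc-intel.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)
# Start custom model training from labeled documents in a blob container
poller = admin_client.begin_build_document_model(
    build_mode="template",  # or "neural"
    blob_container_url="https://storage.blob.core.windows.net/training-data?<SAS>",
    description="Custom purchase order model"
)
model = poller.result()
print(f"Model ID: {model.model_id}")
# doc_types maps each document type to its field schema
for doc_type, details in model.doc_types.items():
    print(f"Doc type: {doc_type}, fields: {list(details.field_schema)}")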
Composed Models
Composed models combine multiple custom models into a single endpoint:
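# Create a composed model from existing custom models
# (administration client from the azure-ai-formrecognizer 3.x SDK)
from azure.ai.formrecognizer import DocumentModelAdministrationClient
from azure.core.credentials import AzureKeyCredential
admin_client = DocumentModelAdministrationClient(
    endpoint="https://my-doc-intel.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)
poller = admin_client.begin_compose_document_model(
    component_model_ids=["invoice-model", "receipt-model", "po-model"],
    description="Unified document processing model"
)
composed_model = poller.result()
print(f"Composed model ID: {composed_model.model_id}")
# When you analyze a document with the composed model,
# it classifies the document and routes it to the matching component model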
How Composed Models Work
- A document is submitted to the composed model endpoint
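- The composed model classifies the document type (with the v4.0 API, this step is performed by a custom classification model specified when the composed model is created)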
- The appropriate component model processes the document
- Results include the document type and extracted fields
On the Exam: Composed models are the answer when a scenario describes needing to process multiple document types through a single endpoint. The composed model handles routing automatically — no pre-classification step is needed.
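The four steps can be pictured as a classifier in front of a dictionary of component models. The toy sketch below is only an illustration of that control flow, not the service's implementation: `ComposedModelSketch` and `keyword_classifier` are hypothetical names, the keyword rule stands in for a learned classifier, and the lambdas stand in for component models that really run in the service.

```python
from typing import Callable, Dict

Extractor = Callable[[str], Dict[str, str]]


class ComposedModelSketch:
    """Toy model of the four steps above; not the service's implementation."""

    def __init__(self, classifier: Callable[[str], str], components: Dict[str, Extractor]):
        self.classifier = classifier
        self.components = components

    def analyze(self, document: str) -> Dict[str, object]:
        doc_type = self.classifier(document)   # classify the document type
        extractor = self.components[doc_type]  # route to the matching component model
        return {"doc_type": doc_type, "fields": extractor(document)}  # type + fields


def keyword_classifier(document: str) -> str:
    # Hypothetical rule-based stand-in for the learned classifier
    return "po-model" if "purchase order" in document.lower() else "invoice-model"


composed = ComposedModelSketch(
    keyword_classifier,
    {
        "invoice-model": lambda doc: {"InvoiceId": doc.split()[-1]},
        "po-model": lambda doc: {"PONumber": doc.split()[-1]},
    },
)
print(composed.analyze("Purchase Order PO-1001"))
# {'doc_type': 'po-model', 'fields': {'PONumber': 'PO-1001'}}
```

The caller only ever talks to `analyze`, which is the point of a composed model: one entry point, with routing hidden behind it.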
Document Classification
Document classification models categorize documents without extracting fields:
| Feature | Description |
|---|---|
| Purpose | Sort documents into categories before processing |
| Training | Provide labeled document examples per category |
| Output | Document type classification with confidence score |
| Use case | Mail sorting, document routing, triage |
# Classify a document with a trained custom classifier
with open("document.pdf", "rb") as f:
    document_bytes = f.read()
poller = client.begin_classify_document(
    classifier_id="my-document-classifier",
    body=document_bytes
)
result = poller.result()
for document in result.documents or []:
    print(f"Type: {document.doc_type}")
    print(f"Confidence: {document.confidence:.2f}")
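For the mail-sorting and triage use cases in the table, the classifier's `doc_type` and `confidence` typically feed a routing step. A minimal sketch, assuming a made-up threshold of 0.7 and invented file names; `triage` is a hypothetical helper that takes `(document_name, doc_type, confidence)` tuples, i.e. the values printed by the loop above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def triage(predictions: List[Tuple[str, str, float]],
           threshold: float = 0.7) -> Dict[str, List[str]]:
    """Group documents by predicted type; low-confidence ones go to 'manual_review'."""
    queues: Dict[str, List[str]] = defaultdict(list)
    for name, doc_type, confidence in predictions:
        queue = doc_type if confidence >= threshold else "manual_review"
        queues[queue].append(name)
    return dict(queues)


predictions = [
    ("scan1.pdf", "invoice", 0.94),
    ("scan2.pdf", "receipt", 0.88),
    ("scan3.pdf", "invoice", 0.41),
]
print(triage(predictions))
# {'invoice': ['scan1.pdf'], 'receipt': ['scan2.pdf'], 'manual_review': ['scan3.pdf']}
```

Each queue could then be sent to the matching extraction model, which is essentially what a composed model automates behind a single endpoint.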
Review Questions
What was Azure AI Document Intelligence previously called?
Which model should you use to extract vendor name, invoice total, and line items from an invoice?
What is the purpose of a composed model in Document Intelligence?
What is the key difference between template and neural custom models?