How does Azure AI Document Intelligence differ from basic OCR?

Document Intelligence goes beyond basic OCR by understanding the structure of documents. It extracts key-value pairs (e.g., "Invoice Number: 12345"), tables, selection marks, and document layout, not just the text content.

A company processes thousands of invoices from different vendors. They want to automatically extract invoice numbers, dates, totals, and line items. Which Document Intelligence approach should they use?

Pre-built Invoice model. The pre-built Invoice model is specifically trained to extract standard invoice fields (invoice number, date, total, line items, vendor). It requires no training and works with invoices from different vendors because it understands common invoice structures.
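Each field the service extracts comes back with a confidence score, so a common pattern for high-volume invoice processing is to accept confident fields automatically and route the rest to a person. The sketch below assumes a hand-written sample whose field names (`InvoiceId`, `InvoiceDate`, `InvoiceTotal`, `VendorName`) follow the pre-built invoice model's documented fields. The `content`/`confidence` shape is loosely modeled on the service's JSON output, and the values, scores, `triage` function, and 0.80 threshold are all hypothetical.

```python
# Hypothetical sample: field names follow the pre-built invoice model;
# the values and confidence scores are invented for illustration.
invoice_fields = {
    "InvoiceId": {"content": "12345", "confidence": 0.98},
    "InvoiceDate": {"content": "2024-03-01", "confidence": 0.95},
    "InvoiceTotal": {"content": "$1,250.00", "confidence": 0.91},
    "VendorName": {"content": "Contoso Ltd.", "confidence": 0.62},
}

REVIEW_THRESHOLD = 0.80  # hypothetical cutoff; tune it against your own documents


def triage(fields: dict, threshold: float = REVIEW_THRESHOLD) -> tuple[dict, list[str]]:
    """Accept fields at or above the threshold; flag the rest for human review."""
    accepted: dict[str, str] = {}
    needs_review: list[str] = []
    for name, field in fields.items():
        if field["confidence"] >= threshold:
            accepted[name] = field["content"]
        else:
            needs_review.append(name)
    return accepted, needs_review


accepted, needs_review = triage(invoice_fields)
print(accepted)      # {'InvoiceId': '12345', 'InvoiceDate': '2024-03-01', 'InvoiceTotal': '$1,250.00'}
print(needs_review)  # ['VendorName']
```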

When should you train a custom Document Intelligence model instead of using a pre-built model?

When your documents have a unique format not covered by the pre-built models (such as invoices, receipts, IDs, W-2s, and business cards). Examples include custom purchase orders, proprietary forms, and industry-specific documents.

Azure AI Document Intelligence

Quick Answer: Azure AI Document Intelligence extracts structured data from documents including text, tables, key-value pairs, and layout information. It provides pre-built models for invoices, receipts, ID documents, and tax forms. Custom models can be trained for your specific document types. It goes beyond basic OCR by understanding document structure.

What Is Document Intelligence?

Azure AI Document Intelligence (formerly Azure Form Recognizer) is a service that uses machine learning to extract text, key-value pairs, tables, and structure from documents. Unlike basic OCR, which only reads text, Document Intelligence understands how a document is organized.

OCR vs. Document Intelligence

Capability	Basic OCR (Azure AI Vision)	Document Intelligence
Extract text from image	Yes	Yes
Identify text position	Yes	Yes
Extract key-value pairs	No	Yes ("Invoice Number: 12345")
Extract tables	No	Yes (rows, columns, cells)
Understand document structure	No	Yes (headers, paragraphs, sections)
Pre-built document models	No	Yes (invoices, receipts, IDs)
Custom document models	No	Yes (train on your documents)
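One way to see the difference in the table is to compare the shape of the data each approach hands back. The sketch below is illustrative only: `Line`, `KeyValuePair`, and `StructuredResult` are hypothetical classes, not the service's or any SDK's types. Basic OCR gives you lines of text with positions; a structured result carries the same lines plus key-value pairs and tables, so you can look up "Invoice Number" directly instead of parsing strings yourself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Line:
    text: str
    bounding_box: tuple[float, float, float, float]  # (x, y, width, height); illustrative only


@dataclass
class KeyValuePair:
    key: str
    value: str
    confidence: float


@dataclass
class StructuredResult:
    lines: list[Line]  # the same text OCR would give you
    key_value_pairs: list[KeyValuePair] = field(default_factory=list)
    tables: list[list[list[str]]] = field(default_factory=list)  # each table is a list of rows

    def value_for(self, key: str) -> Optional[str]:
        """Look up a value by its label, ignoring case and a trailing colon."""
        wanted = key.strip().rstrip(":").lower()
        for pair in self.key_value_pairs:
            if pair.key.strip().rstrip(":").lower() == wanted:
                return pair.value
        return None


lines = [
    Line("Invoice Number: 12345", (10, 10, 200, 20)),
    Line("Total: $1,250.00", (10, 40, 160, 20)),
]

# Basic OCR: only the lines; extracting "12345" means parsing the text yourself.
ocr_result = lines

# Document Intelligence: the same lines plus structure on top.
structured = StructuredResult(
    lines=lines,
    key_value_pairs=[KeyValuePair("Invoice Number:", "12345", 0.97)],
    tables=[[["Item", "Qty"], ["Widget", "4"]]],
)
print(structured.value_for("invoice number"))  # 12345
```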

Pre-Built Models

Document Intelligence provides pre-trained models for common document types:

Pre-Built Model	What It Extracts	Example Fields
Invoice	Invoice data	Invoice number, date, total, line items, vendor
Receipt	Receipt data	Merchant name, date, total, items, tax
ID Document	Identity card data	Name, date of birth, address, document number
W-2	Tax form data	Employee info, wages, tax withholdings
Business Card	Contact info	Name, company, phone, email, address
Health Insurance Card	Insurance data	Insurer, member ID, group number, plan
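When calling the service, you choose a pre-built model by its model ID. The mapping below is a minimal sketch: the IDs are the ones used by the v3.x REST API to the best of our knowledge, but names and model availability change between API versions, so treat them as assumptions to verify against the current documentation. The `model_id_for` helper is hypothetical.

```python
# Model IDs as used by the v3.x API (assumption: verify for your API version).
PREBUILT_MODEL_IDS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id document": "prebuilt-idDocument",
    "w-2": "prebuilt-tax.us.w2",
    "business card": "prebuilt-businessCard",
    "health insurance card": "prebuilt-healthInsuranceCard.us",
}


def model_id_for(document_type: str) -> str:
    """Return the pre-built model ID for a document type, or fail loudly."""
    key = document_type.strip().lower()
    if key not in PREBUILT_MODEL_IDS:
        raise ValueError(
            f"no pre-built model for {document_type!r}; "
            "consider the Layout API or a custom model"
        )
    return PREBUILT_MODEL_IDS[key]


print(model_id_for("Invoice"))  # prebuilt-invoice
```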

How Pre-Built Models Work

Submit a document (image or PDF) to the pre-built model endpoint
The model analyzes the document structure and content
Receive extracted fields with confidence scores

No training is needed — the models are pre-trained by Microsoft on millions of documents.
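The three steps above run as a long-running operation: you submit the document, then poll until analysis finishes. The sketch below assumes the REST pattern where the submit call returns an operation URL and the status moves through values such as `notStarted`, `running`, `succeeded`, or `failed`, with the extracted content under `analyzeResult`. `analyze_and_wait` is a hypothetical helper, and the HTTP calls are injected as plain functions so the loop can be exercised with a fake transport rather than a live endpoint.

```python
import time
from typing import Callable


def analyze_and_wait(
    submit: Callable[[], str],
    get_operation: Callable[[str], dict],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> dict:
    """Submit a document, poll the operation, and return the analysis result."""
    # Step 1: submit the document; the service replies with an operation URL.
    operation_url = submit()
    deadline = time.monotonic() + timeout
    while True:
        # Step 2: the model analyzes the document while we poll its status.
        operation = get_operation(operation_url)
        status = operation.get("status")
        if status == "succeeded":
            # Step 3: the payload holds the extracted fields and confidence scores.
            return operation["analyzeResult"]
        if status == "failed":
            raise RuntimeError(f"analysis failed: {operation.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"analysis still {status!r} after {timeout} s")
        time.sleep(poll_interval)


# Usage with a fake transport standing in for real HTTP calls.
fake_statuses = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"documents": [{"docType": "invoice"}]}},
])
result = analyze_and_wait(
    submit=lambda: "https://example.invalid/operations/1",
    get_operation=lambda url: next(fake_statuses),
    poll_interval=0.0,
)
print(result["documents"][0]["docType"])  # invoice
```

Injecting `submit` and `get_operation` keeps the polling logic separate from authentication and HTTP details, which differ between API versions and SDKs.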

Custom Models

When pre-built models don't cover your document types, you can train custom models:

Custom Template Models

Trained on a specific document template/layout
Works best when documents have a consistent format
Requires 5+ sample documents per template
Ideal for: standardized forms, purchase orders, specific contracts

Custom Neural Models

Uses deep learning for more flexible extraction
Handles documents with varying layouts
Requires more training data (10+ documents)
Ideal for: documents with varied formats, mixed layouts
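The choice between the two custom model types, together with the sample counts quoted above, can be sketched as a small planning helper. Everything here is hypothetical. `plan_custom_model` and the per-sample layout labels are illustrative names. The thresholds are simply the figures given in this section, not limits read from the service, so check the current documentation for actual training requirements.

```python
from collections import Counter

# Figures as stated in this section; verify against current service limits.
TEMPLATE_MIN_PER_LAYOUT = 5
NEURAL_MIN_TOTAL = 10


def plan_custom_model(sample_layouts: list[str]) -> str:
    """Suggest a custom model type from training samples.

    Each entry is a hypothetical label naming the layout a sample uses,
    e.g. "po-form-v2".
    """
    if not sample_layouts:
        raise ValueError("no training samples provided")
    counts = Counter(sample_layouts)
    if len(counts) == 1:
        # One consistent layout: a template model fits.
        (only_count,) = counts.values()
        if only_count >= TEMPLATE_MIN_PER_LAYOUT:
            return "custom template"
        return f"need {TEMPLATE_MIN_PER_LAYOUT - only_count} more samples for a template model"
    # Several layouts: a neural model handles the variation.
    if len(sample_layouts) >= NEURAL_MIN_TOTAL:
        return "custom neural"
    return f"need {NEURAL_MIN_TOTAL - len(sample_layouts)} more samples for a neural model"


print(plan_custom_model(["po-form-v1"] * 6))  # custom template
print(plan_custom_model(["a"] * 4 + ["b"] * 4 + ["c"] * 4))  # custom neural
```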

The Layout API

The Layout API extracts document structure without requiring a specific document model:

Extracted Element	Description
Text	All text content with positions
Tables	Table structure with rows, columns, and cells
Selection marks	Checkboxes and radio buttons (selected/unselected)
Paragraphs	Text grouped into paragraphs
Sections	Document sections and subsections
Headers/Footers	Page headers and footers
Page numbers	Page number identification
Barcodes	1D and 2D barcodes (QR codes, Code 128, etc.)
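Tables come back as a list of cells with row and column indexes rather than as a ready-made grid, so a typical first step is to rebuild the grid. The sketch below assumes a JSON shape with `rowCount`, `columnCount`, and `cells` entries carrying `rowIndex`, `columnIndex`, and `content`, and selection marks with a `state` of `selected` or `unselected`. Both helper functions are hypothetical, and the sample data is invented.

```python
def table_to_grid(table: dict) -> list[list[str]]:
    """Rebuild a row-major grid from a Layout-style table object."""
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        # Merged cells are written at their top-left position only; spans are ignored.
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return grid


def count_selected(page: dict) -> int:
    """Count checkboxes or radio buttons reported as selected on a page."""
    return sum(1 for mark in page.get("selectionMarks", []) if mark["state"] == "selected")


# Invented sample data shaped like a small Layout result.
table = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Qty"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Widget"},
        {"rowIndex": 1, "columnIndex": 1, "content": "4"},
    ],
}
page = {"selectionMarks": [{"state": "selected"}, {"state": "unselected"}]}

print(table_to_grid(table))  # [['Item', 'Qty'], ['Widget', '4']]
print(count_selected(page))  # 1
```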

On the Exam: The Layout API is the most general Document Intelligence capability — it works on any document without training. Pre-built models are for specific document types. Custom models are for your organization's unique document formats. Know which to recommend for each scenario.
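The exam tip boils down to a small decision rule, sketched below in Python. The `recommend` function, its parameters, and the `PREBUILT_TYPES` set are hypothetical names for illustration. Real scenarios can need more nuance, such as choosing between a custom template and a custom neural model.

```python
from typing import Optional

# Document types covered by the pre-built models listed in this section.
PREBUILT_TYPES = {
    "invoice", "receipt", "id document", "w-2", "business card", "health insurance card",
}


def recommend(document_type: Optional[str], needs_specific_fields: bool) -> str:
    """Map an exam-style scenario to a Document Intelligence capability."""
    if document_type is not None and document_type.strip().lower() in PREBUILT_TYPES:
        return "pre-built model"  # standard document type: no training needed
    if needs_specific_fields:
        return "custom model"  # your own form with named fields: train on samples
    return "Layout API"  # only text, tables, selection marks, and structure


print(recommend("Invoice", needs_specific_fields=True))  # pre-built model
print(recommend("purchase order", needs_specific_fields=True))  # custom model
print(recommend(None, needs_specific_fields=False))  # Layout API
```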

Microsoft Azure AI Fundamentals

3.4 Azure AI Document Intelligence
