3.4 Azure AI Document Intelligence
Key Takeaways
- Azure AI Document Intelligence (formerly Form Recognizer) extracts structured data from documents — text, tables, key-value pairs, and layout information.
- Pre-built models handle common document types (invoices, receipts, ID documents, W-2 tax forms, and business cards) without any training on your part.
- Custom models can be trained to extract data from your organization's specific document formats (purchase orders, contracts, custom forms).
- The Layout API extracts text, tables, selection marks, and document structure (paragraphs, sections, headers) from any document.
- Document Intelligence goes beyond OCR by understanding document structure: it knows which field a value belongs to, not just what text is present.
Quick Answer: Azure AI Document Intelligence extracts structured data from documents including text, tables, key-value pairs, and layout information. It provides pre-built models for invoices, receipts, ID documents, and tax forms. Custom models can be trained for your specific document types. It goes beyond basic OCR by understanding document structure.
What Is Document Intelligence?
Azure AI Document Intelligence (formerly Azure Form Recognizer) is a service that uses machine learning to extract text, key-value pairs, tables, and structure from documents. Unlike basic OCR, which only reads text, Document Intelligence understands how a document is organized, so it can tell which value belongs to which field.
OCR vs. Document Intelligence
| Capability | Basic OCR (Azure AI Vision) | Document Intelligence |
|---|---|---|
| Extract text from image | Yes | Yes |
| Identify text position | Yes | Yes |
| Extract key-value pairs | No | Yes ("Invoice Number: 12345") |
| Extract tables | No | Yes (rows, columns, cells) |
| Understand document structure | No | Yes (headers, paragraphs, sections) |
| Pre-built document models | No | Yes (invoices, receipts, IDs) |
| Custom document models | No | Yes (train on your documents) |
Pre-Built Models
Document Intelligence provides pre-trained models for common document types:
| Pre-Built Model | What It Extracts | Example Fields |
|---|---|---|
| Invoice | Invoice data | Invoice number, date, total, line items, vendor |
| Receipt | Receipt data | Merchant name, date, total, items, tax |
| ID Document | Identity card data | Name, date of birth, address, document number |
| W-2 | Tax form data | Employee info, wages, tax withholdings |
| Business Card | Contact info | Name, company, phone, email, address |
| Health Insurance Card | Insurance data | Insurer, member ID, group number, plan |
How Pre-Built Models Work
- Submit a document (image or PDF) to the pre-built model endpoint
- The model analyzes the document structure and content
- Receive extracted fields with confidence scores
No training is needed — the models are pre-trained by Microsoft on millions of documents.
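The final step above returns each field together with a confidence score. Below is a minimal sketch of how a caller might act on those scores, assuming the result has already been flattened into a plain dictionary. The `split_by_confidence` helper, the field names, the sample values, and the 0.80 threshold are illustrative assumptions, not part of the service's API.

```python
from typing import Dict, Tuple

# Hypothetical, simplified shape: field name -> (extracted value, confidence 0..1).
# The real service response is richer; this only illustrates the idea.
ExtractedFields = Dict[str, Tuple[str, float]]


def split_by_confidence(fields: ExtractedFields, threshold: float = 0.80):
    """Separate fields that can be accepted automatically from those needing review."""
    accepted = {}
    needs_review = {}
    for name, (value, confidence) in fields.items():
        if confidence >= threshold:
            accepted[name] = value
        else:
            needs_review[name] = value
    return accepted, needs_review


# Made-up example values for an invoice (not real service output)
sample = {
    "InvoiceId": ("12345", 0.97),
    "InvoiceDate": ("2024-03-01", 0.91),
    "InvoiceTotal": ("1,250.00", 0.62),
}
accepted, needs_review = split_by_confidence(sample)
```

Here the low-confidence total would be routed to a person for checking, while the other two fields could be accepted automatically. The right threshold depends on how costly an extraction error is for the workload.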
Custom Models
When pre-built models don't cover your document types, you can train custom models:
Custom Template Models
- Trained on a specific document template/layout
- Works best when documents have a consistent format
- Requires 5+ sample documents per template
- Ideal for: standardized forms, purchase orders, specific contracts
Custom Neural Models
- Uses deep learning for more flexible extraction
- Handles documents with varying layouts
- Requires more training data (10+ documents)
- Ideal for: documents with varied formats, mixed layouts
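The choice between the two custom model types can be sketched as a small decision function. This is only an illustration of the guidance in this section: `suggest_custom_model` is a hypothetical helper, and the minimum sample counts are the figures quoted above, not values read from the service.

```python
def suggest_custom_model(layout_is_consistent: bool, sample_count: int) -> str:
    """Suggest a custom model type from the guidance above (illustrative only)."""
    # Minimum sample counts as quoted in this section.
    template_min, neural_min = 5, 10

    if layout_is_consistent:
        # Fixed layouts suit a template model, which needs fewer samples.
        if sample_count >= template_min:
            return "custom template"
        return f"collect at least {template_min} samples"

    # Varying layouts call for a neural model, which needs more samples.
    if sample_count >= neural_min:
        return "custom neural"
    return f"collect at least {neural_min} samples"
```

For example, a standardized purchase order form with six labeled samples would map to a template model, while contracts from many different counterparties would map to a neural model once enough samples are available.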
The Layout API
The Layout API extracts document structure without requiring a specific document model:
| Extracted Element | Description |
|---|---|
| Text | All text content with positions |
| Tables | Table structure with rows, columns, and cells |
| Selection marks | Checkboxes and radio buttons (selected/unselected) |
| Paragraphs | Text grouped into paragraphs |
| Sections | Document sections and subsections |
| Headers/Footers | Page headers and footers |
| Page numbers | Page number identification |
| Barcodes | 1D and 2D barcodes (QR codes, Code 128, etc.) |
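To make the "rows, columns, and cells" idea concrete, the sketch below rebuilds a two-dimensional grid from a flat list of cells. It assumes each cell carries a row index, a column index, and its text. The `Cell` class and `cells_to_grid` function are hypothetical stand-ins written for this example, not types from the service's SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    # Hypothetical stand-in for a table cell returned by layout analysis
    row_index: int
    column_index: int
    content: str


def cells_to_grid(cells: List[Cell], row_count: int, column_count: int) -> List[List[str]]:
    """Arrange a flat list of cells into a row-major grid; missing cells stay empty."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


# Made-up cells for a 2x2 table
cells = [
    Cell(0, 0, "Item"), Cell(0, 1, "Qty"),
    Cell(1, 0, "Widget"), Cell(1, 1, "3"),
]
grid = cells_to_grid(cells, row_count=2, column_count=2)
```

Because each cell knows its own position, the table can be reconstructed exactly, which plain OCR text lines would not allow.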
On the Exam: The Layout API is the most general Document Intelligence capability — it works on any document without training. Pre-built models are for specific document types. Custom models are for your organization's unique document formats. Know which to recommend for each scenario.
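The scenario-matching logic described above can be sketched as a lookup. The `recommend_approach` function is a hypothetical helper, and the model ID strings are given as I understand the service's naming; treat them as assumptions and check them against the API version in use.

```python
from typing import Optional

# Assumed model IDs; verify against the current service documentation.
PREBUILT_MODEL_IDS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id document": "prebuilt-idDocument",
}


def recommend_approach(document_type: Optional[str], needs_named_fields: bool) -> str:
    """Map a scenario to Layout, a pre-built model, or a custom model."""
    if not needs_named_fields:
        # Only text, tables, and structure are needed: Layout works on any document.
        return "prebuilt-layout"
    key = (document_type or "").strip().lower()
    if key in PREBUILT_MODEL_IDS:
        # A common document type with a ready-made model.
        return PREBUILT_MODEL_IDS[key]
    # Organization-specific format with no matching pre-built model.
    return "custom model"
```

Invoices from many vendors map to the pre-built invoice model, an in-house purchase order form maps to a custom model, and a request for just the tables in an arbitrary PDF maps to Layout.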
How does Azure AI Document Intelligence differ from basic OCR?
A company processes thousands of invoices from different vendors. They want to automatically extract invoice numbers, dates, totals, and line items. Which Document Intelligence approach should they use?
When should you train a custom Document Intelligence model instead of using a pre-built model?