How many languages does the Azure AI Vision Read API support for printed text recognition?

164 languages. The Azure AI Vision Read API supports 164 languages for printed text recognition and 9 languages for handwritten text recognition. This makes it one of the most comprehensive OCR services available.

A company needs to extract invoice numbers, dates, and amounts from scanned invoices. Which service should they use?

Azure AI Document Intelligence. Azure AI Document Intelligence is designed for structured document processing. It has prebuilt models for invoices that extract specific fields (invoice number, date, amount, vendor) as key-value pairs. The Read API only extracts raw text without understanding document structure.

In the OCR response hierarchy, what is the correct order from top to bottom?

Pages → Blocks → Lines → Words. The Read API response is hierarchical: pages contain blocks, blocks contain lines, and lines contain individual words. Lines and words carry bounding polygon coordinates, and each word also has a confidence score.

Optical Character Recognition (OCR)

Quick Answer: The Azure AI Vision Read API extracts printed text (164 languages) and handwritten text (9 languages) from images and PDFs. Results are organized as pages → blocks → lines → words, with bounding polygon coordinates for the recognized text. For structured document extraction (invoices, forms), use Document Intelligence instead.

Read API vs. Document Intelligence

Feature	Read API (Vision)	Document Intelligence
Best for	General text extraction from images	Structured document field extraction
Input	Images and PDFs	Documents, forms, invoices, receipts
Output	Raw text with positions	Structured key-value pairs and tables
Use case	Sign reading, label scanning, text digitization	Invoice processing, receipt extraction, form automation
Handwriting	Yes (9 languages)	Yes
Tables	No structured table extraction	Yes, full table extraction
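As a study aid, the comparison above can be reduced to a single decision rule. The sketch below is illustrative only: the function name and its two flags are hypothetical and not part of any Azure SDK.

```python
def choose_text_service(needs_named_fields: bool = False,
                        needs_tables: bool = False) -> str:
    """Pick a service using the comparison table as the only rule.

    Hypothetical helper for revision purposes; not an Azure API.
    """
    # Key-value fields (invoice number, total) or structured tables
    # are listed under Document Intelligence in the table.
    if needs_named_fields or needs_tables:
        return "Document Intelligence"
    # Plain text with positions is what the Read API returns.
    return "Read API"


print(choose_text_service(needs_named_fields=True))  # scanned invoice
print(choose_text_service())                         # photo of a street sign
```

The rule is deliberately coarse: it mirrors the table rows and ignores pricing, latency, and regional availability.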

Using the Read API

Synchronous (Small Images)

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint="https://my-vision.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

# Read text from an image
with open("document.jpg", "rb") as f:
    result = client.analyze(
        image_data=f.read(),
        visual_features=[VisualFeatures.READ]
    )

# Extract text (result.read is None when no READ result is returned)
if result.read is not None:
    for block in result.read.blocks:
        for line in block.lines:
            print(f"Line: {line.text}")
            print(f"  Bounding polygon: {line.bounding_polygon}")
            for word in line.words:
                print(f"  Word: {word.text} "
                      f"(confidence: {word.confidence:.2f})")
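When only the text is needed, the nested loop above can be wrapped in a small helper. This is a minimal sketch: `collect_lines` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins that only mimic the `blocks`, `lines`, and `text` attributes used above so the example runs without an Azure resource.

```python
from types import SimpleNamespace


def collect_lines(read_result):
    """Return the text of every line, in order, from a READ result.

    Accepts None so that result.read can be passed in directly
    (assumption: it is None when no READ result was produced).
    """
    if read_result is None:
        return []
    return [line.text
            for block in read_result.blocks
            for line in block.lines]


# Stand-in object with the same attribute names as the SDK result.
fake_read = SimpleNamespace(blocks=[
    SimpleNamespace(lines=[
        SimpleNamespace(text="Hello World"),
        SimpleNamespace(text="Second line"),
    ])
])

print("\n".join(collect_lines(fake_read)))
```

With a real client, the call would be `collect_lines(result.read)` after `client.analyze(...)`.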

Read API Response Structure

{
    "readResult": {
        "blocks": [
            {
                "lines": [
                    {
                        "text": "Hello World",
                        "boundingPolygon": [
                            {"x": 10, "y": 10},
                            {"x": 200, "y": 10},
                            {"x": 200, "y": 40},
                            {"x": 10, "y": 40}
                        ],
                        "words": [
                            {
                                "text": "Hello",
                                "boundingPolygon": [...],
                                "confidence": 0.99
                            },
                            {
                                "text": "World",
                                "boundingPolygon": [...],
                                "confidence": 0.98
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
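The same structure can be walked as plain JSON, for example when calling the REST endpoint directly. The sketch below uses a trimmed copy of the response shown above (word polygons omitted) and two hypothetical helpers: one flags words below a confidence threshold, the other converts a bounding polygon into an axis-aligned box. The threshold values are arbitrary assumptions.

```python
import json

# Trimmed copy of the sample response; word polygons are left out.
SAMPLE = json.loads("""
{
  "readResult": {
    "blocks": [{
      "lines": [{
        "text": "Hello World",
        "boundingPolygon": [
          {"x": 10, "y": 10}, {"x": 200, "y": 10},
          {"x": 200, "y": 40}, {"x": 10, "y": 40}
        ],
        "words": [
          {"text": "Hello", "confidence": 0.99},
          {"text": "World", "confidence": 0.98}
        ]
      }]
    }]
  }
}
""")


def bounding_box(polygon):
    """Axis-aligned (left, top, right, bottom) box around a polygon."""
    xs = [point["x"] for point in polygon]
    ys = [point["y"] for point in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def low_confidence_words(response, threshold=0.9):
    """List (text, confidence) for every word below the threshold."""
    flagged = []
    for block in response["readResult"]["blocks"]:
        for line in block["lines"]:
            for word in line["words"]:
                if word["confidence"] < threshold:
                    flagged.append((word["text"], word["confidence"]))
    return flagged


first_line = SAMPLE["readResult"]["blocks"][0]["lines"][0]
print(bounding_box(first_line["boundingPolygon"]))
print(low_confidence_words(SAMPLE, threshold=0.985))
```

Flagged words are a natural target for human review, since the confidence score is reported per word rather than per line.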

Language Support

Category	Languages	Examples
Printed text	164 languages	English, Chinese, Arabic, Hindi, Japanese, Korean, Russian
Handwritten text	9 languages	English, Chinese Simplified, French, German, Italian, Japanese, Korean, Portuguese, Spanish

Best Practices for OCR Accuracy

Factor	Recommendation
Image resolution	Minimum 50x50 pixels; higher resolution generally improves accuracy
Contrast	High contrast between text and background
Alignment	Text should be roughly horizontal (Read API handles up to 40-degree skew)
Image format	JPEG, PNG, BMP, or TIFF; avoid heavily compressed JPEGs
PDF pages	Up to 2,000 pages per document
File size	Up to 500 MB for standard; 20 MB for Image Analysis Read
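Some of these limits can be checked locally before an image is uploaded. The following is a rough sketch, not an official validation routine: the function and constant names are hypothetical, the limits are copied from the table above (using the 20 MB figure for Image Analysis Read), and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
import os

# Limits copied from the table above; names are hypothetical.
MIN_SIDE_PX = 50
MAX_BYTES = 20 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def preflight_problems(filename, width, height, size_bytes):
    """Return a list of reasons the image may be rejected (empty if none)."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append(f"image too small: {width}x{height}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


print(preflight_problems("sign.jpg", 1200, 800, 350_000))
print(preflight_problems("scan.gif", 40, 40, 25 * 1024 * 1024))
```

Contrast and skew are not covered here because they cannot be judged from file metadata alone.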

On the Exam: Know when to use the Read API (general text extraction) vs. Document Intelligence (structured document processing). If the question mentions invoices, receipts, or forms, the answer is Document Intelligence. If it mentions signs, labels, or general images, the answer is the Read API.

Azure AI Engineer Associate

3.4 Optical Character Recognition (OCR)

