3.4 Optical Character Recognition (OCR)

Key Takeaways

  • The Azure AI Vision Read API (OCR) extracts printed and handwritten text from images and documents, supporting 164 languages for printed text and 9 for handwriting.
  • The Read API returns text organized hierarchically: pages → lines → words, each with bounding polygon coordinates. The Image Analysis 4.0 response groups lines under blocks rather than pages.
  • For document-specific OCR (invoices, receipts, forms), use Azure AI Document Intelligence instead of the general Read API.
  • The Read API is available both as part of Image Analysis 4.0 (visual feature "Read") and as a standalone Read API endpoint.
  • OCR can process images (JPEG, PNG, BMP, TIFF) and PDF documents (up to 2,000 pages).
Last updated: March 2026

Quick Answer: The Azure AI Vision Read API extracts printed text (164 languages) and handwritten text (9 languages) from images and PDFs. Results are organized as pages → lines → words with bounding polygon coordinates (the Image Analysis 4.0 response groups lines under blocks rather than pages). For structured document extraction (invoices, forms), use Document Intelligence instead.

Read API vs. Document Intelligence

| Feature | Read API (Vision) | Document Intelligence |
| --- | --- | --- |
| Best for | General text extraction from images | Structured document field extraction |
| Input | Images and PDFs | Documents, forms, invoices, receipts |
| Output | Raw text with positions | Structured key-value pairs and tables |
| Use case | Sign reading, label scanning, text digitization | Invoice processing, receipt extraction, form automation |
| Handwriting | Yes (9 languages) | Yes |
| Tables | No structured table extraction | Yes, full table extraction |

Using the Read API

Synchronous (Small Images)

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint="https://my-vision.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

# Read text from an image
with open("document.jpg", "rb") as f:
    result = client.analyze(
        image_data=f.read(),
        visual_features=[VisualFeatures.READ]
    )

# Extract text; guard against a missing Read result before iterating
if result.read is not None:
    for block in result.read.blocks:
        for line in block.lines:
            print(f"Line: {line.text}")
            print(f"  Bounding polygon: {line.bounding_polygon}")
            for word in line.words:
                print(f"  Word: {word.text} "
                      f"(confidence: {word.confidence:.2f})")
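
The loop above prints every line and word as it goes. In practice you often want the full text as one string plus a list of words worth double-checking. The following is a minimal sketch of that idea, not part of the SDK: the helper name summarize_read_result and the 0.8 confidence threshold are assumptions chosen for illustration, and the SimpleNamespace object only stands in for result.read so the sketch runs without an Azure resource.

```python
from types import SimpleNamespace


def summarize_read_result(read_result, min_confidence=0.8):
    """Return (full_text, low_confidence_words) for a Read result.

    Hypothetical helper: it only touches the attributes used in the
    sample above (blocks -> lines -> words, .text, .confidence).
    The 0.8 default threshold is an arbitrary illustrative value.
    """
    if read_result is None:  # nothing to summarize
        return "", []

    lines = []
    low_confidence = []
    for block in read_result.blocks:
        for line in block.lines:
            lines.append(line.text)
            for word in line.words:
                if word.confidence < min_confidence:
                    low_confidence.append((word.text, word.confidence))
    return "\n".join(lines), low_confidence


# Stand-in shaped like result.read, with made-up confidence values.
fake_read = SimpleNamespace(blocks=[SimpleNamespace(lines=[
    SimpleNamespace(text="Hello World", words=[
        SimpleNamespace(text="Hello", confidence=0.99),
        SimpleNamespace(text="World", confidence=0.62),
    ])
])])

full_text, suspects = summarize_read_result(fake_read)
print(full_text)
print(suspects)
```

With a real client, you would pass result.read in place of fake_read.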

Read API Response Structure

{
    "readResult": {
        "blocks": [
            {
                "lines": [
                    {
                        "text": "Hello World",
                        "boundingPolygon": [
                            {"x": 10, "y": 10},
                            {"x": 200, "y": 10},
                            {"x": 200, "y": 40},
                            {"x": 10, "y": 40}
                        ],
                        "words": [
                            {
                                "text": "Hello",
                                "boundingPolygon": [...],
                                "confidence": 0.99
                            },
                            {
                                "text": "World",
                                "boundingPolygon": [...],
                                "confidence": 0.98
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
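
If you call the REST endpoint directly instead of using the SDK, you work with this JSON yourself. The sketch below walks the blocks → lines → words structure shown above and collapses each line's polygon into a simple rectangle. The function names polygon_to_box and iter_lines are hypothetical, and the sample payload leaves out the word-level polygons, which are elided in the response above.

```python
import json

# Sample payload modeled on the response above (word polygons omitted).
RESPONSE_JSON = """
{
  "readResult": {
    "blocks": [
      {
        "lines": [
          {
            "text": "Hello World",
            "boundingPolygon": [
              {"x": 10, "y": 10}, {"x": 200, "y": 10},
              {"x": 200, "y": 40}, {"x": 10, "y": 40}
            ],
            "words": [
              {"text": "Hello", "confidence": 0.99},
              {"text": "World", "confidence": 0.98}
            ]
          }
        ]
      }
    ]
  }
}
"""


def polygon_to_box(polygon):
    """Collapse a list of {"x", "y"} points into (left, top, width, height)."""
    xs = [point["x"] for point in polygon]
    ys = [point["y"] for point in polygon]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def iter_lines(response):
    """Yield (text, box, mean_word_confidence) for every line in the response."""
    for block in response.get("readResult", {}).get("blocks", []):
        for line in block.get("lines", []):
            words = line.get("words", [])
            mean_conf = (
                sum(word["confidence"] for word in words) / len(words)
                if words else None
            )
            yield line["text"], polygon_to_box(line["boundingPolygon"]), mean_conf


rows = list(iter_lines(json.loads(RESPONSE_JSON)))
for text, box, mean_conf in rows:
    print(text, box, mean_conf)
```

An axis-aligned box is a simplification: for skewed text the polygon is a rotated quadrilateral, so keep the original points if you need exact outlines.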

Language Support

| Category | Languages | Examples |
| --- | --- | --- |
| Printed text | 164 languages | English, Chinese, Arabic, Hindi, Japanese, Korean, Russian |
| Handwritten text | 9 languages | English, Chinese Simplified, French, German, Italian, Japanese, Korean, Portuguese, Spanish |

Best Practices for OCR Accuracy

| Factor | Recommendation |
| --- | --- |
| Image resolution | Minimum 50x50 pixels; higher resolution = better accuracy |
| Contrast | High contrast between text and background |
| Alignment | Text should be roughly horizontal (Read API handles up to 40-degree skew) |
| Image format | JPEG, PNG, BMP, or TIFF; avoid heavily compressed JPEGs |
| PDF pages | Up to 2,000 pages per document |
| File size | Up to 500 MB for standard; 20 MB for Image Analysis Read |
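
Some of these limits can be checked locally before an image is uploaded. The following is a rough sketch of such a pre-flight check for the Image Analysis Read path, using the numbers from the table above. The function name preflight_issues, the exact set of file extensions, and treating 20 MB as 20 × 1024 × 1024 bytes are assumptions. The caller supplies the pixel dimensions so that the sketch needs no imaging library.

```python
import os

# Limits taken from the table above; the extension list is an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}
MAX_BYTES_IMAGE_ANALYSIS = 20 * 1024 * 1024  # 20 MB, assumed binary megabytes
MIN_SIDE_PIXELS = 50


def preflight_issues(filename, size_bytes, width, height):
    """Return a list of human-readable problems; an empty list means OK."""
    issues = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        issues.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES_IMAGE_ANALYSIS:
        issues.append("file larger than 20 MB")
    if width < MIN_SIDE_PIXELS or height < MIN_SIDE_PIXELS:
        issues.append("image smaller than 50x50 pixels")
    return issues


print(preflight_issues("scan.png", 1_000_000, 1200, 800))
print(preflight_issues("icon.gif", 100, 32, 32))
```

A check like this only catches hard limits. Contrast, skew, and compression quality still have to be judged from the OCR confidence scores.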

On the Exam: Know when to use the Read API (general text extraction) vs. Document Intelligence (structured document processing). If the question mentions invoices, receipts, or forms, the answer is Document Intelligence. If it mentions signs, labels, or general images, the answer is the Read API.

Test Your Knowledge

How many languages does the Azure AI Vision Read API support for printed text recognition?


A company needs to extract invoice numbers, dates, and amounts from scanned invoices. Which service should they use?


In the OCR response hierarchy, what is the correct order from top to bottom?
