3.4 Optical Character Recognition (OCR)

Key Takeaways

The Azure AI Vision Read API extracts printed text in 164 languages and handwritten text in 9 languages from images and PDFs.
Read results are organized hierarchically as pages -> blocks -> lines -> words. Lines and words carry bounding-polygon coordinates, and each word carries a confidence score.
Read runs two ways: synchronously inside the Image Analysis 4.0 'Read' feature, or asynchronously via the standalone Read 3.2 operation that returns an operation-location to poll.
Use the Read API for general text on signs, labels, and photos; switch to Document Intelligence when you need structured key-value fields and tables from invoices, receipts, or forms.
Standalone Read accepts files up to 500 MB and PDFs up to 2,000 pages; the Image Analysis Read feature caps input at 20 MB.

Last updated: June 2026

Quick Answer: The Azure AI Vision Read API extracts printed text (164 languages) and handwritten text (9 languages) from images and PDFs. Output is pages -> blocks -> lines -> words with bounding polygons and confidence. Use Read for general text; use Document Intelligence when you need structured fields and tables from invoices, receipts, or forms.

Read API vs. Document Intelligence

The single most repeated AI-102 OCR question is choosing between these two services. Read gives you text and positions; Document Intelligence gives you meaning — typed key-value pairs and table cells.

Aspect	Read API (Vision)	Document Intelligence
Best for	General text from images/PDFs	Structured field & table extraction
Output	Raw text + polygons	Key-value pairs, tables, typed fields
Typical input	Signs, labels, screenshots, books	Invoices, receipts, IDs, W-2s, forms
Tables	No table cells (text lines only)	Full table extraction
Handwriting	Yes (9 languages)	Yes
Prebuilt models	No	Invoice, receipt, ID, layout, W-2, etc.

Mnemonic: if the prompt names invoices, receipts, forms, or specific fields, pick Document Intelligence. If it says read the text on a sign/label/photo, pick the Read API.
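The decision rule above is essentially a keyword check on the scenario text. The sketch below encodes it as a tiny function so the rule is explicit. The word list and the `pick_service` name are illustrative assumptions, not an official classifier; word boundaries keep words like "platform" from matching "form".

```python
import re

# Words that, per the rule of thumb above, signal structured extraction.
# The exact list is an illustrative assumption, not an official rule.
STRUCTURED = re.compile(r"\b(invoices?|receipts?|forms?|fields?|tables?|w-2s?)\b",
                        re.IGNORECASE)


def pick_service(scenario: str) -> str:
    """Map an exam-style scenario description to the service it points to."""
    if STRUCTURED.search(scenario):
        return "Document Intelligence"
    return "Read API"


print(pick_service("Extract vendor and total fields from scanned invoices"))  # Document Intelligence
print(pick_service("Read the text on a street sign"))  # Read API
```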

Two Ways to Call Read

1. Inside Image Analysis 4.0 (synchronous): request the Read visual feature; the text comes straight back in read.blocks. Input is capped at 20 MB.

2. Standalone Read 3.2 (asynchronous): POST .../vision/v3.2/read/analyze returns 202 Accepted with an Operation-Location header and no text in the body. You then GET that URL and poll its status field (notStarted, running, succeeded, or failed) until it reads succeeded; only then is the text available. This path handles multi-page PDFs and files up to 500 MB.

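# Option 1: synchronous Read through the Image Analysis 4.0 Python SDK
import os

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

# Endpoint and key of your Azure AI Vision resource
# (the environment variable names are an assumption; use your own)
endpoint = os.environ["VISION_ENDPOINT"]
key = os.environ["VISION_KEY"]

client = ImageAnalysisClient(endpoint, AzureKeyCredential(key))
with open("sign.jpg", "rb") as f:
    # The Read result comes back in this same response; no polling needed
    result = client.analyze(image_data=f.read(),
                            visual_features=[VisualFeatures.READ])

# Walk blocks -> lines -> words; only words carry a confidence score
if result.read is not None:
    for block in result.read.blocks:
        for line in block.lines:
            print(line.text)
            for word in line.words:
                print(word.text, round(word.confidence, 2))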
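For option 2, here is a minimal standard-library sketch of the POST-then-poll pattern against the Read 3.2 REST endpoint. It assumes key-based auth through the Ocp-Apim-Subscription-Key header and a binary upload with Content-Type application/octet-stream. The function names and the VISION_ENDPOINT/VISION_KEY variable names are hypothetical. The polling loop takes the HTTP fetch as a parameter, so it can be exercised without a live resource.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List


def start_read(endpoint: str, key: str, data: bytes) -> str:
    """POST the file to Read 3.2 and return the Operation-Location URL."""
    req = urllib.request.Request(
        endpoint.rstrip("/") + "/vision/v3.2/read/analyze",
        data=data,
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # 202 Accepted carries no text; the result URL is in this header.
        return resp.headers["Operation-Location"]


def get_json(url: str, key: str) -> Dict:
    """GET a Read result URL and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def poll_read(fetch: Callable[[str], Dict], operation_url: str,
              interval: float = 1.0, max_attempts: int = 60) -> Dict:
    """Poll until status is 'succeeded'; raise on 'failed' or after max_attempts."""
    for _ in range(max_attempts):
        body = fetch(operation_url)
        status = body.get("status")
        if status == "succeeded":
            return body
        if status == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(interval)  # 'notStarted' or 'running': wait, then retry
    raise TimeoutError("Read operation did not finish in time")


def lines_by_page(body: Dict) -> List[List[str]]:
    """Collect line text per page from analyzeResult.readResults."""
    return [[line["text"] for line in page["lines"]]
            for page in body["analyzeResult"]["readResults"]]


# Usage against a real resource (not run here):
# endpoint, key = os.environ["VISION_ENDPOINT"], os.environ["VISION_KEY"]
# with open("report.pdf", "rb") as f:
#     op_url = start_read(endpoint, key, f.read())
# result = poll_read(lambda u: get_json(u, key), op_url)
# for page in lines_by_page(result):
#     print("\n".join(page))
```

Note that Read 3.2 nests text as readResults (one entry per page) -> lines -> words. That is why `lines_by_page` returns one list of lines per page.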

Response Hierarchy (exam-critical)

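Text is nested pages -> blocks -> lines -> words. Lines and words each carry a bounding polygon, and words also carry a confidence score; lines do not. A single-image Image Analysis 4.0 response starts at readResult.blocks, because the one image is the only page.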

{
  "readResult": {
    "blocks": [{
      "lines": [{
        "text": "Hello World",
        "boundingPolygon": [{"x":10,"y":10},{"x":200,"y":10},
                            {"x":200,"y":40},{"x":10,"y":40}],
        "words": [
          {"text":"Hello","confidence":0.99},
          {"text":"World","confidence":0.98}
        ]
      }]
    }]
  }
}

Reading order matters. To rebuild a paragraph, iterate the lines within a block in order. To judge per-token reliability, read word.confidence. Picking tagsResult or captionResult to find OCR text is a planted wrong answer: OCR text always lives under readResult (result.read in the Python SDK).
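As a sketch of both tasks, the code below works on the abbreviated sample response above, parsed as plain JSON dicts rather than SDK objects. It rebuilds each block's text from its lines and flags words under a confidence threshold. The function names and the 0.985 threshold are illustrative assumptions.

```python
import json

# The abbreviated readResult from the sample response above.
SAMPLE = """
{"readResult": {"blocks": [{"lines": [{
  "text": "Hello World",
  "boundingPolygon": [{"x":10,"y":10},{"x":200,"y":10},
                      {"x":200,"y":40},{"x":10,"y":40}],
  "words": [{"text":"Hello","confidence":0.99},
            {"text":"World","confidence":0.98}]}]}]}}
"""


def block_texts(read_result: dict) -> list:
    """Rebuild each block's text by joining its lines in reading order."""
    return ["\n".join(line["text"] for line in block["lines"])
            for block in read_result["blocks"]]


def low_confidence_words(read_result: dict, threshold: float) -> list:
    """Return words whose per-word confidence falls below the threshold."""
    return [word["text"]
            for block in read_result["blocks"]
            for line in block["lines"]
            for word in line["words"]
            if word["confidence"] < threshold]


read_result = json.loads(SAMPLE)["readResult"]
print(block_texts(read_result))                  # ['Hello World']
print(low_confidence_words(read_result, 0.985))  # ['World']
```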

Language Support

Category	Count	Examples
Printed text	164 languages	English, Chinese, Arabic, Hindi, Japanese, Korean, Russian
Handwritten text	9 languages	English, Chinese Simplified, French, German, Italian, Japanese, Korean, Portuguese, Spanish

The service auto-detects language; you do not usually pass a language hint for Read. Note the asymmetry — far more print languages than handwriting languages — because the exam likes the "how many handwriting languages?" twist (the answer is 9, not 164).
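To make the asymmetry concrete, the handwriting list from the table can serve as a lookup before you promise handwriting support for a given language. The set uses the language names from the table, not service language codes, and the `handwriting_supported` helper is hypothetical.

```python
# Transcribed from the Language Support table above (names, not language codes).
HANDWRITING_LANGUAGES = frozenset({
    "English", "Chinese Simplified", "French", "German", "Italian",
    "Japanese", "Korean", "Portuguese", "Spanish",
})


def handwriting_supported(language: str) -> bool:
    """True if Read's handwriting recognition covers this language."""
    return language in HANDWRITING_LANGUAGES


print(len(HANDWRITING_LANGUAGES))         # 9
print(handwriting_supported("Arabic"))    # False (printed text only)
print(handwriting_supported("Japanese"))  # True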

File and Size Limits

Limit	Standalone Read 3.2	Image Analysis Read feature
Max file size	500 MB (S tier)	20 MB
Max PDF/TIFF pages	2,000	n/a (single image)
Min dimension	50 x 50 px	50 x 50 px
Formats	JPEG, PNG, BMP, TIFF, PDF	JPEG, PNG, BMP, GIF, TIFF, WEBP
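A pre-flight check can apply these limits before any call is made. The sketch below routes a single small image to the synchronous Image Analysis Read feature and everything else within limits to asynchronous Read 3.2. The return labels and function name are hypothetical. It assumes the table's "MB" means binary megabytes and uses only the format lists shown in the table.

```python
from typing import Set

MB = 1024 * 1024  # assumption: the table's "MB" taken as binary megabytes

READ32_MAX_BYTES = 500 * MB  # standalone Read 3.2, S tier
READ32_MAX_PAGES = 2000
IMAGE_ANALYSIS_MAX_BYTES = 20 * MB
MIN_SIDE_PX = 50
READ32_FORMATS: Set[str] = {"jpeg", "png", "bmp", "tiff", "pdf"}
IMAGE_ANALYSIS_FORMATS: Set[str] = {"jpeg", "png", "bmp", "gif", "tiff", "webp"}


def choose_read_path(fmt: str, size_bytes: int, pages: int = 1,
                     width: int = MIN_SIDE_PX, height: int = MIN_SIDE_PX) -> str:
    """Return which Read call pattern an input fits, per the limits table."""
    fmt = fmt.lower()
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        raise ValueError("image is smaller than 50 x 50 px")
    # Prefer the synchronous path for a single small image in a supported format.
    if (fmt in IMAGE_ANALYSIS_FORMATS and pages == 1
            and size_bytes <= IMAGE_ANALYSIS_MAX_BYTES):
        return "image-analysis-read"
    # Otherwise use async Read 3.2 (PDFs, multi-page TIFFs, larger files).
    if (fmt in READ32_FORMATS and pages <= READ32_MAX_PAGES
            and size_bytes <= READ32_MAX_BYTES):
        return "read-3.2-async"
    raise ValueError("input exceeds the documented Read limits")


print(choose_read_path("jpeg", 2 * MB))            # image-analysis-read
print(choose_read_path("pdf", 30 * MB, pages=50))  # read-3.2-async
```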

Accuracy Best Practices

Factor	Recommendation
Resolution	Higher resolution = better; tiny text needs more pixels
Contrast	Maximize contrast between text and background
Skew	Keep text roughly horizontal; Read tolerates moderate rotation
Compression	Avoid heavily compressed JPEGs that smear glyphs
Glare/shadow	Even lighting; avoid reflections on glossy labels

Worked Example

A logistics app must digitize shipping labels photographed on phones (printed + occasional handwriting, single images): use the Image Analysis Read feature synchronously and read read.blocks[].lines[].text. But if the same app must also pull vendor, total, and line items from invoices, that part switches to Document Intelligence's prebuilt invoice model — Read alone cannot return typed fields.

Containers and Disconnected OCR

A recurring AI-102 theme is running OCR where data cannot leave the premises. The Read OCR capability ships as a Docker container that you pull from the Microsoft Container Registry and run on-premises or at the edge. A standard (connected) container still needs network access to the billing endpoint to report usage; only a disconnected (air-gapped) container, purchased under a commitment tier specifically for fully offline operation, runs with no connectivity at all.

When a scenario stresses "text extraction with no internet access" or "data residency forbids cloud calls," the answer is the Read OCR container, optionally the disconnected variant — not the cloud Read endpoint and not Document Intelligence's cloud service.

Cost and Performance Notes

Read is billed per transaction (per image, or per page for PDFs/TIFFs), so a 50-page PDF is 50 transactions. For latency, the synchronous Image Analysis Read feature is best for one small image in an interactive app, while the asynchronous Read operation is built for large multi-page documents where you can tolerate a poll. Sending an oversized image to the synchronous path returns an error rather than silently switching modes, so right-sizing the input and choosing the matching call pattern is part of correct design.
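The per-page billing rule reduces to simple arithmetic. This sketch counts transactions for a batch of (format, page count) pairs, charging PDFs and TIFFs per page and other images once each. It deliberately omits prices, which vary by region and tier. The function name and input shape are hypothetical.

```python
from typing import Iterable, Tuple

PAGED_FORMATS = {"pdf", "tiff"}  # billed per page, per the note above


def count_transactions(files: Iterable[Tuple[str, int]]) -> int:
    """Count Read transactions for (format, page_count) pairs.

    Paged formats (PDF/TIFF) cost one transaction per page; any other
    image costs one transaction regardless of the page_count given.
    """
    total = 0
    for fmt, pages in files:
        total += pages if fmt.lower() in PAGED_FORMATS else 1
    return total


batch = [("pdf", 50), ("jpeg", 1), ("png", 1)]
print(count_transactions(batch))  # 52
```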

On the Exam: Memorize 164 print / 9 handwriting languages, the pages -> blocks -> lines -> words order, the 202 + Operation-Location poll pattern for async Read, the Read-vs-Document-Intelligence decision rule, and that disconnected OCR uses the Read container.

Test Your Knowledge

How many languages does the Azure AI Vision Read API support for handwritten text recognition?

9 languages

50 languages

164 languages

200 languages

Test Your Knowledge

A company needs to extract the invoice number, date, vendor, and line-item totals from scanned invoices as structured fields. Which service fits best?

Azure AI Vision Read API

Azure AI Custom Vision

Azure AI Language

Azure AI Document Intelligence

Test Your Knowledge

In the Read API response, what is the correct containment order from top to bottom?

Words -> Lines -> Blocks -> Pages

Pages -> Paragraphs -> Sentences -> Words

Lines -> Words -> Paragraphs -> Pages

Pages -> Blocks -> Lines -> Words

Test Your Knowledge

Using the asynchronous standalone Read operation, what must your code do after the initial POST returns 202 Accepted?

Read the text directly from the 202 response body

Resend the same POST until it returns 200

GET the URL in the Operation-Location header and poll status until it is succeeded

Switch to the Image Analysis Read feature

Up Next

3.5 Video Analysis and Spatial Analysis
