3.4 Optical Character Recognition (OCR)
Key Takeaways
- The Azure AI Vision Read API (OCR) extracts printed and handwritten text from images and documents, and supports 164 languages for print and 9 for handwriting.
- Results are organized hierarchically, with bounding polygon coordinates for lines and words: the standalone Read API returns pages → lines → words, while the Image Analysis 4.0 Read feature returns blocks → lines → words.
- For document-specific OCR (invoices, receipts, forms), use Azure AI Document Intelligence instead of the general Read API.
- The Read API is available both as part of Image Analysis 4.0 (visual feature "Read") and as a standalone Read API endpoint.
- OCR can process images (JPEG, PNG, BMP, TIFF) and PDF documents (up to 2,000 pages).
Quick Answer: The Azure AI Vision Read API extracts printed text (164 languages) and handwritten text (9 languages) from images and PDFs. Results are organized as pages → lines → words with bounding polygon coordinates. For structured document extraction (invoices, forms), use Document Intelligence instead.
Read API vs. Document Intelligence
| Feature | Read API (Vision) | Document Intelligence |
|---|---|---|
| Best for | General text extraction from images | Structured document field extraction |
| Input | Images and PDFs | Documents, forms, invoices, receipts |
| Output | Raw text with positions | Structured key-value pairs and tables |
| Use case | Sign reading, label scanning, text digitization | Invoice processing, receipt extraction, form automation |
| Handwriting | Yes (9 languages) | Yes |
| Tables | No structured table extraction | Yes, full table extraction |
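The comparison above boils down to a simple routing rule. The following is a minimal sketch of that rule in Python; the function name `choose_ocr_service` and the set of document types are hypothetical and exist only for illustration, they are not part of any Azure SDK.

```python
# Hypothetical helper that encodes the comparison table as a decision rule.
# The names below are illustrative assumptions, not Azure SDK identifiers.
STRUCTURED_DOCUMENT_TYPES = {"invoice", "receipt", "form"}


def choose_ocr_service(document_type: str, needs_tables: bool = False) -> str:
    """Return the service the comparison table points to for this input."""
    normalized = document_type.strip().lower()
    if needs_tables or normalized in STRUCTURED_DOCUMENT_TYPES:
        # Structured fields or table extraction: Document Intelligence.
        return "Document Intelligence"
    # General text in images (signs, labels, digitization): Read API.
    return "Read API"


print(choose_ocr_service("invoice"))      # Document Intelligence
print(choose_ocr_service("street sign"))  # Read API
```

A real application would likely decide based on the downstream need (raw text versus key-value pairs) rather than a fixed list of type names.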
Using the Read API
Synchronous (Small Images)
```python
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint="https://my-vision.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

# Read text from an image
with open("document.jpg", "rb") as f:
    result = client.analyze(
        image_data=f.read(),
        visual_features=[VisualFeatures.READ]
    )

# Extract text
for block in result.read.blocks:
    for line in block.lines:
        print(f"Line: {line.text}")
        print(f"  Bounding polygon: {line.bounding_polygon}")
        for word in line.words:
            print(f"    Word: {word.text} "
                  f"(confidence: {word.confidence:.2f})")
```
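Because every word carries a confidence score, one common follow-up is to flag low-scoring words for manual review. The sketch below assumes an object shaped like `result.read` from the example above (blocks, lines, words with `text` and `confidence`); the helper name `low_confidence_words` and the 0.80 threshold are assumptions chosen for illustration, and the sample data is a stand-in so the code runs without calling the service.

```python
from types import SimpleNamespace


def low_confidence_words(read_result, threshold=0.80):
    """Collect (text, confidence) pairs for words scoring below the threshold."""
    flagged = []
    for block in read_result.blocks:
        for line in block.lines:
            for word in line.words:
                if word.confidence < threshold:
                    flagged.append((word.text, word.confidence))
    return flagged


# Stand-in object that mimics the blocks -> lines -> words shape of result.read.
sample = SimpleNamespace(blocks=[
    SimpleNamespace(lines=[
        SimpleNamespace(
            text="Hello W0rld",
            words=[
                SimpleNamespace(text="Hello", confidence=0.99),
                SimpleNamespace(text="W0rld", confidence=0.62),
            ],
        )
    ])
])

print(low_confidence_words(sample))  # [('W0rld', 0.62)]
```

The right threshold depends on the workload; it is worth tuning against a sample of your own images rather than treating 0.80 as a recommended value.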
Read API Response Structure
```json
{
  "readResult": {
    "blocks": [
      {
        "lines": [
          {
            "text": "Hello World",
            "boundingPolygon": [
              {"x": 10, "y": 10},
              {"x": 200, "y": 10},
              {"x": 200, "y": 40},
              {"x": 10, "y": 40}
            ],
            "words": [
              {
                "text": "Hello",
                "boundingPolygon": [...],
                "confidence": 0.99
              },
              {
                "text": "World",
                "boundingPolygon": [...],
                "confidence": 0.98
              }
            ]
          }
        ]
      }
    ]
  }
}
```
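When working with the raw JSON rather than the SDK objects, the four-point polygon is often reduced to an axis-aligned rectangle for cropping or drawing. The following sketch walks a response shaped like the one above; the helper names `polygon_to_bbox` and `extract_lines` are hypothetical, and the sample dictionary is a trimmed copy of the example that omits the word-level polygons, which are elided in the original.

```python
def polygon_to_bbox(polygon):
    """Convert a list of {"x", "y"} points to (left, top, width, height)."""
    xs = [point["x"] for point in polygon]
    ys = [point["y"] for point in polygon]
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top


def extract_lines(response):
    """Return (text, bbox) for every line in a readResult-style response."""
    lines = []
    for block in response["readResult"]["blocks"]:
        for line in block["lines"]:
            lines.append((line["text"], polygon_to_bbox(line["boundingPolygon"])))
    return lines


# Trimmed copy of the example response (word polygons omitted).
sample_response = {
    "readResult": {
        "blocks": [{
            "lines": [{
                "text": "Hello World",
                "boundingPolygon": [
                    {"x": 10, "y": 10}, {"x": 200, "y": 10},
                    {"x": 200, "y": 40}, {"x": 10, "y": 40},
                ],
                "words": [
                    {"text": "Hello", "confidence": 0.99},
                    {"text": "World", "confidence": 0.98},
                ],
            }]
        }]
    }
}

print(extract_lines(sample_response))  # [('Hello World', (10, 10, 190, 30))]
```

For rotated text the polygon is not axis-aligned, so the rectangle returned here is only the smallest upright box that encloses it.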
Language Support
| Category | Languages | Examples |
|---|---|---|
| Printed text | 164 languages | English, Chinese, Arabic, Hindi, Japanese, Korean, Russian |
| Handwritten text | 9 languages | English, Chinese Simplified, French, German, Italian, Japanese, Korean, Portuguese, Spanish |
Best Practices for OCR Accuracy
| Factor | Recommendation |
|---|---|
| Image resolution | Minimum 50x50 pixels; higher resolution generally improves accuracy |
| Contrast | High contrast between text and background |
| Alignment | Text should be roughly horizontal (Read API handles up to 40-degree skew) |
| Image format | JPEG, PNG, BMP, or TIFF; avoid heavily compressed JPEGs |
| PDF pages | Up to 2,000 pages per document |
| File size | Up to 500 MB for standard; 20 MB for Image Analysis Read |
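Several of the limits in the table can be checked locally before an image is sent. The sketch below is one way to do that, assuming the caller already knows the file size and pixel dimensions; the function name `preflight_issues`, the constant names, and the interpretation of 1 MB as 1024 × 1024 bytes are all assumptions made for this example.

```python
from pathlib import Path

# Limits taken from the table above; MB is assumed to mean 1024 * 1024 bytes.
MB = 1024 * 1024
MIN_SIDE_PX = 50
MAX_BYTES_STANDARD_READ = 500 * MB
MAX_BYTES_IMAGE_ANALYSIS = 20 * MB
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def preflight_issues(filename, size_bytes, width_px, height_px, image_analysis=True):
    """Return a list of problems found; an empty list means the image passes."""
    issues = []
    if Path(filename).suffix.lower() not in IMAGE_EXTENSIONS:
        issues.append("unsupported image format")
    limit = MAX_BYTES_IMAGE_ANALYSIS if image_analysis else MAX_BYTES_STANDARD_READ
    if size_bytes > limit:
        issues.append("file too large")
    if min(width_px, height_px) < MIN_SIDE_PX:
        issues.append("image smaller than 50x50 pixels")
    return issues


print(preflight_issues("label.png", 2 * MB, 1200, 800))  # []
print(preflight_issues("scan.gif", 25 * MB, 40, 600))
```

Contrast and skew are harder to verify without image processing, so this check covers only the format, size, and dimension rows.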
On the Exam: Know when to use the Read API (general text extraction) vs. Document Intelligence (structured document processing). If the question mentions invoices, receipts, or forms, the answer is Document Intelligence. If it mentions signs, labels, or general images, the answer is the Read API.
How many languages does the Azure AI Vision Read API support for printed text recognition?
A company needs to extract invoice numbers, dates, and amounts from scanned invoices. Which service should they use?
In the OCR response hierarchy, what is the correct order from top to bottom?