Which Azure service provides pre-built image analysis capabilities including captioning, tagging, and object detection WITHOUT requiring custom model training?

Azure AI Vision. It provides pre-built computer vision capabilities: you send an image and get results without any model training. The alternatives do not fit: Azure AI Custom Vision requires you to train your own model, Azure Machine Learning is a general-purpose ML platform, and Azure AI Document Intelligence focuses on document extraction.

A company wants to count the number of people entering and exiting their retail stores using existing security cameras. Which Azure AI Vision capability should they use?

Spatial Analysis. It uses video streams from cameras to understand how people move through physical spaces, including counting people entering and exiting an area. It is designed specifically for real-time video analytics of physical spaces.

Which Azure AI Vision capability would you use to extract text from a photograph of a handwritten note?

OCR (Read API). The OCR Read API extracts both printed and handwritten text from images. It supports handwritten text recognition in multiple languages and returns the extracted text with position and confidence information.

What is the difference between Azure AI Vision and Azure AI Custom Vision?

Azure AI Vision provides pre-built models; Custom Vision lets you train your own models. With Azure AI Vision, you send an image and get results immediately from pre-built computer vision models. With Azure AI Custom Vision, you train your own image classification and object detection models on your own labeled images for domain-specific scenarios.

Azure AI Vision Service

Quick Answer: Azure AI Vision is the primary Azure service for image and video analysis. It provides pre-built models for image captioning, tagging, object detection, OCR, spatial analysis, and more. No custom model training is needed — you send an image to the API and receive analysis results.

What Is Azure AI Vision?

Azure AI Vision (formerly known as Azure Computer Vision) is a cloud-based service that provides pre-built computer vision capabilities. You send images or video to the service, and it returns structured analysis results.

Key Capabilities

Capability	Description	Example Output
Image captioning	Generate a natural language description of an image	"A dog playing fetch in a park"
Image tagging	Assign descriptive tags to an image	["dog", "outdoor", "park", "grass", "playing"]
Object detection	Identify and locate objects with bounding boxes	Object: "dog" at [100, 200, 150, 180]
Smart cropping	Automatically crop images around regions of interest	Focused crop around the main subject
People detection	Detect and locate people in images	Person locations with bounding boxes
Background removal	Separate foreground from background	Foreground mask or transparent background
OCR (Read)	Extract printed and handwritten text	"Invoice #12345, Date: March 2026"
Spatial analysis	Analyze video for people counting and movement	15 people in zone A, average dwell time 3 min

Image Analysis 4.0 API

The latest version of the Image Analysis API (4.0) is powered by Florence, a large-scale vision foundation model. Key improvements include:

Better captioning — more natural and accurate image descriptions
Dense captioning — descriptions for multiple regions in an image
Image retrieval — search through images using text queries (vector search)
Customization — add your own categories using few-shot learning (minimal training data)
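The image retrieval item above rests on comparing vectors: images and text queries are turned into embeddings in a shared space, and the images whose vectors lie closest to the query vector are the best matches. Below is a minimal Python sketch of that ranking step only, assuming the vectors have already been obtained from the service. The three-dimensional vectors, file names, and function names are made up for illustration; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_vector, image_vectors):
    """Return image names sorted from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vec), name)
        for name, vec in image_vectors.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored]


# Hypothetical embeddings; in practice these would come from the service.
image_vectors = {
    "dog_park.jpg": [0.9, 0.1, 0.0],
    "city_street.jpg": [0.1, 0.9, 0.2],
    "beach.jpg": [0.3, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for the text query "dog in a park"

ranking = rank_images(query_vector, image_vectors)
print(ranking)  # ['dog_park.jpg', 'beach.jpg', 'city_street.jpg']
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common choice for this kind of text-to-image matching.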

How to Use Image Analysis

1. Create an Azure AI Vision resource in the Azure portal
2. Send an image to the REST API or use the SDK
3. Specify which visual features you want (caption, tags, objects, etc.)
4. Receive structured JSON results

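Request: POST {endpoint}/computervision/imageanalysis:analyze?api-version=2024-02-01&features=caption,tags,objects
Image: [photo of a park scene]

Response (simplified for illustration; the actual 4.0 response nests these values under captionResult, tagsResult, and objectsResult):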
{
  "caption": "A golden retriever playing fetch in a sunny park",
  "tags": ["dog", "golden retriever", "park", "grass", "ball", "outdoor"],
  "objects": [
    {"name": "dog", "confidence": 0.97, "boundingBox": {...}},
    {"name": "ball", "confidence": 0.89, "boundingBox": {...}}
  ]
}
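The request and response above can be sketched in Python using only the standard library. This is a rough illustration, not a definitive client: the endpoint host is a placeholder, the URL path and `api-version` value reflect the 4.0 REST API as best understood here and should be checked against the current reference, and `confident_objects` is a hypothetical helper that reads the simplified response shape shown above rather than the full service response. The network call is defined but not executed.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_url(endpoint, features, api_version="2024-02-01"):
    """Build the Image Analysis URL for the requested visual features."""
    query = urllib.parse.urlencode(
        {"api-version": api_version, "features": ",".join(features)},
        safe=",",  # keep the feature list readable instead of %2C
    )
    return f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"


def analyze_image(endpoint, key, image_bytes, features):
    """POST raw image bytes and return the parsed JSON (not called below)."""
    request = urllib.request.Request(
        build_analyze_url(endpoint, features),
        data=image_bytes,
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def confident_objects(result, threshold=0.9):
    """Names of objects at or above the threshold (simplified shape only)."""
    return [
        obj["name"]
        for obj in result.get("objects", [])
        if obj["confidence"] >= threshold
    ]


# Simplified sample mirroring the response shown above.
sample = {
    "caption": "A golden retriever playing fetch in a sunny park",
    "tags": ["dog", "golden retriever", "park", "grass", "ball", "outdoor"],
    "objects": [
        {"name": "dog", "confidence": 0.97},
        {"name": "ball", "confidence": 0.89},
    ],
}

# Placeholder endpoint; a real one comes from your Azure AI Vision resource.
url = build_analyze_url(
    "https://example.cognitiveservices.azure.com",
    ["caption", "tags", "objects"],
)
print(url)
print(confident_objects(sample))
```

Filtering on the confidence score is a common way to drop uncertain detections, here the ball at 0.89, before acting on the results.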

On the Exam: Know that Azure AI Vision provides PRE-BUILT capabilities — you do not need to train a model. Send an image, get results. If a question asks about training a custom image model, that is Azure AI Custom Vision (different service).

OCR with the Read API

The Read API is Azure AI Vision's OCR capability for extracting text from images and documents:

Supported Content

Printed text in 164+ languages
Handwritten text in English, Chinese, French, German, Italian, Japanese, Korean, Portuguese, Spanish
Mixed content — images with both printed and handwritten text
Document formats — JPEG, PNG, BMP, PDF, TIFF

Read API Process

1. Submit an image or document to the Read API
2. The service processes the image asynchronously
3. Retrieve results with extracted text, line positions, and word positions, including a confidence score for each extracted word
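The asynchronous process above amounts to a submit-then-poll pattern. The Python sketch below illustrates the polling and result-reading steps with a stand-in for the service so it runs without a network. It assumes a response shape modeled on the v3.2 Read REST API (a `status` field, then `analyzeResult.readResults` with lines and words); the function names and sample payloads are hypothetical.

```python
import time


def wait_for_read_result(get_status, poll_interval=1.0, max_attempts=30):
    """Poll until the read operation finishes and return the final payload.

    get_status is any callable returning the operation's JSON as a dict.
    In practice it would GET the operation URL returned at submission.
    """
    for _ in range(max_attempts):
        payload = get_status()
        if payload["status"] in ("succeeded", "failed"):
            return payload
        time.sleep(poll_interval)
    raise TimeoutError("read operation did not finish in time")


def extract_words(payload, min_confidence=0.0):
    """Collect (text, confidence) pairs for every recognized word."""
    words = []
    for page in payload["analyzeResult"]["readResults"]:
        for line in page["lines"]:
            for word in line["words"]:
                if word["confidence"] >= min_confidence:
                    words.append((word["text"], word["confidence"]))
    return words


# Stand-in for the service: two "still working" replies, then a result.
fake_responses = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {
        "status": "succeeded",
        "analyzeResult": {"readResults": [{"lines": [
            {"text": "Invoice #12345", "words": [
                {"text": "Invoice", "confidence": 0.99},
                {"text": "#12345", "confidence": 0.92},
            ]},
        ]}]},
    },
])

final = wait_for_read_result(lambda: next(fake_responses), poll_interval=0)
words = extract_words(final, min_confidence=0.95)
print(final["status"], words)  # succeeded [('Invoice', 0.99)]
```

Passing the status check in as a callable keeps the polling logic separate from the HTTP details, so the same loop works against a real client or a test double.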

Common OCR Use Cases

Digitizing paper documents and forms
Reading license plates from camera images
Extracting data from receipts and invoices
Converting handwritten notes to searchable text
Indexing text in image-heavy documents

Spatial Analysis

Spatial Analysis uses video streams from cameras to understand how people move through physical spaces:

Capability	Description	Use Case
People counting	Count people entering/exiting an area	Retail foot traffic analysis
Social distancing	Measure distance between people	Workplace safety compliance
Zone dwell time	Track how long people stay in areas	Retail store layout optimization
Queue monitoring	Count people in lines and estimate wait times	Customer service improvement
Movement tracking	Track paths people take through a space	Facility layout planning

On the Exam: Spatial Analysis requires a camera connected to an Azure IoT Edge device. It processes video locally (edge computing) and sends only aggregated analytics to the cloud — not individual faces or video frames.
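To make "aggregated analytics" concrete, here is a small Python sketch that turns a stream of entry and exit events into per-zone counts. The event format (`zone` and `direction` keys) and the zone names are hypothetical and chosen for illustration; they are not the service's actual event schema.

```python
from collections import Counter


def summarize_crossings(events):
    """Aggregate entry/exit events into per-zone counts and net occupancy."""
    entered = Counter()
    exited = Counter()
    for event in events:
        if event["direction"] == "enter":
            entered[event["zone"]] += 1
        elif event["direction"] == "exit":
            exited[event["zone"]] += 1
    zones = sorted(set(entered) | set(exited))
    return {
        zone: {
            "entered": entered[zone],
            "exited": exited[zone],
            "occupancy": entered[zone] - exited[zone],
        }
        for zone in zones
    }


# Hypothetical events, as an edge device might emit them over time.
events = [
    {"zone": "entrance", "direction": "enter"},
    {"zone": "entrance", "direction": "enter"},
    {"zone": "entrance", "direction": "exit"},
    {"zone": "checkout", "direction": "enter"},
]

summary = summarize_crossings(events)
print(summary)
```

Only the counts leave this function; no image data or identity is involved, which mirrors the privacy point in the exam note above.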

When to Use Azure AI Vision vs. Other Services

Scenario	Service to Use
Analyze a single image for tags, captions, objects	Azure AI Vision
Extract text from a scanned document	Azure AI Vision (Read API) or Azure AI Document Intelligence
Train a custom image classifier with your own categories	Azure AI Custom Vision
Detect and verify human faces	Azure AI Face
Generate images from text descriptions	Azure OpenAI Service (DALL-E / GPT Image)
Analyze video for scene detection and transcription	Azure AI Video Indexer
Count people in a physical space using video	Azure AI Vision (Spatial Analysis)
Extract structured data from forms and invoices	Azure AI Document Intelligence

Microsoft Azure AI Fundamentals

3.2 Azure AI Vision Service

