3.5 Video Analysis and Spatial Analysis
Key Takeaways
- Azure AI Video Indexer extracts insights from video including transcripts, faces, scenes, topics, brands, emotions, and visual text.
- Video Indexer supports audio insights (transcription, translation, speaker identification) and visual insights (face detection, scene detection, OCR).
- Spatial Analysis uses Azure AI Vision to analyze people movement in real-time video feeds from cameras — counting people, detecting occupancy, and tracking movement.
- Spatial Analysis runs as a Docker container on Azure IoT Edge devices for real-time, low-latency processing at the edge.
- Spatial Analysis operations include PersonCount, PersonCrossingLine, PersonCrossingPolygon, PersonDistance, and PersonZoneDwellTime.
Video Analysis and Spatial Analysis
Quick Answer: Video Indexer extracts rich insights from videos (transcripts, faces, scenes, topics). Spatial Analysis analyzes real-time video feeds from cameras to count people, detect occupancy, and track movement patterns. Spatial Analysis runs as an edge container on IoT Edge.
Azure AI Video Indexer
Video Indexer analyzes video and audio content to extract structured insights:
Audio Insights
| Insight | Description |
|---|---|
| Transcription | Speech-to-text for all spoken content |
| Translation | Translate transcripts to other languages |
| Speaker identification | Identify and distinguish individual speakers |
| Sentiment analysis | Detect emotional tone of spoken content |
| Audio effects | Detect clapping, silence, crowd noise |
| Topic detection | Extract discussion topics from the transcript |
Visual Insights
| Insight | Description |
|---|---|
| Face detection | Detect and identify faces throughout the video |
| Scene detection | Identify scene changes and segment boundaries |
| Shot detection | Detect camera shots and transitions |
| OCR | Extract text visible in video frames (signs, captions) |
| Object detection | Identify objects in video frames |
| Brand detection | Detect brand logos and mentions |
| Thumbnail extraction | Generate representative thumbnails |
Video Indexer API Usage
```python
import requests

# Upload a video for indexing; location, account_id, and access_token
# come from your Video Indexer account settings
upload_url = (
    f"https://api.videoindexer.ai/"
    f"{location}/Accounts/{account_id}/Videos"
    f"?name=my-video&privacy=Private"
    f"&accessToken={access_token}"
)
with open("video.mp4", "rb") as video_file:
    response = requests.post(upload_url, files={"file": video_file})
video_id = response.json()["id"]

# Retrieve the video index (indexing is asynchronous; the index's
# "state" field is "Processed" once insights are ready)
insights_url = (
    f"https://api.videoindexer.ai/"
    f"{location}/Accounts/{account_id}/Videos/{video_id}/Index"
    f"?accessToken={access_token}"
)
insights = requests.get(insights_url).json()

# Print the transcript; each entry carries a speakerId and the spoken text
for line in insights["videos"][0]["insights"]["transcript"]:
    print(f"[Speaker {line['speakerId']}]: {line['text']}")
```
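Because indexing is asynchronous, a production client polls the Index endpoint until the video's state reaches Processed before reading insights. A minimal sketch of that loop; the `fetch_index` callable (e.g. `lambda: requests.get(insights_url).json()`) and the poll/timeout parameters are illustrative assumptions, not part of the API:

```python
import time

def wait_for_indexing(fetch_index, poll_seconds=10, max_polls=60):
    """Poll the Video Indexer Index endpoint until processing finishes.

    fetch_index: zero-argument callable returning the Index API JSON,
    e.g. lambda: requests.get(insights_url).json()
    """
    for _ in range(max_polls):
        index = fetch_index()
        state = index.get("state")
        if state == "Processed":
            return index                     # full insights document
        if state == "Failed":
            raise RuntimeError("Video indexing failed")
        time.sleep(poll_seconds)             # still Uploaded/Processing
    raise TimeoutError("Indexing did not finish within the poll budget")
```

Injecting the fetch function keeps the retry logic testable without a live account.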
Spatial Analysis
Spatial Analysis uses computer vision to analyze real-time video streams and understand how people move through physical spaces.
Spatial Analysis Operations
| Operation | Description | Use Case |
|---|---|---|
| PersonCount | Count people in a defined zone | Store occupancy limits |
| PersonCrossingLine | Detect when people cross a virtual line | Entry/exit counting |
| PersonCrossingPolygon | Detect when people enter/exit a polygon zone | Restricted area monitoring |
| PersonDistance | Measure distance between people | Social distancing compliance |
| PersonZoneDwellTime | Measure how long people stay in a zone | Queue wait time analysis |
Deployment Architecture
```
[Camera(s)] → [Azure IoT Edge Device]
                  └── [Spatial Analysis Container]
                        ├── Process video frames locally
                        ├── Detect people and track movement
                        └── Send aggregated events to cloud
                              └── [Azure IoT Hub] → [Azure Stream Analytics] → [Dashboard]
```
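The "aggregated events" sent to the cloud are JSON messages; for PersonCrossingLine, each event reports a tracking ID and a crossing direction. The exact schema is defined by the container's documentation, so the message shape below (and the `tally_line_crossings` helper) is an illustrative assumption of how a downstream consumer might count entries and exits:

```python
import json
from collections import Counter

def tally_line_crossings(messages):
    """Tally crossing directions from Spatial Analysis output messages.

    Assumes each message is a JSON string with an 'events' list whose
    personLineEvent items carry properties.status of 'CrossLeft' or
    'CrossRight' (hypothetical shape for illustration).
    """
    counts = Counter()
    for raw in messages:
        msg = json.loads(raw)
        for event in msg.get("events", []):
            if event.get("type") == "personLineEvent":
                counts[event["properties"]["status"]] += 1
    return counts

# Example message in the assumed shape:
sample = json.dumps({
    "events": [{
        "type": "personLineEvent",
        "properties": {"trackingId": "t1", "status": "CrossLeft"},
        "zone": "entrance-line",
    }]
})
print(tally_line_crossings([sample]))  # Counter({'CrossLeft': 1})
```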
Configuration Example (JSON)
```json
{
  "version": 1,
  "type": "cognitiveservices.vision.spatialanalysis-personcrossingline",
  "input": {
    "source": {
      "type": "rtsp",
      "uri": "rtsp://camera-ip:554/stream"
    }
  },
  "parameters": {
    "lines": [
      {
        "name": "entrance-line",
        "start": {"x": 0.1, "y": 0.5},
        "end": {"x": 0.9, "y": 0.5}
      }
    ],
    "threshold": 16,
    "focus": "footprint"
  }
}
```
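Note that line and zone coordinates are normalized to the 0–1 range relative to the video frame, not expressed in pixels. A small hypothetical helper for converting camera pixel coordinates into that normalized form:

```python
def to_normalized(x_px, y_px, frame_width, frame_height):
    """Convert pixel coordinates to the normalized 0-1 coordinates
    used in Spatial Analysis line/zone definitions."""
    return {"x": round(x_px / frame_width, 3),
            "y": round(y_px / frame_height, 3)}

# A horizontal line halfway down a 1920x1080 frame, matching the
# entrance-line example above:
start = to_normalized(192, 540, 1920, 1080)   # {'x': 0.1, 'y': 0.5}
end = to_normalized(1728, 540, 1920, 1080)    # {'x': 0.9, 'y': 0.5}
```

Normalized coordinates let the same configuration work across cameras with different resolutions.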
Privacy and Responsible AI
Spatial Analysis is designed with privacy in mind:
- No facial recognition: People are represented as bounding boxes, not identified by face
- No image storage: Video frames are processed in memory and immediately discarded
- Edge processing: Video stays on the local device — only aggregated counts/events are sent to the cloud
- Configurable zones: Only monitor specific areas, not the entire camera view
On the Exam: Know that Spatial Analysis runs on IoT Edge (not in the cloud), processes video locally for privacy, and does NOT perform facial recognition. Questions may test these privacy-by-design features.
Review Questions
Where does Azure AI Vision Spatial Analysis process video streams?
Which Spatial Analysis operation would you use to measure how long customers wait in a checkout line?
Which of the following insights can Azure AI Video Indexer extract? (Select the best answer)