3.3 Azure AI Custom Vision
Key Takeaways
- Custom Vision allows training image classification (single-label and multi-label) and object detection models with minimal training data.
- The service offers domain-specific base models (General, Food, Landmarks, Retail) that provide better starting accuracy for specific scenarios.
- Training iterations can use quick training (minutes) or advanced training (hours) — advanced training typically produces more accurate models.
- Trained models can be exported as TensorFlow, CoreML, ONNX, or Docker containers for edge and offline deployment.
- The Custom Vision portal provides a visual interface for uploading images, labeling data, training, and testing models.
Quick Answer: Custom Vision lets you train image classification and object detection models with your own labeled data. Choose from domain-specific base models, train with as few as 5 images per class, and export models as TensorFlow, CoreML, ONNX, or Docker containers for edge deployment.
Classification vs. Object Detection
| Feature | Image Classification | Object Detection |
|---|---|---|
| Task | Assign label(s) to the entire image | Locate and label specific objects within an image |
| Output | Class label + confidence | Bounding box + class label + confidence |
| Labeling | Tag the whole image | Draw bounding boxes around objects |
| Use Cases | Product categorization, quality pass/fail | Inventory counting, defect location, retail shelf analysis |
Classification Types
- Multiclass: Each image gets exactly one label (cat OR dog)
- Multilabel: Each image can have multiple labels (outdoor AND sunny AND beach)
Project Setup and Training
Step 1: Create a Custom Vision Project
```python
from azure.cognitiveservices.vision.customvision.training import (
    CustomVisionTrainingClient
)
from msrest.authentication import ApiKeyCredentials

credentials = ApiKeyCredentials(
    in_headers={"Training-key": "<training-key>"}
)
trainer = CustomVisionTrainingClient(
    endpoint="https://my-customvision.cognitiveservices.azure.com/",
    credentials=credentials
)

# Domains are identified by GUID, so look one up by display name
# ("General", "Food", "Landmarks", "Retail", or a compact variant)
domain = next(
    d for d in trainer.get_domains()
    if d.name == "General" and d.type == "Classification"
)

# Create a classification project
project = trainer.create_project(
    name="Product Quality Inspection",
    domain_id=domain.id,
    classification_type="Multiclass"  # or "Multilabel"
)
```
Step 2: Add Tags and Upload Images
```python
import os

# Create tags
good_tag = trainer.create_tag(project.id, "Good")
defective_tag = trainer.create_tag(project.id, "Defective")

# Upload and tag images
good_images_dir = "./training_data/good/"
for filename in os.listdir(good_images_dir):
    with open(os.path.join(good_images_dir, filename), "rb") as f:
        trainer.create_images_from_data(
            project.id,
            f.read(),
            tag_ids=[good_tag.id]
        )
# Repeat for ./training_data/defective/ with defective_tag
```
Step 3: Train the Model
```python
import time

# Start training (quick training; pass training_type="Advanced"
# for advanced training with a reserved time budget)
iteration = trainer.train_project(project.id)

# Wait for training to complete
while iteration.status != "Completed":
    time.sleep(10)
    iteration = trainer.get_iteration(project.id, iteration.id)
    print(f"Training status: {iteration.status}")
```
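The wait loop above can be factored into a reusable polling helper with a timeout. This is a plain-Python sketch; the `get_status` callable is an assumption standing in for `trainer.get_iteration(...).status`:

```python
import time

def wait_until(get_status, done_states=("Completed", "Failed"),
               interval=10, timeout=3600):
    """Poll get_status() until it returns a terminal state.

    Raises TimeoutError if no terminal state arrives within roughly
    `timeout` seconds (granularity is `interval`).
    """
    waited = 0
    while True:
        status = get_status()
        if status in done_states:
            return status
        if waited >= timeout:
            raise TimeoutError(f"still '{status}' after {timeout}s")
        time.sleep(interval)
        waited += interval

# Usage against the SDK (hypothetical wiring):
# wait_until(lambda: trainer.get_iteration(project.id, iteration.id).status)
```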
Step 4: Evaluate Performance
Key metrics for model evaluation:
| Metric | Description | Ideal Value |
|---|---|---|
| Precision | Of predicted positives, how many are correct? | > 90% |
| Recall | Of actual positives, how many were detected? | > 90% |
| AP (Average Precision) | Area under precision-recall curve per class | > 80% |
| mAP | Mean AP across all classes | > 80% |
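In formula terms, with TP, FP, and FN counting true positives, false positives, and false negatives per tag, the first two metrics reduce to small functions. A stdlib sketch for intuition, not part of the service API:

```python
def precision(tp, fp):
    """Of predicted positives, the fraction that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Of actual positives, the fraction detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Example: the model flags 10 images as Defective, 9 correctly (TP=9,
# FP=1), but misses 3 real defects (FN=3):
# precision(9, 1) -> 0.9    recall(9, 3) -> 0.75
```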
Step 5: Publish and Use the Model
```python
# Publish the trained iteration
publish_iteration_name = "v1"
prediction_resource_id = (
    "/subscriptions/.../Microsoft.CognitiveServices/accounts/my-prediction"
)
trainer.publish_iteration(
    project.id,
    iteration.id,
    publish_iteration_name,
    prediction_resource_id
)
```
```python
# Make predictions
from azure.cognitiveservices.vision.customvision.prediction import (
    CustomVisionPredictionClient
)

predictor = CustomVisionPredictionClient(
    endpoint="https://my-customvision.cognitiveservices.azure.com/",
    credentials=ApiKeyCredentials(
        in_headers={"Prediction-key": "<prediction-key>"}
    )
)

with open("test_image.jpg", "rb") as f:
    results = predictor.classify_image(
        project.id,
        publish_iteration_name,
        f.read()
    )

for prediction in results.predictions:
    print(f"{prediction.tag_name}: {prediction.probability:.2%}")
```
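In practice you usually act only on predictions above a confidence threshold rather than printing every tag. A small helper over `(tag_name, probability)` pairs; the 0.5 default is an illustrative assumption, not a service setting:

```python
def top_predictions(preds, threshold=0.5):
    """Keep (tag_name, probability) pairs at or above threshold, best first."""
    return sorted(
        ((tag, p) for tag, p in preds if p >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Hypothetical wiring against the SDK result object:
# pairs = [(p.tag_name, p.probability) for p in results.predictions]
# top_predictions(pairs)
```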
Domain-Specific Base Models
| Domain | Best For | Optimized For |
|---|---|---|
| General | Wide variety of images | General purpose classification/detection |
| General (compact) | Edge deployment | Smaller model size, mobile/edge devices |
| Food | Food and dish recognition | Restaurant menus, food delivery apps |
| Landmarks | Natural and built landmarks | Travel and tourism applications |
| Retail | Product recognition on shelves | Retail analytics, inventory management |
On the Exam: Compact domains are specifically designed for export to edge devices. If a question mentions offline or edge deployment, choose a compact domain. Standard (non-compact) domains provide higher accuracy but can only be used via the cloud API.
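Because the SDK addresses domains by GUID, choosing a compact domain means filtering `trainer.get_domains()` by display name. A hedged sketch: the filter itself is plain Python, and the display names ("General (compact)" and so on) follow the table above:

```python
def pick_domain(domains, name="General (compact)", kind="Classification"):
    """Return the first domain matching the display name and type, else None."""
    return next(
        (d for d in domains if d.name == name and d.type == kind),
        None,
    )

# compact_domain = pick_domain(trainer.get_domains())
# ...then pass compact_domain.id as domain_id when creating the project
```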
Model Export for Edge Deployment
Custom Vision models trained with compact domains can be exported:
| Export Format | Target Platform |
|---|---|
| TensorFlow | Android, Linux, IoT devices |
| CoreML | iOS and macOS applications |
| ONNX | Windows, any ONNX-compatible runtime |
| Docker (Linux) | Linux containers, Azure IoT Edge |
| Docker (Windows) | Windows containers |
| OpenVINO | Intel hardware (CPUs, VPUs, FPGAs) |
Export Workflow
```python
# Export the model in ONNX format (requires a compact domain)
export = trainer.export_iteration(
    project.id,
    iteration.id,
    platform="ONNX"
)
# Download the exported model from export.download_uri
```
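The export runs asynchronously: `export_iteration` starts it, and the finished artifact shows up in `trainer.get_exports(...)` with a `download_uri` once its status reaches "Done". A hedged sketch of the selection-and-download half; the export objects' attribute names are assumptions based on the training SDK models:

```python
import urllib.request

def find_download_uri(exports, platform="ONNX"):
    """Return the download URI of a finished export for `platform`, else None."""
    for e in exports:
        if e.platform == platform and e.status == "Done":
            return e.download_uri
    return None

# uri = find_download_uri(trainer.get_exports(project.id, iteration.id))
# if uri:
#     urllib.request.urlretrieve(uri, "model.onnx.zip")
```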
Training Data Best Practices
| Practice | Recommendation |
|---|---|
| Minimum images per class | 5 (absolute minimum), 15+ recommended |
| Balanced classes | Similar number of images per class |
| Image variety | Different angles, lighting, backgrounds |
| Negative examples | Include images that should NOT be classified as any tag |
| Image quality | At least 256x256 pixels, representative of production images |
| Maximum image size | 6 MB per image |
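The per-class minimums in the table can be checked programmatically before training. A small sketch over a `{tag_name: image_count}` dict; the dict shape is an illustrative assumption, not an SDK type:

```python
def undersampled_tags(counts, minimum=15):
    """Return tag names with fewer images than the recommended minimum."""
    return sorted(tag for tag, n in counts.items() if n < minimum)
```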
- What is the key difference between Multiclass and Multilabel classification in Custom Vision?
- Which Custom Vision domain should you use if you need to export the model for deployment on mobile devices?
- After training a Custom Vision model, which metric tells you "of all predicted positives, how many were actually correct"?