3.3 Azure AI Custom Vision

Key Takeaways

  • Custom Vision allows training image classification (single-label and multi-label) and object detection models with minimal training data.
  • The service offers domain-specific base models (General, Food, Landmarks, Retail) that provide better starting accuracy for specific scenarios.
  • Training iterations can use quick training (minutes) or advanced training (hours) — advanced training typically produces more accurate models.
  • Trained models can be exported as TensorFlow, CoreML, ONNX, or Docker containers for edge and offline deployment.
  • The Custom Vision portal provides a visual interface for uploading images, labeling data, training, and testing models.
Last updated: March 2026

Azure AI Custom Vision

Quick Answer: Custom Vision lets you train image classification and object detection models with your own labeled data. Choose from domain-specific base models, train with as few as 5 images per class, and export models as TensorFlow, CoreML, ONNX, or Docker containers for edge deployment.

Classification vs. Object Detection

| Feature | Image Classification | Object Detection |
|---|---|---|
| Task | Assign label(s) to the entire image | Locate and label specific objects within an image |
| Output | Class label + confidence | Bounding box + class label + confidence |
| Labeling | Tag the whole image | Draw bounding boxes around objects |
| Use Cases | Product categorization, quality pass/fail | Inventory counting, defect location, retail shelf analysis |
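The two output shapes above suggest different post-processing: a classifier's result is usually reduced to its top tag, while a detector's result is filtered by a confidence threshold. A minimal sketch using hypothetical sample data (not live API output; the dict shapes mirror the `tag_name` / `probability` / `bounding_box` fields the service returns):

```python
# Hypothetical sample results, shaped like Custom Vision predictions
classification_result = [
    {"tag_name": "Good", "probability": 0.97},
    {"tag_name": "Defective", "probability": 0.03},
]
detection_result = [
    {"tag_name": "scratch", "probability": 0.91,
     "bounding_box": {"left": 0.42, "top": 0.18, "width": 0.10, "height": 0.07}},
    {"tag_name": "dent", "probability": 0.34,
     "bounding_box": {"left": 0.70, "top": 0.55, "width": 0.08, "height": 0.06}},
]

def best_label(predictions):
    """Classification: keep only the most probable tag."""
    return max(predictions, key=lambda p: p["probability"])["tag_name"]

def confident_boxes(predictions, threshold=0.5):
    """Detection: keep every box whose confidence clears the threshold."""
    return [p for p in predictions if p["probability"] >= threshold]

print(best_label(classification_result))       # -> Good
print(len(confident_boxes(detection_result)))  # -> 1
```

Note that detection bounding-box coordinates are normalized (0 to 1) relative to image width and height.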

Classification Types

  • Multiclass: Each image gets exactly one label (cat OR dog)
  • Multilabel: Each image can have multiple labels (outdoor AND sunny AND beach)
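The distinction also shows up when interpreting scores: a multiclass model yields exactly one label (the argmax), while a multilabel model yields every tag above a threshold. A sketch with made-up scores:

```python
# Hypothetical prediction scores for one image
scores = {"outdoor": 0.92, "sunny": 0.81, "beach": 0.64, "indoor": 0.04}

def multiclass_label(scores):
    """Multiclass: exactly one label -- take the highest-scoring tag."""
    return max(scores, key=scores.get)

def multilabel_labels(scores, threshold=0.5):
    """Multilabel: every tag whose score clears the threshold."""
    return sorted(tag for tag, p in scores.items() if p >= threshold)

print(multiclass_label(scores))   # -> outdoor
print(multilabel_labels(scores))  # -> ['beach', 'outdoor', 'sunny']
```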

Project Setup and Training

Step 1: Create a Custom Vision Project

from azure.cognitiveservices.vision.customvision.training import (
    CustomVisionTrainingClient
)
from msrest.authentication import ApiKeyCredentials

credentials = ApiKeyCredentials(
    in_headers={"Training-key": "<training-key>"}
)
trainer = CustomVisionTrainingClient(
    endpoint="https://my-customvision.cognitiveservices.azure.com/",
    credentials=credentials
)

# Look up the domain by name -- create_project expects a domain GUID,
# not a friendly name such as "General" or "Food"
domains = trainer.get_domains()
general_domain = next(
    d for d in domains
    if d.type == "Classification" and d.name == "General"
)

# Create a classification project
project = trainer.create_project(
    name="Product Quality Inspection",
    domain_id=general_domain.id,
    classification_type="Multiclass"  # or "Multilabel"
)

Step 2: Add Tags and Upload Images

# Create tags
good_tag = trainer.create_tag(project.id, "Good")
defective_tag = trainer.create_tag(project.id, "Defective")

# Upload and tag images
import os
good_images_dir = "./training_data/good/"
for filename in os.listdir(good_images_dir):
    with open(os.path.join(good_images_dir, filename), "rb") as f:
        trainer.create_images_from_data(
            project.id,
            f.read(),
            tag_ids=[good_tag.id]
        )

Step 3: Train the Model

import time

# Start training (quick training by default; advanced training can be
# requested with training_type="Advanced")
iteration = trainer.train_project(project.id)

# Poll until training finishes ("Completed" or "Failed")
while iteration.status == "Training":
    time.sleep(10)
    iteration = trainer.get_iteration(project.id, iteration.id)
    print(f"Training status: {iteration.status}")

Step 4: Evaluate Performance

Key metrics for model evaluation:

| Metric | Description | Ideal Value |
|---|---|---|
| Precision | Of predicted positives, how many are correct? | > 90% |
| Recall | Of actual positives, how many were detected? | > 90% |
| AP (Average Precision) | Area under the precision-recall curve, per class | > 80% |
| mAP | Mean AP across all classes | > 80% |
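Precision and recall come straight from the true/false positive and false negative counts at a given probability threshold. A quick worked example with hypothetical counts (the portal computes these for you per iteration):

```python
def precision(tp, fp):
    # Of predicted positives, how many are correct?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of actual positives, how many were detected?
    return tp / (tp + fn)

# Hypothetical evaluation counts for a "Defective" tag:
# 45 true positives, 5 false positives, 10 false negatives
p = precision(45, 5)   # 45 / 50 = 0.90
r = recall(45, 10)     # 45 / 55 ~= 0.818
print(f"Precision: {p:.2%}, Recall: {r:.2%}")
```

Raising the probability threshold generally trades recall for precision, which is why both metrics are reported at a stated threshold.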

Step 5: Publish and Use the Model

# Publish the trained iteration
publish_iteration_name = "v1"
prediction_resource_id = "/subscriptions/.../Microsoft.CognitiveServices/accounts/my-prediction"

trainer.publish_iteration(
    project.id,
    iteration.id,
    publish_iteration_name,
    prediction_resource_id
)

# Make predictions
from azure.cognitiveservices.vision.customvision.prediction import (
    CustomVisionPredictionClient
)

predictor = CustomVisionPredictionClient(
    endpoint="https://my-customvision.cognitiveservices.azure.com/",
    credentials=ApiKeyCredentials(
        in_headers={"Prediction-key": "<prediction-key>"}
    )
)

with open("test_image.jpg", "rb") as f:
    results = predictor.classify_image(
        project.id,
        publish_iteration_name,
        f.read()
    )

for prediction in results.predictions:
    print(f"{prediction.tag_name}: {prediction.probability:.2%}")

Domain-Specific Base Models

| Domain | Best For | Optimized For |
|---|---|---|
| General | Wide variety of images | General-purpose classification/detection |
| General (compact) | Edge deployment | Smaller model size, mobile/edge devices |
| Food | Food and dish recognition | Restaurant menus, food delivery apps |
| Landmarks | Natural and built landmarks | Travel and tourism applications |
| Retail | Product recognition on shelves | Retail analytics, inventory management |
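In the SDK, `trainer.get_domains()` returns domain records whose `exportable` flag distinguishes compact domains from cloud-only ones. A sketch of selecting a domain by name, using stubbed records in place of a live call (the IDs here are made up for illustration):

```python
from types import SimpleNamespace

# Stub of what trainer.get_domains() returns; real domains carry
# id, name, type, and exportable attributes (IDs below are fake)
domains = [
    SimpleNamespace(id="d1", name="General", type="Classification", exportable=False),
    SimpleNamespace(id="d2", name="General (compact)", type="Classification", exportable=True),
    SimpleNamespace(id="d3", name="Food", type="Classification", exportable=False),
]

def pick_domain(domains, name, require_export=False):
    """Return the first classification domain matching the name,
    optionally restricted to exportable (compact) domains."""
    return next(
        d for d in domains
        if d.type == "Classification"
        and d.name == name
        and (d.exportable or not require_export)
    )

edge_domain = pick_domain(domains, "General (compact)", require_export=True)
print(edge_domain.id)  # -> d2
```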

On the Exam: Compact domains are specifically designed for export to edge devices. If a question mentions offline or edge deployment, choose a compact domain. Standard (non-compact) domains provide higher accuracy but can only be used via the cloud API.

Model Export for Edge Deployment

Custom Vision models trained with compact domains can be exported:

| Export Format | Target Platform |
|---|---|
| TensorFlow | Android, Linux, IoT devices |
| CoreML | iOS and macOS applications |
| ONNX | Windows, any ONNX-compatible runtime |
| Docker (Linux) | Linux containers, Azure IoT Edge |
| Docker (Windows) | Windows containers |
| OpenVINO | Intel hardware (CPUs, VPUs, FPGAs) |

Export Workflow

import time

# Request export in ONNX format (requires a compact-domain iteration)
export = trainer.export_iteration(
    project.id,
    iteration.id,
    platform="ONNX"
)

# Export runs asynchronously -- poll until the status leaves "Exporting"
while export.status == "Exporting":
    time.sleep(5)
    export = trainer.get_exports(project.id, iteration.id)[0]

# Download the exported model from export.download_uri
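Once the export is ready, `export.download_uri` is a plain URL; fetching it needs nothing beyond the standard library. A small hedged sketch (the destination filename is an arbitrary choice):

```python
import urllib.request

def download_model(download_uri, dest_path):
    """Fetch the exported model archive to a local file."""
    with urllib.request.urlopen(download_uri) as resp, \
            open(dest_path, "wb") as out:
        out.write(resp.read())

# Usage (sketch): download_model(export.download_uri, "model.onnx.zip")
```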

Training Data Best Practices

| Practice | Recommendation |
|---|---|
| Minimum images per class | 5 (absolute minimum), 15+ recommended |
| Balanced classes | Similar number of images per class |
| Image variety | Different angles, lighting, backgrounds |
| Negative examples | Include images that should NOT be classified as any tag |
| Image quality | At least 256x256 pixels, representative of production images |
| Maximum image size | 6 MB per image |

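For larger datasets, the batch upload API (`create_images_from_files` with an `ImageFileCreateBatch`) accepts at most 64 images per call, so uploads must be split into chunks. A sketch of the chunking helper, with the SDK calls shown only as comments:

```python
def chunked(items, size=64):
    """Yield successive fixed-size chunks. The Custom Vision batch
    upload API accepts at most 64 images per call, so larger image
    sets must be split before uploading."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Usage against the SDK (sketch):
# from azure.cognitiveservices.vision.customvision.training.models import (
#     ImageFileCreateBatch, ImageFileCreateEntry)
# entries = [...]  # one ImageFileCreateEntry per labeled image
# for batch in chunked(entries):
#     trainer.create_images_from_files(
#         project.id, ImageFileCreateBatch(images=batch))

print(len(list(chunked(list(range(150))))))  # -> 3 batches (64 + 64 + 22)
```
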
Test Your Knowledge

What is the key difference between Multiclass and Multilabel classification in Custom Vision?

Test Your Knowledge

Which Custom Vision domain should you use if you need to export the model for deployment on mobile devices?

Test Your Knowledge

After training a Custom Vision model, which metric tells you "of all predicted positives, how many were actually correct"?
