3.3 Azure AI Custom Vision
Key Takeaways
- Custom Vision trains image classification (multiclass or multilabel) and object detection models from your own labeled images with minimal data.
- Custom Vision uses two separate resources and keys: a Training resource and a Prediction resource; predictions go through the published iteration on the Prediction resource.
- Domain-specific bases (General, Food, Landmarks, Retail) and their compact variants set the starting accuracy and whether the model can be exported.
- Only compact-domain models export to TensorFlow, CoreML, ONNX, ONNX Float16, or Docker (Linux/Windows/ARM) for edge and offline use.
- The minimum is 5 images per tag (15+ recommended), each up to 6 MB; object detection needs at least 15 boxes per tag and benefits from varied angles and lighting.
Quick Answer: Custom Vision trains image classification and object detection models on your own data. Pick a domain (use a compact domain when you need edge/offline export), train with as few as 5 images per tag, evaluate with precision/recall/mAP, publish an iteration, then call the Prediction resource. Training and prediction use separate resources and keys.
Two Resources, Two Keys
This split trips up many candidates. Custom Vision uses a Training resource (upload images, tag, train, evaluate, publish) and a Prediction resource (serve published iterations). The publish_iteration call needs the Prediction resource's Azure resource ID (not its key), and inference uses the prediction key and endpoint, never the training key.
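A frequent mix-up is passing the prediction key where the resource ID is expected. The check below is a minimal sketch, assuming the usual Azure resource ID shape for a Cognitive Services account; the helper name `looks_like_resource_id` and the sample values are hypothetical.

```python
import re

# Shape of an Azure resource ID for a Cognitive Services account.
_RESOURCE_ID = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.CognitiveServices/accounts/[^/]+$",
    re.IGNORECASE,
)


def looks_like_resource_id(value: str) -> bool:
    """Return True if value has the shape of an Azure resource ID."""
    return bool(_RESOURCE_ID.match(value.strip()))


# Hypothetical values for illustration only.
sample_resource_id = (
    "/subscriptions/0000-1111/resourceGroups/rg-vision"
    "/providers/Microsoft.CognitiveServices/accounts/cv-prediction"
)
sample_key = "0123456789abcdef0123456789abcdef"

print(looks_like_resource_id(sample_resource_id))  # True: a resource ID
print(looks_like_resource_id(sample_key))          # False: this is a key
```

Running a check like this before publishing catches the error locally instead of as a failed service call.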
Classification vs. Object Detection
| | Image Classification | Object Detection |
|---|---|---|
| Task | Label the whole image | Locate + label objects |
| Output | Tag(s) + probability | Bounding box + tag + probability |
| Labeling | Tag the image | Draw boxes |
| Min data | 5 images/tag | ~15 boxes/tag |
Classification types: Multiclass assigns exactly one tag per image (cat or dog — mutually exclusive). Multilabel allows several tags at once (outdoor and sunny and beach).
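The difference shows up in how you read the returned probabilities. The sketch below uses made-up scores and two hypothetical helpers; the 0.5 threshold is an assumption, not a service default.

```python
from typing import Dict, List


def multiclass_label(scores: Dict[str, float]) -> str:
    """Multiclass: exactly one tag wins, the one with the highest probability."""
    return max(scores, key=scores.get)


def multilabel_tags(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multilabel: every tag at or above the threshold applies."""
    return sorted(tag for tag, p in scores.items() if p >= threshold)


pet = {"cat": 0.91, "dog": 0.09}
scene = {"outdoor": 0.97, "sunny": 0.88, "beach": 0.74, "night": 0.03}

print(multiclass_label(pet))   # cat
print(multilabel_tags(scene))  # ['beach', 'outdoor', 'sunny']
```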
Project Setup and Training
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from msrest.authentication import ApiKeyCredentials
# Authenticate with the Training resource's endpoint and key.
trainer = CustomVisionTrainingClient(
    training_endpoint,
    ApiKeyCredentials(in_headers={"Training-key": training_key}))
# Create a multiclass project on the chosen domain, then its tags.
project = trainer.create_project(
    "Quality Inspection",
    domain_id=general_domain_id, classification_type="Multiclass")
good = trainer.create_tag(project.id, "Good")
bad = trainer.create_tag(project.id, "Defective")
# Upload one labeled image; repeat for every training image.
with open("good1.jpg", "rb") as f:
    trainer.create_images_from_data(project.id, f.read(), tag_ids=[good.id])
# Quick training; the call returns while training is still in progress.
iteration = trainer.train_project(project.id)
Quick training finishes in minutes and is fine for prototyping; Advanced training lets you set a budget in hours and usually yields higher accuracy on hard datasets — choose it when a question stresses maximum accuracy and tolerates longer training.
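Because training runs asynchronously, the iteration has to reach a finished state before it can be published. The polling loop below is a sketch with the status lookup injected as a callable so it runs without the service; `wait_for_training` is a hypothetical helper, and the "Training" and "Completed" status strings are the ones I understand the service to report.

```python
import time
from typing import Callable


def wait_for_training(get_status: Callable[[], str],
                      poll_seconds: float = 10.0,
                      timeout_seconds: float = 3600.0) -> str:
    """Poll get_status() until it stops returning "Training" or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status != "Training":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("training did not finish in time")
        time.sleep(poll_seconds)


# Stand-in for the service: reports "Training" twice, then "Completed".
_fake_statuses = iter(["Training", "Training", "Completed"])
final_status = wait_for_training(lambda: next(_fake_statuses), poll_seconds=0.0)
print(final_status)  # Completed
```

With the SDK, `get_status` could be `lambda: trainer.get_iteration(project.id, iteration.id).status`. For Advanced training, my understanding is that `train_project` accepts `training_type="Advanced"` and `reserved_budget_in_hours`; check the SDK reference before relying on those names.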
Evaluating an Iteration
Custom Vision reports metrics at the probability threshold you set in the portal. Raising the threshold typically increases precision but lowers recall.
| Metric | Question it answers | Target |
|---|---|---|
| Precision | Of items predicted as a class, how many were correct? | > 90% |
| Recall | Of true items of a class, how many were found? | > 90% |
| AP | Area under the precision/recall curve, per tag | > 80% |
| mAP | Mean AP across all tags (detection quality) | > 80% |
Low precision with high recall means many false positives: raise the threshold or add negative examples. Low recall means missed items: add more training images with greater variety.
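The threshold trade-off is easy to see with a few numbers. The sketch below computes precision and recall for one tag from made-up (probability, ground truth) pairs; it illustrates the definitions in the table and is not how the service computes its metrics internally.

```python
from typing import List, Tuple


def precision_recall(scored: List[Tuple[float, bool]],
                     threshold: float) -> Tuple[float, float]:
    """scored holds (probability for the tag, whether the tag truly applies)."""
    tp = sum(1 for p, truth in scored if p >= threshold and truth)
    fp = sum(1 for p, truth in scored if p >= threshold and not truth)
    fn = sum(1 for p, truth in scored if p < threshold and truth)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up scores for a "Defective" tag.
scored = [(0.95, True), (0.85, True), (0.70, False), (0.60, True),
          (0.40, False), (0.30, True), (0.10, False)]

for t in (0.5, 0.8):
    p, r = precision_recall(scored, t)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
# threshold=0.5 precision=0.75 recall=0.75
# threshold=0.8 precision=1.00 recall=0.50
```

Raising the threshold from 0.5 to 0.8 removes the one false positive (precision rises to 1.00) but also drops a true defect (recall falls to 0.50).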
Publish and Predict
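from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
# Continues from the training snippet: trainer, project, and iteration already exist.
# Publish under the name "v1"; the last argument is the Prediction resource's Azure resource ID.
trainer.publish_iteration(project.id, iteration.id, "v1", prediction_resource_id)
# Authenticate with the Prediction resource's endpoint and key (not the training key).
predictor = CustomVisionPredictionClient(
    prediction_endpoint,
    ApiKeyCredentials(in_headers={"Prediction-key": prediction_key}))
# Classify a local image against the published name.
with open("test.jpg", "rb") as f:
    r = predictor.classify_image(project.id, "v1", f.read())
for p in r.predictions:
    print(p.tag_name, f"{p.probability:.2%}")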
You must publish an iteration and reference it by its published name; an unpublished iteration cannot be called for prediction.
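The same rule applies when calling the REST endpoint directly: the URL carries the published name and the header carries the prediction key. The builder below is a sketch; the URL layout and the `v3.0` version segment reflect my understanding of the prediction REST API and should be checked against the Prediction URL shown in the portal. The endpoint, project ID, and key are placeholders.

```python
from typing import Dict, Tuple


def build_classify_request(endpoint: str, project_id: str,
                           published_name: str, prediction_key: str,
                           api_version: str = "v3.0") -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for posting raw image bytes to a published iteration."""
    if not published_name:
        raise ValueError("a published iteration name is required")
    url = (f"{endpoint.rstrip('/')}/customvision/{api_version}/Prediction/"
           f"{project_id}/classify/iterations/{published_name}/image")
    headers = {
        "Prediction-Key": prediction_key,  # the prediction key, not the training key
        "Content-Type": "application/octet-stream",
    }
    return url, headers


# Placeholder values for illustration only.
url, headers = build_classify_request(
    "https://example-prediction.cognitiveservices.azure.com/",
    "00000000-0000-0000-0000-000000000000", "v1", "<prediction-key>")
print(url)
```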
Domains and Edge Export
| Domain | Best for |
|---|---|
| General / General (compact) | Broad, mixed images |
| Food | Dishes, menus |
| Landmarks | Natural + built landmarks |
| Retail | Products on shelves |
Only compact domains export. Standard domains are cloud-only but more accurate.
| Export format | Target |
|---|---|
| TensorFlow / TF Lite | Android, Linux, IoT |
| CoreML | iOS, macOS |
| ONNX (incl. Float16) | Windows, ONNX runtimes |
| Docker (Linux/Windows/ARM) | Containers, Azure IoT Edge |
| Vision AI DevKit | Edge camera hardware |
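export = trainer.export_iteration(project.id, iteration.id, platform="ONNX")
# The package is prepared asynchronously: poll trainer.get_exports(project.id, iteration.id)
# until the export's status is "Done", then download it from that export's download_uri.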
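Choosing the export format from the deployment target can be captured in a small lookup. This is a sketch: the platform and flavor strings follow my reading of the SDK's export options and should be verified against the SDK reference, and detecting a compact domain by the word "compact" in its name is an assumption made for illustration.

```python
from typing import Optional, Tuple

# Target -> (platform, flavor). Strings are assumed; verify against the SDK reference.
EXPORT_OPTIONS = {
    "android": ("TensorFlow", "TensorFlowLite"),
    "ios": ("CoreML", None),
    "windows": ("ONNX", None),
    "iot-edge-linux": ("DockerFile", "Linux"),
    "iot-edge-arm": ("DockerFile", "ARM"),
}


def pick_export(target: str, domain_name: str) -> Tuple[str, Optional[str]]:
    """Return (platform, flavor) for a target, refusing non-compact domains."""
    if "compact" not in domain_name.lower():
        raise ValueError(
            f"{domain_name!r} is not a compact domain, so it cannot be exported")
    return EXPORT_OPTIONS[target]


print(pick_export("android", "General (compact)"))  # ('TensorFlow', 'TensorFlowLite')
```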
Training Data Best Practices
| Practice | Recommendation |
|---|---|
| Min images per tag | 5 (absolute), 15+ recommended |
| Object detection | At least 15 tagged boxes per object |
| Balance | Similar counts across tags |
| Variety | Different angles, lighting, backgrounds |
| Negatives | Add a "none/negative" set to cut false positives |
| Image size | >= 256x256 px, <= 6 MB each |
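These rules are easy to check before uploading. The sketch below validates a list of image records against the thresholds in the table; the record layout, the 2x imbalance rule, and reading 6 MB as 6 × 1024 × 1024 bytes are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List

MIN_PER_TAG = 5            # hard minimum from the table above
RECOMMENDED_PER_TAG = 15   # recommended floor from the table above
MAX_BYTES = 6 * 1024 * 1024  # 6 MB, assumed to mean mebibytes


def check_dataset(images: List[Dict]) -> List[str]:
    """images: [{"name": str, "tag": str, "bytes": int}, ...]. Returns problems found."""
    problems = []
    counts = Counter(img["tag"] for img in images)
    for tag, n in sorted(counts.items()):
        if n < MIN_PER_TAG:
            problems.append(f"{tag}: {n} images, below the minimum of {MIN_PER_TAG}")
        elif n < RECOMMENDED_PER_TAG:
            problems.append(f"{tag}: {n} images, below the recommended {RECOMMENDED_PER_TAG}")
    # Assumed rule of thumb: flag when the largest tag is more than twice the smallest.
    if counts and max(counts.values()) > 2 * min(counts.values()):
        problems.append("tags are unbalanced (largest is more than 2x the smallest)")
    for img in images:
        if img["bytes"] > MAX_BYTES:
            problems.append(f"{img['name']}: larger than 6 MB")
    return problems


images = (
    [{"name": f"good{i}.jpg", "tag": "Good", "bytes": 200_000} for i in range(20)]
    + [{"name": f"bad{i}.jpg", "tag": "Defective", "bytes": 200_000} for i in range(4)]
    + [{"name": "huge.jpg", "tag": "Good", "bytes": 7_000_000}]
)
for problem in check_dataset(images):
    print(problem)
```

For this sample the check reports three problems: too few Defective images, unbalanced tags, and one oversized file.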
Worked Example
A factory tablet must flag defective parts offline on the line. Create a Multiclass project on General (compact), train Good vs. Defective with 15+ images each, confirm precision and recall above 90% at the chosen threshold, publish, then export to ONNX (or a Docker container for IoT Edge) so inference runs on-device with no cloud round trip.
Iterations, Retraining, and Quotas
Each time you click train, Custom Vision creates a new iteration — a versioned snapshot of the model. You can keep several iterations and compare their precision/recall before publishing one, and you can roll back by republishing an older iteration under the same name. Projects have limits worth remembering: a classification project supports many tags but the free (F0) tier caps total training images and project count, while the standard (S0) tier raises those ceilings. When a project hits its iteration limit, you must delete an old iteration before training again — a scenario the exam phrases as "training fails after many experiments."
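When a project reaches its iteration limit, a safe cleanup rule is to remove the oldest iteration that is not published. The helper below is a sketch over plain dictionaries; the field names mirror the `id`, `created`, and `publish_name` attributes I understand the SDK's iteration objects to carry, and the limit is passed in rather than hard-coded because it depends on the tier.

```python
from typing import Dict, List, Optional


def iteration_to_delete(iterations: List[Dict], limit: int) -> Optional[str]:
    """Return the id of the oldest unpublished iteration once the limit is reached, else None."""
    if len(iterations) < limit:
        return None
    unpublished = [it for it in iterations if not it["publish_name"]]
    if not unpublished:
        raise RuntimeError("at the limit and every iteration is published; unpublish one first")
    return min(unpublished, key=lambda it: it["created"])["id"]


# Made-up iterations; ISO dates sort correctly as strings.
iterations = [
    {"id": "it-1", "created": "2024-01-05", "publish_name": None},
    {"id": "it-2", "created": "2024-01-09", "publish_name": "v1"},
    {"id": "it-3", "created": "2024-01-12", "publish_name": None},
]
print(iteration_to_delete(iterations, limit=3))   # it-1
print(iteration_to_delete(iterations, limit=10))  # None
```

With the SDK, the list would come from `trainer.get_iterations(project.id)` and the removal from `trainer.delete_iteration(project.id, iteration_id)`.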
Smart Labeler and Active Learning
After you publish a model and it starts predicting on real images, the Custom Vision portal stores those prediction images. Active learning surfaces images the model was unsure about so you can correct their tags and feed them back into training, steadily improving accuracy on real-world data. The Smart Labeler can pre-tag new uploads using the current model so a human only verifies rather than labels from scratch. Both features matter for the exam's "how do you improve a deployed model over time?" questions — the answer is to incorporate prediction images via active learning and retrain, not to start a brand-new project.
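The selection step behind active learning can be sketched as a filter over stored predictions: keep the images whose top probability sits in an uncertain band and review those first. The band of 0.3 to 0.7 and the record layout are assumptions for illustration; the portal applies its own logic.

```python
from typing import Dict, List


def uncertain_images(predictions: List[Dict],
                     low: float = 0.3, high: float = 0.7) -> List[str]:
    """Return image ids whose top probability falls in [low, high], least confident first."""
    picked = []
    for pred in predictions:
        top = max(pred["scores"].values())
        if low <= top <= high:
            picked.append((top, pred["image_id"]))
    return [image_id for _, image_id in sorted(picked)]


# Made-up stored predictions.
predictions = [
    {"image_id": "a", "scores": {"Good": 0.98, "Defective": 0.02}},
    {"image_id": "b", "scores": {"Good": 0.55, "Defective": 0.45}},
    {"image_id": "c", "scores": {"Good": 0.35, "Defective": 0.65}},
]
print(uncertain_images(predictions))  # ['b', 'c']
```

The images returned are the ones to re-tag by hand and add to the training set before training the next iteration.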
On the Exam: "Edge" or "offline" => compact domain + export. Confusing the training key with the prediction key, or calling an unpublished iteration, are the two most common failure scenarios. Improving a live model => active learning on stored prediction images, then retrain a new iteration.
What is the difference between Multiclass and Multilabel classification in Custom Vision?
You must deploy a Custom Vision model to run offline on Android tablets. What must be true of the model?
Which key and resource are used to call a published Custom Vision iteration for inference?