3.3 Azure AI Custom Vision and Face Services
Key Takeaways
- Azure AI Custom Vision lets you train custom image classification and object detection models using your own labeled images — as few as 5 images per tag, though 50+ is recommended for good results.
- Custom Vision provides a portal for uploading images, adding labels/tags, training the model, and testing predictions — all without writing code.
- Azure AI Face detects human faces in images, supporting face verification (is this the same person?) and face identification (who is this person?).
- Face verification compares two faces to determine if they belong to the same person (1:1 matching); face identification matches a face against a group of known people (1:many matching).
- Microsoft restricts access to facial recognition capabilities — identification and verification features require an approved application due to responsible AI concerns.
Azure AI Custom Vision and Face Services
Quick Answer: Custom Vision trains custom image classification and object detection models from your own labeled images. Face API detects faces and supports verification (is this the same person?) and identification (who is this person?). Microsoft restricts facial recognition features due to responsible AI concerns.
Azure AI Custom Vision
Azure AI Custom Vision lets you train your own image classification and object detection models without machine learning expertise. You provide labeled images, and the service handles model training.
Custom Vision vs. Azure AI Vision
| Feature | Azure AI Vision | Custom Vision |
|---|---|---|
| Model | Pre-built, general-purpose | Custom, trained on your data |
| Training required | No | Yes (upload and label images) |
| Categories | General (thousands of built-in) | Custom (your own categories) |
| Use case | General image analysis | Domain-specific classification |
| Example | "This is a dog" | "This is a Labrador with hip dysplasia" |
When to Use Custom Vision
Use Custom Vision when the pre-built Azure AI Vision model does not recognize the specific categories you need:
- Manufacturing: Identify specific product defects (scratch, dent, discoloration)
- Agriculture: Classify crop diseases (blight, rust, mildew)
- Retail: Identify your specific product brands and variants
- Healthcare: Classify medical images by condition (requires domain expertise)
- Wildlife: Identify specific species in camera trap images
Training a Custom Vision Model
Step 1: Create a Custom Vision resource
- Create the resource in the Azure portal
- Choose a training resource, a prediction resource, or a single resource that handles both
Step 2: Create a project
- Select the project type: Image Classification or Object Detection
- For classification, choose: Multilabel (multiple tags per image) or Multiclass (one tag per image)
Step 3: Upload and tag images
- Upload training images for each category
- Add tags (labels) to each image
- Minimum: 5 images per tag (recommended: 50+ for better results)
Step 4: Train the model
- Click "Train" to start the training process
- Choose Quick Training (fast, less accurate) or Advanced Training (slower, more accurate)
Step 5: Evaluate and test
- Review precision, recall, and AP (Average Precision) metrics
- Test with new images to verify predictions
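The metrics reviewed in Step 5 come straight from standard definitions: precision asks how many of the model's predictions for a tag were correct, and recall asks how many true instances of the tag the model found. A minimal sketch (the tag and the counts are made up for illustration):

```python
def precision(tp: int, fp: int) -> float:
    """Of all images the model tagged, how many were tagged correctly?"""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Of all images that truly carry the tag, how many did the model find?"""
    return tp / (tp + fn)

# Hypothetical evaluation counts for a "Defective" tag:
# 40 defects found, 10 false alarms, 8 defects missed.
print(round(precision(40, 10), 2))  # 0.8
print(round(recall(40, 8), 2))      # 0.83
```

A model can score high on one metric and low on the other, which is why the portal reports both.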
Step 6: Publish and use
- Publish the trained model as a prediction endpoint
- Call the endpoint from your application to classify new images
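Once published (Step 6), the prediction endpoint returns JSON containing a list of tag/probability pairs. The sketch below parses a sample response offline; the field names follow the prediction API's JSON shape, but the tags and probabilities are invented for illustration:

```python
# Sample of what a Custom Vision prediction response looks like
# (values are made up; only the structure matters here).
sample_response = {
    "predictions": [
        {"tagName": "Grade A", "probability": 0.91},
        {"tagName": "Grade B", "probability": 0.07},
        {"tagName": "Defective", "probability": 0.02},
    ]
}

def top_prediction(response: dict) -> tuple[str, float]:
    """Return the most probable tag and its probability."""
    best = max(response["predictions"], key=lambda p: p["probability"])
    return best["tagName"], best["probability"]

tag, prob = top_prediction(sample_response)
print(tag, prob)  # Grade A 0.91
```

In a real application the response would come from an HTTP call to the published endpoint, authenticated with the prediction resource's key.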
Custom Vision Project Types
| Project Type | Task | Output |
|---|---|---|
| Image Classification (Multiclass) | Assign ONE tag per image | "This is a Labrador" |
| Image Classification (Multilabel) | Assign MULTIPLE tags per image | "outdoor", "sunny", "park" |
| Object Detection | Locate objects with bounding boxes | "Labrador at position (x, y, w, h)" |
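The multiclass/multilabel distinction in the table comes down to how predictions are read out: multiclass takes the single highest-scoring tag, while multilabel keeps every tag above some cutoff. A sketch, assuming illustrative scores and a hypothetical 0.5 threshold:

```python
def multiclass(scores: dict[str, float]) -> str:
    """Multiclass: exactly one tag per image -- take the top score."""
    return max(scores, key=scores.get)

def multilabel(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multilabel: every tag whose score clears the threshold."""
    return [tag for tag, p in scores.items() if p >= threshold]

scores = {"outdoor": 0.95, "sunny": 0.70, "park": 0.40}
print(multiclass(scores))   # outdoor
print(multilabel(scores))   # ['outdoor', 'sunny']
```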
Azure AI Face Service
Azure AI Face is a service for detecting, recognizing, and analyzing human faces in images.
Face Detection
Detects human faces in an image and returns:
- Face location — bounding box coordinates
- Face landmarks — key points (eyes, nose, mouth, jaw)
- Head pose — rotation angles
- Face mask — whether the person is wearing a face mask
- Blur and noise — quality assessment of the face image
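A detection response lists each face it found along with a bounding box. The sketch below works on a sample response; the `faceRectangle` field names mirror the Face API's detection output, but the coordinate values are invented:

```python
# Sample of a face detection response with two faces
# (coordinates are made up; only the structure matters).
sample_faces = [
    {"faceRectangle": {"left": 120, "top": 60, "width": 90, "height": 90}},
    {"faceRectangle": {"left": 340, "top": 75, "width": 85, "height": 85}},
]

def bounding_boxes(faces: list[dict]) -> list[tuple[int, int, int, int]]:
    """Extract (left, top, width, height) for each detected face."""
    return [
        (r["left"], r["top"], r["width"], r["height"])
        for r in (f["faceRectangle"] for f in faces)
    ]

print(len(bounding_boxes(sample_faces)))  # 2
```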
Face Verification (1:1)
Compares two face images to determine if they belong to the same person.
| Input | Output | Use Case |
|---|---|---|
| Photo A + Photo B | Same person: Yes/No + confidence | Identity verification for secure login |
Example: A banking app compares a selfie taken during login with the photo on file to verify the user's identity.
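The 1:1 decision reduces to a confidence check: the service returns how confident it is that the two faces match, and the application accepts or rejects against a threshold. A minimal sketch (the 0.75 threshold is an illustrative choice, not a service default):

```python
def verify(confidence: float, threshold: float = 0.75) -> bool:
    """1:1 verification: accept only if the service's confidence that
    both faces belong to the same person clears the threshold."""
    return confidence >= threshold

print(verify(0.92))  # True  -> allow the login
print(verify(0.40))  # False -> reject and fall back to another factor
```

Raising the threshold trades convenience for security: fewer impostors get through, but more legitimate users are rejected.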
Face Identification (1:Many)
Matches one face against a group of known faces to identify who it is.
| Input | Output | Use Case |
|---|---|---|
| Photo + Person Group | Identified person + confidence | Employee access control |
Example: An office building uses facial identification to grant access — the system compares a person's face against all registered employees.
Responsible AI Restrictions on Face API
Microsoft has implemented significant restrictions on the Face API due to responsible AI concerns:
| Feature | Status | Reason |
|---|---|---|
| Face detection | Available (general access) | Low risk — detects faces without identification |
| Face verification | Restricted (requires approval) | Medium risk — used for identity verification |
| Face identification | Restricted (requires approval) | High risk — used for surveillance and tracking |
| Emotion detection | Retired | Unreliable — facial expressions don't reliably indicate emotions |
| Age/gender inference | Retired | Biased — inaccurate and potentially harmful |
On the Exam: Know that Microsoft has RETIRED emotion detection and age/gender inference from the Face API. Face detection is generally available, but verification and identification require an approved application. This is a direct application of responsible AI principles.
Face Liveness Detection
A newer capability that determines whether a face in the camera is a real, live person or a spoofing attempt (photo, video, or mask). This prevents attacks where someone holds up a photo to bypass facial verification.
| Detection Method | What It Catches |
|---|---|
| Passive liveness | Photos held to camera, video replays |
| Active liveness | Requires user to perform an action (blink, turn head) |
On the Exam: Face liveness detection is an important security feature. Know that it prevents spoofing attacks where someone uses a photo or video instead of their real face. This is relevant to both the reliability/safety and privacy/security responsible AI principles.
Review Questions
1. A manufacturing company needs to classify product images into three categories specific to their products: "Grade A", "Grade B", and "Defective". Which Azure service should they use?
2. A banking app needs to verify that the person taking a selfie during login is the same person whose photo is on file. Which Face API capability is this?
3. Why did Microsoft retire emotion detection from the Azure AI Face service?
4. What is the minimum number of images per tag required to train a Custom Vision model?
5. What does face liveness detection prevent?