Which type of machine learning does clustering use?

Unsupervised learning with unlabeled data. Clustering is an unsupervised learning technique. It uses unlabeled data (features only, no predefined categories) and discovers natural groupings in the data. The algorithm finds the groups on its own without being told what the categories should be.

A marketing team wants to find natural groups in their customer data based on purchasing behavior, without any predefined categories. Which technique should they use?

Clustering. Clustering is the correct choice because the team wants to discover natural groups WITHOUT predefined categories. Classification requires predefined categories (labels), regression predicts numbers, and anomaly detection finds outliers.

What is the key difference between classification and clustering?

Classification uses predefined categories while clustering discovers groups from data. The fundamental difference is that classification uses supervised learning with predefined categories (labels known in advance), while clustering uses unsupervised learning to discover natural groups in unlabeled data (groups are not known in advance).

In K-Means clustering, what does "K" represent?

The number of clusters to create. In K-Means clustering, K represents the number of clusters the algorithm will create. You specify K before running the algorithm, and it partitions the data into exactly K groups by minimizing the total squared distance between data points and their assigned cluster centers (centroids).

Clustering Models

Quick Answer: Clustering is an unsupervised learning technique that groups similar data points together without predefined labels. K-Means is the most common clustering algorithm. Unlike classification (which assigns to known categories), clustering discovers new groups in the data.

What Is Clustering?

Clustering is an unsupervised machine learning technique that identifies natural groups (clusters) in data. The model has NO labeled data — it discovers patterns and groupings on its own.

Key Characteristics of Clustering

No labels required — only features (inputs), no predefined categories
Discovers natural groups — the algorithm finds the groupings
Number of groups may be unknown — you may need to experiment to find the right number
Similar items are grouped together — items within a cluster are more similar to each other than to items in other clusters

K-Means Clustering

K-Means is one of the most widely used clustering algorithms and the one most commonly referenced on the AI-900 exam.

How K-Means Works

Choose K — Decide how many clusters you want (K = number of clusters)
Initialize centroids — Place K random points as initial cluster centers
Assign points — Assign each data point to the nearest centroid
Update centroids — Move each centroid to the center of its assigned points
Repeat — Continue assigning and updating until centroids stop moving (convergence)
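
The five steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name kmeans, the seed parameter, and the sample data are assumptions made for this example. It picks K existing data points as the starting centroids (one common choice) and treats "no point changed cluster" as the convergence test, which is equivalent to the centroids no longer moving.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 2: initialize centroids by picking k data points at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 5: stop once no point changes cluster (convergence).
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Step 1: choose K. Hypothetical 2-D data with two visibly separate groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(data, k=2)
print(centroids)
print(assignments)
```

The cluster numbers in assignments are arbitrary: which group is called 0 and which is called 1 depends on the random starting centroids, not on anything in the data.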

K-Means Example: Customer Segmentation

A retail company has purchase data for 10,000 customers (features: annual spending, purchase frequency, average order value). They run K-Means with K=3 and discover:

Cluster	Characteristics	Label (assigned after clustering)
Cluster 1	High spending, frequent purchases, high order value	"Premium Customers"
Cluster 2	Medium spending, moderate frequency	"Regular Customers"
Cluster 3	Low spending, infrequent purchases, low order value	"Casual Shoppers"

Note: The labels "Premium", "Regular", and "Casual" are NOT part of the algorithm — they are human interpretations applied after clustering.
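
To make the point about labels concrete, the sketch below attaches names to clusters after the fact. The centroid values, the cluster ids, and the rule "rank clusters by annual spending" are all hypothetical choices made for illustration; the algorithm itself returns only the numeric ids.

```python
# Hypothetical centroids from a K=3 run:
# (annual spending, purchases per year, average order value).
centroids = {
    0: (450.0, 3.0, 150.0),
    1: (9200.0, 48.0, 191.7),
    2: (2600.0, 18.0, 144.4),
}

# Human-chosen names, ordered from lowest to highest annual spending.
names = ["Casual Shoppers", "Regular Customers", "Premium Customers"]

# Rank cluster ids by the spending value of their centroid, then pair with names.
ranked_ids = sorted(centroids, key=lambda cid: centroids[cid][0])
cluster_names = dict(zip(ranked_ids, names))

# Cluster ids for a few customers, in the form a clustering algorithm returns them.
assignments = [1, 0, 2, 2, 0]
print([cluster_names[a] for a in assignments])
```

A different analyst could rank by purchase frequency instead, or choose entirely different names, without changing the clustering result.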

Clustering vs. Classification

This is one of the most commonly tested distinctions on the AI-900 exam:

Aspect	Classification	Clustering
Learning type	Supervised	Unsupervised
Labels	Required (known categories)	Not used
Goal	Assign to known categories	Discover unknown groups
Categories	Predefined before training	Discovered during training
Example	"Is this email spam?"	"What groups exist in my customer data?"
Output	Known class label	Cluster assignment (group number)

On the Exam: The key differentiator is whether categories are KNOWN in advance. If the question says "categorize into billing, technical, or general" — that is classification (categories are predefined). If the question says "find natural groups in customer data" — that is clustering (groups are discovered).
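
The difference shows up directly in what each approach takes as input. The sketch below is a simplified illustration with made-up data (number of links per email): the classifier is a basic nearest-mean model, the clustering routine is K-Means with K=2 on one feature, and all function names are assumptions for this example. It assumes the feature values are not all identical.

```python
from collections import defaultdict
from statistics import mean


def fit_classifier(features, labels):
    """Supervised: learn one mean per KNOWN label (nearest-mean classifier)."""
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[y].append(x)
    return {label: mean(xs) for label, xs in groups.items()}


def predict(model, x):
    """Return the known label whose mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def cluster_two_groups(features, iters=10):
    """Unsupervised: split 1-D values into 2 groups using no labels at all."""
    lo, hi = min(features), max(features)  # simple deterministic starting centers
    for _ in range(iters):
        assign = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in features]
        lo = mean(x for x, a in zip(features, assign) if a == 0)
        hi = mean(x for x, a in zip(features, assign) if a == 1)
    return assign


links_per_email = [0, 1, 2, 9, 11, 12]  # hypothetical feature values
labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

model = fit_classifier(links_per_email, labels)  # needs features AND labels
print(predict(model, 10))                        # output is a known class label

print(cluster_two_groups(links_per_email))       # needs features only; output is group numbers
```

The classifier can answer "spam" or "not spam" because those categories were supplied during training. The clustering routine can only report group 0 or group 1; deciding what those groups mean is left to a person.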

Common Clustering Use Cases

Use Case	Data	Discovered Clusters
Customer segmentation	Purchase history, demographics	Customer types (e.g., premium, budget, seasonal)
Document grouping	Text features	Topic groups (e.g., sports, politics, technology)
Image grouping	Image features	Visual similarity groups
Anomaly detection	Any features	Normal clusters + outliers
Gene expression	Gene activity levels	Groups of co-regulated genes
Market research	Survey responses	Consumer preference segments

Evaluating Clustering Models

Since there are no labels to compare against, clustering evaluation uses different approaches:

Method	What It Measures
Silhouette score	How similar items are to their own cluster vs. other clusters (-1 to 1, higher is better)
Within-cluster distance	How tightly packed items are within each cluster (lower is better)
Between-cluster distance	How separated clusters are from each other (higher is better)
Visual inspection	Plot the clusters and assess if groupings make intuitive sense
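
As an illustration of the first row, the silhouette score can be computed by hand in a few lines. This is a sketch under stated assumptions: the function name silhouette_score mirrors common usage but is defined here from scratch, singleton clusters are scored 0 by convention, the data is hypothetical, and at least two clusters with distinct points are assumed.

```python
import math
from statistics import mean


def silhouette_score(points, assignments):
    """Mean silhouette over all points; assumes at least 2 clusters."""
    clusters = {}
    for p, a in zip(points, assignments):
        clusters.setdefault(a, []).append(p)

    scores = []
    for p, a in zip(points, assignments):
        own = clusters[a]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # Cohesion: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it adds nothing to the sum).
        cohesion = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Separation: mean distance to the members of the nearest other cluster.
        separation = min(
            mean(math.dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != a
        )
        scores.append((separation - cohesion) / max(cohesion, separation))
    return mean(scores)


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
good = silhouette_score(data, [0, 0, 0, 1, 1, 1])  # groups match the two blobs
bad = silhouette_score(data, [0, 1, 0, 1, 0, 1])   # groups mix the two blobs
print(good, bad)
```

The grouping that matches the two visible blobs scores close to 1, while the mixed grouping scores much lower, which is the behavior the table describes.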

On the Exam: You do not need to calculate clustering metrics. Know that good clustering produces well-separated, cohesive groups where items within a cluster are similar to each other and different from items in other clusters.

Microsoft Azure AI Fundamentals

2.4 Clustering Models


