5.4 Availability Zones, Sets, and VM Scale Sets
Key Takeaways
- Availability zones protect against datacenter-level failure within a region, while availability sets spread VMs across fault and update domains within a single datacenter.
- VM Scale Sets provide a managed group of VM instances with uniform or flexible orchestration, autoscale, load balancing, and rolling upgrade patterns.
- High availability design must include compute placement, load balancing, health probes, application state, data tier resilience, and operational update strategy.
- Troubleshooting scale sets often involves instance health, autoscale rules, image or extension failures, quota, load balancer backend membership, and upgrade policy.
Availability Building Blocks
A single VM has planned and unplanned downtime risk. Azure provides several compute placement options to reduce that risk, but each solves a different problem. Availability zones are physically separate datacenter locations inside a supported region. Zone-redundant designs can survive a zone failure if the application, network path, and data tier are also resilient. Availability sets spread VMs across fault domains and update domains to reduce correlated host and maintenance impact. VM Scale Sets manage multiple VM instances as a group and are commonly used for stateless application tiers.
| Option | Protects against | Best fit | Key limitation |
|---|---|---|---|
| Availability zone | Datacenter or zone failure | Regional HA for supported services | Not every region or SKU supports every zone |
| Availability set | Host rack and planned update grouping | Legacy or non-zonal multi-VM apps | Must be chosen at VM creation |
| VM Scale Set | Instance failure and elastic demand | Horizontal app tiers | App should tolerate instance replacement |
| Azure Site Recovery | Regional disaster recovery | DR to another region | Failover design and testing required |
A VM cannot be added to an availability set after creation. To use one, deploy the VMs into the set at creation time or rebuild the architecture. Zones are likewise selected at creation for zonal VMs; moving a VM to a different zone is not a simple property edit.
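Because set membership is fixed at creation, the usual pattern is to create the set first and then deploy each VM into it. A minimal sketch; resource names such as `avset-web-prod` and `vm-web-01` are illustrative:

```shell
# Create the availability set before any member VMs exist
az vm availability-set create \
  --resource-group rg-prod-web \
  --name avset-web-prod \
  --platform-fault-domain-count 2 \
  --platform-update-domain-count 5

# Each VM must reference the set at creation time
az vm create \
  --resource-group rg-prod-web \
  --name vm-web-01 \
  --image Ubuntu2204 \
  --availability-set avset-web-prod \
  --size Standard_D2s_v5
```

Repeating the second command for `vm-web-02` places both VMs in the same set, and Azure distributes them across the configured domains automatically.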
Availability Zones
Zones are useful when the question mentions datacenter-level isolation, zone-redundant architecture, or high availability within a region. A zonal VM is pinned to one zone. A resilient application normally deploys multiple VMs across zones and places a zone-redundant load balancer, application gateway, or other traffic manager in front where appropriate. The storage design must also match the availability goal. Managed disks can use locally redundant or zone-redundant options depending on disk type and support.
Example CLI pattern:

```shell
az vm create \
  --resource-group rg-prod-web \
  --name vm-web-z1 \
  --image Ubuntu2204 \
  --zone 1 \
  --size Standard_D2s_v5 \
  --vnet-name vnet-prod \
  --subnet snet-web \
  --admin-username azureadmin \
  --ssh-key-values ~/.ssh/id_rsa.pub
```
If a deployment fails in a zone, check SKU availability in that zone, zonal quota, unsupported disk settings, and capacity constraints. A VM size might be available in the region but not in the requested zone.
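Zone and quota restrictions are usually visible before deployment. A quick check, assuming the target region is `eastus` (substitute your own):

```shell
# Show which zones offer the size, plus any subscription restrictions
az vm list-skus \
  --location eastus \
  --size Standard_D2s_v5 \
  --zone \
  --output table

# Check regional vCPU usage against quota for the VM family
az vm list-usage --location eastus --output table
```

A size listed for the region but missing a zone entry, or flagged with a restriction, explains a zonal deployment failure without any trial and error.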
Availability Sets
Availability sets use fault domains and update domains. Fault domains represent groups of hardware that share power and network infrastructure. Update domains represent groups that can be rebooted during planned maintenance. Azure distributes VMs in an availability set across these domains. The design reduces the chance that one host failure or maintenance wave affects all instances.
Availability sets are not a scaling feature. They do not create new VMs, balance traffic, or repair unhealthy apps. You still need a load balancer or application-level traffic distribution, and you need at least two VMs. Managed disks are recommended because Azure aligns disk placement with VM fault domains for managed availability sets.
Bicep example:

```bicep
resource avset 'Microsoft.Compute/availabilitySets@2024-07-01' = {
  name: 'avset-web-prod'
  location: location
  sku: {
    name: 'Aligned'
  }
  properties: {
    platformFaultDomainCount: 2
    platformUpdateDomainCount: 5
  }
}

resource vm 'Microsoft.Compute/virtualMachines@2024-07-01' = {
  name: vmName
  location: location
  properties: {
    availabilitySet: {
      id: avset.id
    }
    hardwareProfile: {
      vmSize: 'Standard_D2s_v5'
    }
    storageProfile: storageProfile
    osProfile: osProfile
    networkProfile: networkProfile
  }
}
```
VM Scale Sets
A VM Scale Set is a group of VM instances managed together. Scale sets can use uniform orchestration, where instances are identical copies of a shared model and are managed as a set, or flexible orchestration, which supports a more VM-like model and broader high availability patterns. Scale sets can integrate with Azure Load Balancer or Application Gateway. They can use autoscale rules based on metrics such as CPU, queue length, or custom signals.
Scale sets work best for stateless workloads or workloads that externalize state to databases, storage, queues, or caches. Instance replacement is normal. If the app stores unique state on the local OS disk, scale-in or repair can cause data loss. Use custom images, cloud-init, extensions, or configuration management to make instances reproducible.
CLI example:

```shell
az vmss create \
  --resource-group rg-prod-web \
  --name vmss-web-prod \
  --image Ubuntu2204 \
  --instance-count 3 \
  --vm-sku Standard_D2s_v5 \
  --upgrade-policy-mode Automatic \
  --lb lb-web-prod \
  --backend-pool-name bepool-web

az monitor autoscale create \
  --resource-group rg-prod-web \
  --resource vmss-web-prod \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name autoscale-web \
  --min-count 2 --max-count 10 --count 3

az monitor autoscale rule create \
  --resource-group rg-prod-web \
  --autoscale-name autoscale-web \
  --condition "Percentage CPU > 70 avg 10m" \
  --scale out 1
```
Upgrade and Health Strategy
Scale set upgrade policy controls how model changes reach instances. Manual means you update instances yourself. Automatic applies model changes to all instances at once, without staged control. Rolling updates instances in batches and can use health probes to pause when a batch becomes unhealthy. Automatic instance repair can replace unhealthy instances when health signals are configured.
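Both settings map to a small set of CLI operations. A hedged sketch against an existing scale set (`vmss-web-prod` is illustrative; automatic repairs require a load balancer health probe or the Application Health extension to already be configured):

```shell
# Switch the scale set to rolling upgrades with batch and health limits
az vmss update \
  --resource-group rg-prod-web \
  --name vmss-web-prod \
  --set upgradePolicy.mode=Rolling \
        upgradePolicy.rollingUpgradePolicy.maxBatchInstancePercent=20 \
        upgradePolicy.rollingUpgradePolicy.maxUnhealthyInstancePercent=20

# Enable automatic instance repair with a grace period after provisioning
az vmss update \
  --resource-group rg-prod-web \
  --name vmss-web-prod \
  --enable-automatic-repairs true \
  --automatic-repairs-grace-period PT30M
```

The grace period keeps repairs from cycling instances that are still booting or warming up.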
| Requirement | Recommended feature |
|---|---|
| Add instances during high CPU | Autoscale rule |
| Replace unhealthy scale set instances | Automatic repairs with health signal |
| Safely roll image updates | Rolling upgrade policy |
| Spread instances across zones | Zone-aware scale set design |
| Put instances behind one frontend IP | Azure Load Balancer backend pool |
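The zone-aware design in the table above usually reduces to passing zones at creation, assuming the region supports them:

```shell
# Spread instances across three zones; Azure balances instance counts per zone
az vmss create \
  --resource-group rg-prod-web \
  --name vmss-web-zonal \
  --image Ubuntu2204 \
  --instance-count 3 \
  --vm-sku Standard_D2s_v5 \
  --zones 1 2 3
```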
Troubleshooting Scenarios
Scenario 1: Autoscale did not add instances. Check the autoscale setting target resource, metric namespace, time aggregation, operator, duration, cooldown, minimum and maximum limits, and whether the scale set already reached max count. Also check quota. Autoscale cannot create instances if the subscription lacks regional family quota.
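The checks in Scenario 1 can be driven from the CLI. A sketch, assuming the resource names used earlier in this section:

```shell
# Inspect the autoscale setting: profiles, rules, min/max/default counts
az monitor autoscale show \
  --resource-group rg-prod-web \
  --name autoscale-web

# Review recent scale actions and failures in the activity log
az monitor activity-log list \
  --resource-group rg-prod-web \
  --offset 1h \
  --output table
```

Comparing the rule's metric, operator, and duration against what the activity log shows actually fired usually isolates the misconfiguration.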
Scenario 2: Instances are created but do not receive traffic. Check backend pool membership, load balancer rule, health probe path and port, NSG effective rules, guest firewall, and whether the app is listening. A failed health probe keeps the instance out of rotation even if the VM is running.
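For Scenario 2, backend membership and probe configuration can be inspected directly. A sketch, assuming the load balancer names from the earlier examples:

```shell
# Confirm the instance NICs are actually in the backend pool
az network lb address-pool show \
  --resource-group rg-prod-web \
  --lb-name lb-web-prod \
  --name bepool-web \
  --query "backendIPConfigurations[].id"

# Check the probe protocol, port, and path the load-balancing rule depends on
az network lb probe list \
  --resource-group rg-prod-web \
  --lb-name lb-web-prod \
  --output table
```

If membership and probe settings look correct, the remaining suspects are the NSG, the guest firewall, and whether the app is listening on the probed port.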
Scenario 3: Rolling upgrade stops. Check instance health, extension status, application health extension, boot diagnostics, and model differences. A bad custom script can make every new instance unhealthy. Fix the model, then update or reimage affected instances.
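The recovery path in Scenario 3 can be sketched as: inspect instance health, fix the model, then push the corrected model out:

```shell
# Per-instance provisioning state and extension status
az vmss get-instance-view \
  --resource-group rg-prod-web \
  --name vmss-web-prod \
  --instance-id 0

# After fixing the model, bring specific instances up to the latest model
az vmss update-instances \
  --resource-group rg-prod-web \
  --name vmss-web-prod \
  --instance-ids 0 1

# Or reimage instances that are beyond repair
az vmss reimage \
  --resource-group rg-prod-web \
  --name vmss-web-prod \
  --instance-ids 0
```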
Exam Design Logic
Choose availability zones when the question requires protection from datacenter failure in the same region. Choose availability sets when the question describes two or more traditional VMs that must be isolated across fault and update domains and zones are not the focus. Choose VM Scale Sets when the requirement is many similar instances, autoscaling, automatic repair, or consistent rolling deployment. Choose Azure Site Recovery when the requirement is regional disaster recovery for VMs.
- You need a group of stateless web VMs to add instances automatically when CPU remains high. Which Azure feature should you use?
- Which statement about availability sets is correct?
- A scale set instance is running but receives no load-balanced traffic. What should you check first?