5.4 Availability Zones, Sets, and VM Scale Sets

Key Takeaways

  • Availability zones protect against datacenter-level failure within a region, while availability sets spread VMs across fault domains and update domains within a single datacenter.
  • VM Scale Sets provide a managed group of VM instances with uniform or flexible orchestration, autoscale, load balancing, and rolling upgrade patterns.
  • High availability design must include compute placement, load balancing, health probes, application state, data tier resilience, and operational update strategy.
  • Troubleshooting scale sets often involves instance health, autoscale rules, image or extension failures, quota, load balancer backend membership, and upgrade policy.

Last updated: May 2026

Availability Building Blocks

A single VM is exposed to both planned and unplanned downtime. Azure provides several compute placement options to reduce that risk, but each solves a different problem. Availability zones are physically separate datacenter locations inside a supported region. Zone-redundant designs can survive a zone failure only if the application, network path, and data tier are also resilient. Availability sets spread VMs across fault domains and update domains to reduce correlated host and maintenance impact. VM Scale Sets manage multiple VM instances as a group and are commonly used for stateless application tiers.

| Option | Protects against | Best fit | Key limitation |
| --- | --- | --- | --- |
| Availability zone | Datacenter or zone failure | Regional HA for supported services | Not every region or SKU supports every zone |
| Availability set | Host or rack failure and planned maintenance reboots | Legacy or non-zonal multi-VM apps | Must be chosen at VM creation |
| VM Scale Set | Instance failure and elastic demand | Horizontal app tiers | App should tolerate instance replacement |
| Azure Site Recovery | Regional disaster | DR to another region | Failover design and testing required |

An existing VM cannot be added to an availability set after creation. To use one, deploy the VMs into the set when you create them, or delete and recreate the VMs inside the set. Zones are also selected at creation for zonal VMs. Moving a VM into a different zone is not a simple property edit.

Availability Zones

Zones are the right answer when the question mentions datacenter-level isolation, zone-redundant architecture, or high availability within a region. A zonal VM is pinned to one zone. A resilient application normally deploys multiple VMs across zones and places a zone-redundant load balancer, application gateway, or other traffic-distribution service in front where appropriate. The storage design must also match the availability goal. Managed disks can use locally redundant storage (LRS) or zone-redundant storage (ZRS), depending on disk type and regional support.

Example CLI pattern:

az vm create \
  --resource-group rg-prod-web \
  --name vm-web-z1 \
  --image Ubuntu2204 \
  --zone 1 \
  --size Standard_D2s_v5 \
  --vnet-name vnet-prod \
  --subnet snet-web \
  --admin-username azureadmin \
  --ssh-key-values ~/.ssh/id_rsa.pub

If a deployment fails in a zone, check SKU availability in that zone, zonal quota, unsupported disk settings, and capacity constraints. A VM size might be available in the region but not in the requested zone.
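A zonal design is only as strong as its spread. The sketch below is a simplified sizing model and not an Azure API: it assumes each VM is described only by its zone number and reports how many instances remain if the most crowded zone fails. The function name and the sample layouts are hypothetical.

```python
from collections import Counter

def worst_case_remaining(instance_zones):
    """Return how many instances survive the loss of the most crowded zone."""
    if not instance_zones:
        return 0
    counts = Counter(instance_zones)
    return len(instance_zones) - max(counts.values())

# Hypothetical layouts: each entry is the zone number of one VM.
even_spread = [1, 2, 3, 1, 2, 3]
skewed_spread = [1, 1, 1, 1, 2, 3]

print(worst_case_remaining(even_spread))    # 4
print(worst_case_remaining(skewed_spread))  # 2
```

Both layouts run six VMs, but the skewed one keeps only two after a zone 1 outage, so the remaining capacity should be checked against the load the application must still carry.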

Availability Sets

Availability sets use fault domains and update domains. A fault domain is a group of hardware that shares power and network infrastructure. An update domain is a group of VMs and underlying hardware that can be rebooted together during planned maintenance, and Azure reboots only one update domain at a time. Azure distributes the VMs in an availability set across these domains. The design reduces the chance that one hardware failure or maintenance wave affects all instances.

Availability sets are not a scaling feature. They do not create new VMs, balance traffic, or repair unhealthy apps. You still need a load balancer or application-level traffic distribution, and you need at least two VMs. Managed disks are recommended because Azure aligns disk placement with VM fault domains for managed availability sets.

Bicep example:

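// Parameters assumed to be supplied by the caller.
// The three profile objects are placeholders for a full VM definition.
param location string = resourceGroup().location
param vmName string
param storageProfile object
param osProfile object
param networkProfile object

resource avset 'Microsoft.Compute/availabilitySets@2024-07-01' = {
  name: 'avset-web-prod'
  location: location
  sku: {
    // 'Aligned' is the SKU for availability sets whose VMs use managed disks.
    name: 'Aligned'
  }
  properties: {
    platformFaultDomainCount: 2
    platformUpdateDomainCount: 5
  }
}

resource vm 'Microsoft.Compute/virtualMachines@2024-07-01' = {
  name: vmName
  location: location
  properties: {
    // The availability set must be referenced at creation time.
    availabilitySet: {
      id: avset.id
    }
    hardwareProfile: {
      vmSize: 'Standard_D2s_v5'
    }
    storageProfile: storageProfile
    osProfile: osProfile
    networkProfile: networkProfile
  }
}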
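To see what the 2 fault domains and 5 update domains in the Bicep example mean for a given VM count, the following sketch may help. It assumes a simple round-robin assignment, which is an illustration only: Azure does not promise this exact placement order. The function names are hypothetical.

```python
from collections import defaultdict

def place_vms(vm_count, fault_domains=2, update_domains=5):
    """Map each VM index to a (fault_domain, update_domain) pair.

    Assumption: round-robin placement, used here only as a model.
    """
    return {i: (i % fault_domains, i % update_domains) for i in range(vm_count)}

def max_impact(placement):
    """Return (most VMs in one fault domain, most VMs in one update domain)."""
    per_fd = defaultdict(int)
    per_ud = defaultdict(int)
    for fd, ud in placement.values():
        per_fd[fd] += 1
        per_ud[ud] += 1
    return max(per_fd.values()), max(per_ud.values())

print(max_impact(place_vms(6)))  # (3, 2)
print(max_impact(place_vms(1)))  # (1, 1)
```

Under this model, six VMs lose at most three to a single fault domain failure and two to a single update domain reboot. With one VM, any domain event affects the whole workload, which is why a set needs at least two VMs to be useful.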

VM Scale Sets

A VM Scale Set is a group of VM instances managed together. Scale sets can use uniform orchestration, where instances are more identical and managed as a set, or flexible orchestration, which supports a more VM-like model and broader high availability patterns. Scale sets can integrate with Azure Load Balancer or Application Gateway. They can use autoscale rules based on metrics such as CPU, queue length, or custom signals.

Scale sets work best for stateless workloads or workloads that externalize state to databases, storage, queues, or caches. Instance replacement is normal. If the app stores unique state on the local OS disk, scale-in or repair can cause data loss. Use custom images, cloud-init, extensions, or configuration management to make instances reproducible.

CLI example:

az vmss create \
  --resource-group rg-prod-web \
  --name vmss-web-prod \
  --image Ubuntu2204 \
  --instance-count 3 \
  --vm-sku Standard_D2s_v5 \
  --upgrade-policy-mode automatic \
  --lb lb-web-prod \
  --backend-pool-name bepool-web

az monitor autoscale create \
  --resource-group rg-prod-web \
  --resource vmss-web-prod \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name autoscale-web \
  --min-count 2 --max-count 10 --count 3

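# Scale out by one instance when average CPU stays above 70% for 10 minutes.
az monitor autoscale rule create \
  --resource-group rg-prod-web \
  --autoscale-name autoscale-web \
  --condition "Percentage CPU > 70 avg 10m" \
  --scale out 1

# Matching scale-in rule so the set can shrink again (the 30% threshold is illustrative).
az monitor autoscale rule create \
  --resource-group rg-prod-web \
  --autoscale-name autoscale-web \
  --condition "Percentage CPU < 30 avg 10m" \
  --scale in 1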
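The scale-out rule above averages CPU over a window and then respects the minimum and maximum counts from the autoscale setting. A minimal sketch of that decision follows, assuming one scale-out rule and a list of CPU samples covering the window. It is a simplified model and not the Azure autoscale engine; the function and parameter names are hypothetical.

```python
def desired_count(current, cpu_samples, min_count=2, max_count=10,
                  scale_out_above=70.0, step=1):
    """Return the instance count after evaluating one scale-out rule."""
    if cpu_samples:
        avg = sum(cpu_samples) / len(cpu_samples)
        # The rule fires on the window average, not on a single spike.
        if avg > scale_out_above:
            current += step
    # The result is always clamped to the autoscale limits.
    return max(min_count, min(max_count, current))

print(desired_count(3, [80, 75, 90]))   # 4
print(desired_count(3, [95, 40, 50]))   # 3
print(desired_count(10, [90, 90, 90]))  # 10
```

The second call shows why a brief spike does not add an instance, and the third shows why a set already at its maximum count stays there regardless of load.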

Upgrade and Health Strategy

The scale set upgrade policy controls how changes to the scale set model reach existing instances. With Manual, you update instances yourself. With Automatic, Azure applies changes without guaranteeing any order, so many instances can be taken down at the same time. With Rolling, Azure updates instances in batches and can use health probes or the application health extension to stop the rollout when instances become unhealthy. Automatic instance repair can replace unhealthy instances when health signals are configured.

| Requirement | Recommended feature |
| --- | --- |
| Add instances during high CPU | Autoscale rule |
| Replace unhealthy scale set instances | Automatic repairs with health signal |
| Safely roll image updates | Rolling upgrade policy |
| Spread instances across zones | Zone-aware scale set design |
| Put instances behind one frontend IP | Azure Load Balancer backend pool |
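The batch behavior of a rolling upgrade can be sketched as a small simulation. This is a simplified model under stated assumptions: fixed batch size, a caller-supplied health check, and a halt on the first unhealthy batch. Real rolling upgrades have further settings that are not modeled here, and all names are hypothetical.

```python
def rolling_upgrade(instances, batch_size, is_healthy_after_upgrade):
    """Upgrade instances in batches and halt at the first unhealthy batch.

    Returns (upgraded_instances, halted).
    """
    upgraded = []
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        upgraded.extend(batch)
        # Stop before touching the next batch if any instance is unhealthy.
        if not all(is_healthy_after_upgrade(name) for name in batch):
            return upgraded, True
    return upgraded, False

vms = ["vm0", "vm1", "vm2", "vm3", "vm4", "vm5"]

# Hypothetical failure: vm3 does not come back healthy on the new model.
upgraded, halted = rolling_upgrade(vms, 2, lambda name: name != "vm3")
print(upgraded, halted)  # ['vm0', 'vm1', 'vm2', 'vm3'] True
```

In this run vm4 and vm5 stay on the old model and keep serving traffic, which is the protection a rolling policy offers compared with updating every instance at once.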

Troubleshooting Scenarios

Scenario 1: Autoscale did not add instances. Check the autoscale setting's target resource, metric namespace, time aggregation, operator, duration, cooldown, and minimum and maximum limits, and confirm whether the scale set has already reached its maximum count. Also check quota. Autoscale cannot create instances if the subscription lacks regional vCPU quota for the VM size family.
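Several of the Scenario 1 checks can be expressed as a small checklist function. The sketch below is illustrative only and assumes the caller has already gathered the values from the autoscale setting and the subscription's usage data. The function and parameter names are hypothetical.

```python
def scale_out_blockers(current, max_count, seconds_since_last_scale,
                       cooldown_seconds, vcpus_per_instance,
                       quota_limit, quota_used):
    """Return the reasons a scale-out of one instance would not happen."""
    blockers = []
    if current >= max_count:
        blockers.append("at max count")
    if seconds_since_last_scale < cooldown_seconds:
        blockers.append("in cooldown")
    if quota_used + vcpus_per_instance > quota_limit:
        blockers.append("insufficient vCPU quota")
    return blockers

# Hypothetical case: 10 instances of a 2-vCPU size with a 20 vCPU family quota.
print(scale_out_blockers(10, 10, 600, 300, 2, 20, 20))
# ['at max count', 'insufficient vCPU quota']
```

An empty list means none of these three conditions explains the problem, so the next place to look is the rule itself: target resource, metric, aggregation, operator, and duration.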

Scenario 2: Instances are created but do not receive traffic. Check backend pool membership, load balancer rule, health probe path and port, NSG effective rules, guest firewall, and whether the app is listening. A failed health probe keeps the instance out of rotation even if the VM is running.
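The point that a running VM can still be out of rotation is easy to model. The sketch below assumes each instance is described by three flags and walks them in a fixed order. The `Instance` fields and the order of checks are assumptions made for illustration, not a definition of how the load balancer works.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    name: str
    running: bool
    in_backend_pool: bool
    probe_ok: bool

def first_blocker(inst):
    """Return the first reason the instance gets no traffic, or None."""
    if not inst.running:
        return "VM is not running"
    if not inst.in_backend_pool:
        return "not in backend pool"
    if not inst.probe_ok:
        return "health probe failing"
    return None

# Hypothetical instance: running and in the pool, but the probe fails.
vm = Instance("vmss-web-prod_2", running=True, in_backend_pool=True, probe_ok=False)
print(first_blocker(vm))  # health probe failing
```

When the result is a failing probe, the remaining checks from the scenario apply: probe path and port, NSG effective rules, guest firewall, and whether the app is listening.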

Scenario 3: Rolling upgrade stops. Check instance health, extension status, application health extension, boot diagnostics, and model differences. A bad custom script can make every new instance unhealthy. Fix the model, then update or reimage affected instances.

Exam Design Logic

Choose availability zones when the question requires protection from datacenter failure in the same region. Choose availability sets when the question describes two or more traditional VMs that must be isolated across fault and update domains and zones are not the focus. Choose VM Scale Sets when the requirement is many similar instances, autoscaling, automatic repair, or consistent rolling deployment. Choose Azure Site Recovery when the requirement is regional disaster recovery for VMs.
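As a study aid, the selection logic above can be written as a lookup. The requirement strings and the order of checks are assumptions chosen for this sketch. Real questions often combine requirements, for example a scale set spread across zones, so treat the result as a starting point.

```python
def choose_feature(requirements):
    """Map a set of requirement phrases to the feature suggested by the text."""
    reqs = set(requirements)
    if "regional disaster recovery" in reqs:
        return "Azure Site Recovery"
    if reqs & {"autoscaling", "automatic repair", "rolling deployment"}:
        return "VM Scale Sets"
    if "datacenter failure" in reqs:
        return "Availability zones"
    if "fault and update domains" in reqs:
        return "Availability sets"
    return None

print(choose_feature({"autoscaling"}))                 # VM Scale Sets
print(choose_feature({"datacenter failure"}))          # Availability zones
print(choose_feature({"regional disaster recovery"}))  # Azure Site Recovery
```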

Test Your Knowledge

You need a group of stateless web VMs to add instances automatically when CPU remains high. Which Azure feature should you use?

Which statement about availability sets is correct?

A scale set instance is running but receives no load-balanced traffic. What should you check first?
