High Availability and Disaster Recovery in Azure

Key Takeaways

  • High availability keeps an application running through localized failures using in-region redundancy such as Availability Sets and Availability Zones.
  • Disaster recovery restores service after a region-wide outage, typically with Azure Site Recovery and geo-redundant storage.
  • Two VMs spread across Availability Zones carry a 99.99% VM SLA; an Availability Set carries 99.95%; a single VM with Premium SSD carries 99.9%.
  • RPO measures acceptable data loss in time; RTO measures acceptable downtime in time.
  • Azure Backup uses a Recovery Services vault with soft delete on by default, retaining deleted backups 14 extra days against accidental or malicious loss.
Last updated: June 2026

Quick Answer: High availability (HA) survives localized failures inside a region (Availability Sets, Availability Zones, load balancers). Disaster recovery (DR) survives an entire region failure (Azure Site Recovery, geo-redundant storage). RPO is how much data you can lose; RTO is how much time you can be down. Your SLA target dictates which pattern you must use.

High availability inside a region

HA is about removing single points of failure so that an application keeps serving users when a disk, rack, or data center hiccups. Azure gives you increasingly strong (and increasingly expensive) options.

Pattern | Protects against | VM SLA
Single VM, Premium/Ultra SSD | Disk failure | 99.9%
Availability Set (2+ VMs) | Rack and host-update failures | 99.95%
Availability Zones (VMs in 2+ zones) | Whole data center failure | 99.99%
Multi-region active/active | Whole region failure | 99.99%+

Availability Set mechanics matter for the exam. An Availability Set spreads VMs across fault domains (separate racks with separate power and network) and update domains (groups patched at different times). As a result, planned host maintenance never reboots every replica at once, and a single rack power loss never takes the whole set down.
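The effect of fault and update domains can be illustrated with a small simulation. This is only a sketch: the round-robin assignment, the domain counts, and the VM names are illustrative assumptions, not a description of Azure's actual placement logic.

```python
def place_vms(vm_names, fault_domains=2, update_domains=5):
    """Assign each VM a (fault domain, update domain) pair round-robin.

    The domain counts are illustrative parameters, not Azure guarantees.
    """
    return {
        name: (i % fault_domains, i % update_domains)
        for i, name in enumerate(vm_names)
    }


def survivors(placement, failed_fault_domain=None, patching_update_domain=None):
    """Return the VMs still running after a rack failure or during a patch wave."""
    return sorted(
        name
        for name, (fd, ud) in placement.items()
        if fd != failed_fault_domain and ud != patching_update_domain
    )


# Hypothetical four-VM web tier in one Availability Set.
placement = place_vms(["web-0", "web-1", "web-2", "web-3"])

after_rack_loss = survivors(placement, failed_fault_domain=0)
during_patching = survivors(placement, patching_update_domain=1)
print(after_rack_loss)   # VMs outside fault domain 0
print(during_patching)   # VMs outside update domain 1
```

In both cases at least one replica keeps running, which is the property the Availability Set SLA relies on.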

Availability Zones are physically separate locations within a region (zone-enabled regions have at least three), each made up of one or more data centers with independent power, cooling, and networking. Placing two or more VMs across two or more zones reaches the 99.99% SLA because the failure of a single zone cannot take them all down.
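To make the SLA percentages concrete, it can help to convert them into allowed downtime. The sketch below assumes a 30-day month for simplicity; actual SLA calculations follow the billing month and the terms of the published SLA.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month


def allowed_downtime_minutes(sla_percent):
    """Maximum downtime per month that still satisfies the given SLA."""
    return MINUTES_PER_MONTH * (1 - sla_percent / 100)


for label, sla in [
    ("Single VM, Premium SSD", 99.9),
    ("Availability Set", 99.95),
    ("Availability Zones", 99.99),
]:
    print(f"{label}: about {allowed_downtime_minutes(sla):.1f} minutes per month")
```

Each extra "nine" cuts the downtime budget sharply, which is why the stronger patterns cost more.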

Load balancing options

Service | Layer / scope | Use
Azure Load Balancer | Layer 4, regional | Spread TCP/UDP traffic across VMs in a region
Application Gateway | Layer 7, regional | HTTP routing, WAF, health probes
Traffic Manager | DNS, global | Route users to the closest healthy region
Front Door | Layer 7, global | Global HTTP load balancing and failover
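The table can be read as a two-question decision: is the traffic regional or global, and does it need HTTP-aware (Layer 7) handling? The helper below is a hypothetical study aid that encodes that reading. Mapping Traffic Manager to the "global, non-HTTP" case is a simplification, since DNS-based routing also works in front of HTTP applications.

```python
def pick_load_balancer(scope, http_aware):
    """Return the service from the table for a scope and traffic type.

    scope: "regional" or "global"
    http_aware: True when Layer 7 features (routing, WAF) are needed
    """
    choices = {
        ("regional", False): "Azure Load Balancer",
        ("regional", True): "Application Gateway",
        ("global", False): "Traffic Manager",
        ("global", True): "Front Door",
    }
    return choices[(scope, http_aware)]


print(pick_load_balancer("regional", http_aware=True))
print(pick_load_balancer("global", http_aware=True))
```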

Disaster recovery across regions

Azure Site Recovery (ASR)

Azure Site Recovery is disaster recovery as a service. It continuously replicates VMs to a secondary region, supports one-click failover and failback, and lets you run a non-disruptive DR drill against an isolated network so you can prove your plan works without touching production. Recovery plans order the failover (database tier before web tier, for example) and can run automation scripts. Typical RTO is minutes and RPO is seconds to a few minutes.
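The ordering idea behind a recovery plan can be sketched in a few lines. This is a conceptual model only, not the Azure Site Recovery API: the group structure, the `fail_over` callback, and the VM and script names are all hypothetical.

```python
def run_recovery_plan(groups, fail_over):
    """Fail over groups strictly in order, then run each group's scripts.

    groups: list of dicts with "vms" and optional "post_actions" (callables)
    fail_over: callable invoked once per VM name (stands in for real failover)
    Returns the VM names in the order they were failed over.
    """
    order = []
    for group in groups:
        for vm in group["vms"]:
            fail_over(vm)
            order.append(vm)
        # Automation scripts run only after every VM in the group is up.
        for action in group.get("post_actions", []):
            action()
    return order


events = []
plan = [
    # Database tier first, followed by a hypothetical DNS update script.
    {"vms": ["sql-0"], "post_actions": [lambda: events.append("update-dns")]},
    # Web tier second, once its database dependency is available.
    {"vms": ["web-0", "web-1"]},
]
order = run_recovery_plan(plan, fail_over=events.append)
print(order)
print(events)
```

Running the same plan against an isolated test network is, in spirit, what a DR drill does: the order is exercised without touching production.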

Azure Backup

Azure Backup stores recovery points in a Recovery Services vault with no backup infrastructure to manage.

What you can back up | How
Azure VMs | Application-consistent snapshots
Azure Files shares | Share snapshots
SQL Server / SAP HANA in Azure VMs | Database backup, down to 15-minute RPO for SQL
On-premises files and folders | MARS agent or Azure Backup Server

Soft delete is enabled by default and keeps deleted backups for 14 additional days at no extra cost — a critical safety net against accidental deletion and ransomware. Backups can use geo-redundant storage so they survive a region loss.
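The soft-delete window is simple date arithmetic. The sketch below assumes the 14-day default described above and treats the last day as inclusive, which is an assumption made for illustration; the function names and dates are hypothetical.

```python
from datetime import date, timedelta

SOFT_DELETE_DAYS = 14  # default retention stated above


def recoverable_until(deleted_on):
    """Last day a soft-deleted backup can still be recovered."""
    return deleted_on + timedelta(days=SOFT_DELETE_DAYS)


def can_undelete(deleted_on, today):
    """True while the deleted backup is still inside the soft-delete window."""
    return deleted_on <= today <= recoverable_until(deleted_on)


deleted = date(2026, 6, 1)
print(recoverable_until(deleted))
print(can_undelete(deleted, date(2026, 6, 10)))
print(can_undelete(deleted, date(2026, 6, 20)))
```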

RPO versus RTO

Term | Means | Example
RPO (Recovery Point Objective) | Maximum acceptable data loss, in time | RPO of 1 hour = lose at most 1 hour of data
RTO (Recovery Time Objective) | Maximum acceptable downtime, in time | RTO of 4 hours = back online within 4 hours

Worked example: A bank requires an RPO of 5 minutes and an RTO of 30 minutes for its core ledger. Daily Azure Backup alone gives an RPO of up to 24 hours, which is far too lossy. ASR with continuous replication (RPO in seconds) plus a pre-tested recovery plan meets both targets. Tighter RPO/RTO generally costs more, so do not over-engineer a workload whose business tolerates hours of loss.
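The bank comparison can be written as a simple check of worst-case figures against targets. The targets come from the worked example; the worst-case RPO and RTO values assigned to each strategy are illustrative assumptions, not measured or guaranteed numbers.

```python
from datetime import timedelta


def meets_targets(worst_rpo, worst_rto, target_rpo, target_rto):
    """A strategy qualifies only if both worst-case figures fit the targets."""
    return worst_rpo <= target_rpo and worst_rto <= target_rto


# Targets from the bank example.
target_rpo = timedelta(minutes=5)
target_rto = timedelta(minutes=30)

# Assumed worst-case figures for each strategy (illustrative only).
daily_backup_ok = meets_targets(
    worst_rpo=timedelta(hours=24), worst_rto=timedelta(hours=4),
    target_rpo=target_rpo, target_rto=target_rto,
)
asr_ok = meets_targets(
    worst_rpo=timedelta(minutes=2), worst_rto=timedelta(minutes=15),
    target_rpo=target_rpo, target_rto=target_rto,
)
print("Daily backup meets targets:", daily_backup_ok)
print("ASR with tested plan meets targets:", asr_ok)
```

The same check applied to a workload that tolerates hours of loss would pass with daily backup alone, which is the argument against over-engineering.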

HA versus DR: do not confuse them

The exam frequently offers a backup-or-replication scenario where one answer is HA and one is DR. Use this rule of thumb:

Question | Reach for
Keep an app up when one data center fails | Availability Zones (HA)
Survive planned host maintenance and rack loss | Availability Set (HA)
Recover the whole app after a region disaster | Azure Site Recovery (DR)
Restore individual files, VMs, or databases from a point in time | Azure Backup (DR)
Keep storage available if its primary region fails | Geo-redundant storage (GRS)

Backup is not the same as Site Recovery. Azure Backup creates recovery points you restore from after data loss or corruption; ASR keeps a near-real-time replica you fail over to during an outage. A mature design uses both: ASR for fast regional failover and Backup for long-term, point-in-time, ransomware-resistant recovery. Geo-redundant storage rounds out the picture by copying data to the paired region hundreds of miles away; Backup vaults can use it so that recovery points survive a region loss.

On the Exam: RPO = data, RTO = time. HA solves in-region failures (zones, sets); DR solves region failures. "Replicate VMs to another region for failover" = Azure Site Recovery. "Recover an accidentally deleted backup" = soft delete. "Lowest data loss for a critical database" points to continuous replication, not nightly backup.

Test Your Knowledge

What does Azure Site Recovery primarily provide?

An application can tolerate losing at most 15 minutes of data but must be running again within 1 hour. Which two values describe these requirements?

Which configuration provides the 99.99% SLA for Azure Virtual Machines?
