High Availability and Disaster Recovery in Azure
Key Takeaways
- High availability ensures applications remain accessible during planned and unplanned outages using redundancy within a region.
- Disaster recovery ensures business continuity during major outages by replicating workloads to a secondary region.
- Azure Site Recovery provides disaster recovery as a service by replicating VMs between Azure regions or from on-premises to Azure.
- Azure Backup provides a simple, secure backup solution for protecting data in Azure and on-premises.
- SLA requirements drive architecture decisions: 99.9% = single VM with premium storage; 99.95% = availability set; 99.99% = availability zones; protection against a full regional outage = multi-region deployment.
Quick Answer: High availability = staying up during localized failures (Availability Zones, Scale Sets). Disaster recovery = recovering from major regional outages (Azure Site Recovery, geo-redundant storage). SLA requirements drive your architecture choices.
High Availability (HA) Strategies
| Strategy | Protects Against | SLA Impact |
|---|---|---|
| Single VM with Premium SSD | Disk failures | 99.9% |
| Availability Set | Rack-level failures in a data center | 99.95% |
| Availability Zones | Data center failures within a region | 99.99% |
| Multi-region deployment | Entire region failures | 99.99%+ |
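To make the SLA column concrete, the percentages can be converted into a downtime budget. The snippet below is a minimal arithmetic sketch, not an Azure API: the function name `allowed_downtime_minutes` is made up for illustration, and it assumes a 365-day year, whereas published SLAs are typically evaluated per billing month.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored (assumption)


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, that an SLA percentage allows."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


# Print the downtime budget for the SLA levels listed in the table above.
for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% allows about {allowed_downtime_minutes(sla):.1f} minutes of downtime per year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is why moving from a single VM to availability zones is such a large step in resilience.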
Availability Zones Recap
- Physically separate locations within a region; zone-enabled regions have at least three zones, each made up of one or more data centers
- Each zone has independent power, cooling, and networking
- Zone-redundant services automatically replicate across zones
- Deploying two or more VMs across two or more availability zones provides a 99.99% SLA
Load Balancing for HA
- Azure Load Balancer — Distributes traffic across VMs in a region
- Application Gateway — Layer 7 load balancing with health probes
- Traffic Manager — DNS-based failover between regions
- Front Door — Global HTTP load balancing with automatic failover
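The failover behavior described for Traffic Manager and Front Door can be pictured as "send traffic to the most preferred endpoint that is currently passing its health probe." The following is a conceptual sketch only. It does not call any Azure SDK, and the `Endpoint` class, its fields, and `pick_endpoint` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str
    priority: int  # lower number = more preferred (assumption for this sketch)
    healthy: bool  # outcome of the most recent health probe


def pick_endpoint(endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, or None if all are down."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority)


# The primary region fails its probe, so traffic moves to the secondary region.
regions = [
    Endpoint("primary-region", priority=1, healthy=False),
    Endpoint("secondary-region", priority=2, healthy=True),
]
target = pick_endpoint(regions)
print(target.name if target else "no healthy endpoint")
```

When the primary endpoint recovers and passes its probe again, the same selection rule sends traffic back to it without any change to the configuration.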
Disaster Recovery (DR) Strategies
Azure Site Recovery (ASR)
Azure Site Recovery provides disaster recovery as a service (DRaaS):
| Feature | Description |
|---|---|
| Replication | Continuously replicate VMs between Azure regions or from on-premises to Azure |
| Failover | One-click failover to the secondary region during a disaster |
| Failback | Return to the primary region after the disaster is resolved |
| Recovery plans | Define the order of failover and include custom scripts |
| Testing | Run DR drills without impacting production |
| RTO/RPO | Typical Recovery Time Objective: minutes; typical Recovery Point Objective: seconds to minutes |
ASR supports:
- Azure VM to Azure VM (region to region)
- On-premises VMware/Hyper-V to Azure
- Physical servers to Azure
Azure Backup
Azure Backup provides a simple, cost-effective, and secure backup solution:
| What Can Be Backed Up | How |
|---|---|
| Azure VMs | Full VM backup with application-consistent snapshots |
| Azure Files | File share snapshots |
| SQL Server in Azure VMs | Database backup with 15-minute RPO |
| Azure Blob Storage | Operational backup for blobs |
| On-premises files | MARS agent or Azure Backup Server |
Key features:
- No infrastructure — No backup server to manage (built into Azure)
- Unlimited scaling — No limit on backup data volume
- Geo-redundant storage — Backups replicated to a paired region
- Encryption — Data encrypted at rest and in transit
- Long-term retention — Keep backups for years (compliance)
- Soft delete — Deleted backups retained for 14 additional days to prevent accidental loss
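The soft-delete window is simple date arithmetic. The sketch below assumes the 14-day retention stated above and treats the window as ending exactly 14 days after deletion. The helper names are hypothetical, and the exact boundary behavior in the service may differ.

```python
from datetime import date, timedelta

SOFT_DELETE_DAYS = 14  # retention period stated in the feature list above


def permanently_deleted_on(deleted_on: date, retention_days: int = SOFT_DELETE_DAYS) -> date:
    """Return the date on which a soft-deleted backup stops being recoverable."""
    return deleted_on + timedelta(days=retention_days)


def is_recoverable(deleted_on: date, today: date, retention_days: int = SOFT_DELETE_DAYS) -> bool:
    """True while 'today' falls inside the soft-delete window (end date excluded by assumption)."""
    return deleted_on <= today < permanently_deleted_on(deleted_on, retention_days)


deleted = date(2024, 3, 1)
print(permanently_deleted_on(deleted))             # end of the soft-delete window
print(is_recoverable(deleted, date(2024, 3, 10)))  # still inside the window
```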
RPO and RTO
| Concept | Definition | Example |
|---|---|---|
| RPO (Recovery Point Objective) | Maximum acceptable data loss measured in time | RPO of 1 hour means you can afford to lose up to 1 hour of data |
| RTO (Recovery Time Objective) | Maximum acceptable downtime before recovery | RTO of 4 hours means systems must be restored within 4 hours |
On the Exam: RPO is about DATA loss (how much data, measured in time, you can afford to lose). RTO is about DOWNTIME (how quickly you must recover). Lower (stricter) RPO/RTO targets = higher cost and complexity.
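The difference between the two objectives is easiest to see with timestamps. The sketch below uses made-up incident times and a hypothetical helper, `meets_objectives`. It measures data loss as the gap between the last recovery point and the failure, and downtime as the gap between the failure and the restore.

```python
from datetime import datetime, timedelta


def meets_objectives(last_recovery_point: datetime, failure_time: datetime,
                     restored_time: datetime, rpo: timedelta, rto: timedelta) -> dict:
    """Compare an incident's actual data loss and downtime against RPO and RTO targets."""
    data_loss = failure_time - last_recovery_point  # work done since the last recovery point is lost
    downtime = restored_time - failure_time         # how long the system was unavailable
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Illustrative incident: last recovery point at 09:30, failure at 10:15, service restored at 13:45.
result = meets_objectives(
    last_recovery_point=datetime(2024, 5, 1, 9, 30),
    failure_time=datetime(2024, 5, 1, 10, 15),
    restored_time=datetime(2024, 5, 1, 13, 45),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
)
print(result)
```

In this example 45 minutes of data is lost and the outage lasts 3.5 hours, so both the 1-hour RPO and the 4-hour RTO from the table are met.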
What does Azure Site Recovery provide?
What is RPO (Recovery Point Objective)?
Which Azure Backup feature helps prevent accidental deletion of backup data?