2.3 Multi-AZ and Multi-Region Architectures
Key Takeaways
- Multi-AZ deploys resources across Availability Zones within a single Region for high availability against AZ-level failures.
- Multi-Region deploys resources across AWS Regions for disaster recovery, lower latency for global users, and compliance with data sovereignty requirements.
- RDS Multi-AZ provides automatic failover to a synchronous standby replica; Aurora supports up to 15 read replicas within a Region, any of which can be promoted during automatic failover.
- S3 replicates data across at least 3 AZs automatically (One Zone storage classes excepted); Cross-Region Replication requires explicit configuration.
- Route 53 health checks and failover routing enable automatic traffic redirection during Regional outages.
Quick Answer: Multi-AZ = high availability within a Region (protects against AZ failure). Multi-Region = disaster recovery + global performance (protects against Region failure). Most production workloads need Multi-AZ at minimum. Use Multi-Region for mission-critical applications whose RPO/RTO targets must still be met during a full Region outage.
Multi-AZ Architecture
What Multi-AZ Means
Deploying resources across multiple Availability Zones within a single Region ensures that if one AZ experiences an outage, your application continues running in other AZs.
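The benefit of spreading across AZs can be illustrated with simple probability. The sketch below assumes that AZ failures are independent and that each AZ has the same availability; both are idealizations, and the 99.9% figure is a hypothetical input rather than a published AWS number.

```python
def combined_availability(per_az_availability: float, az_count: int) -> float:
    """Probability that at least one AZ is up, assuming independent AZ failures."""
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("need at least one AZ")
    # The workload is down only if every AZ is down at the same time.
    return 1.0 - (1.0 - per_az_availability) ** az_count


if __name__ == "__main__":
    # 0.999 is a hypothetical per-AZ availability used only for illustration.
    for n in (1, 2, 3):
        print(n, f"{combined_availability(0.999, n):.9f}")
```

Under these assumptions, each additional AZ multiplies the chance of a total outage by the per-AZ failure probability, which is why going from one AZ to two gives the largest improvement.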
Multi-AZ by Service
| Service | Multi-AZ Behavior |
|---|---|
| EC2 + ALB | Deploy instances across 2+ AZs; ALB routes to healthy instances |
| RDS Multi-AZ | Synchronous standby replica in another AZ; automatic failover (60-120 seconds) |
| Aurora | Data replicated 6 ways across 3 AZs; up to 15 read replicas with auto-failover |
| ElastiCache | Multi-AZ with automatic failover for Redis (not Memcached clusters) |
| EFS | Regional file systems automatically store data across multiple AZs (One Zone file systems use a single AZ) |
| S3 | Automatically replicates across 3+ AZs within a Region (One Zone storage classes excepted) |
| DynamoDB | Automatically replicates across 3 AZs within a Region |
| NAT Gateway | Deploy one per AZ for true Multi-AZ resilience |
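For the EC2 + ALB row, spreading instances across AZs only helps if the surviving AZs can carry the full load. The following is a rough capacity sketch, assuming an even spread and the loss of exactly one AZ; the function name and the instance counts are hypothetical.

```python
import math


def instances_per_az(required: int, az_count: int) -> int:
    """Instances to run in each AZ so `required` remain after losing one AZ."""
    if required < 1:
        raise ValueError("required must be at least 1")
    if az_count < 2:
        raise ValueError("surviving an AZ loss needs at least 2 AZs")
    # After one AZ fails, (az_count - 1) AZs must hold all required instances.
    return math.ceil(required / (az_count - 1))


if __name__ == "__main__":
    # Hypothetical workload that needs 6 instances at peak.
    for azs in (2, 3):
        per_az = instances_per_az(6, azs)
        print(f"{azs} AZs: {per_az} per AZ, {per_az * azs} total")
```

With these numbers, two AZs need 12 instances in total while three AZs need only 9, which shows one reason a third AZ can lower the cost of the same resilience target.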
RDS Multi-AZ Deep Dive
| Feature | RDS Multi-AZ Instance | RDS Multi-AZ Cluster |
|---|---|---|
| Replicas | 1 standby (not readable) | 2 readable standbys |
| Replication | Synchronous | Semi-synchronous |
| Failover time | 60-120 seconds | ~35 seconds |
| Read scaling | No (standby not readable) | Yes (standbys handle reads) |
| Engines | All RDS engines | MySQL, PostgreSQL |
On the Exam: "The database must have automatic failover with minimal downtime" → RDS Multi-AZ. "The database must support read scaling AND automatic failover" → Aurora or RDS Multi-AZ Cluster.
Multi-Region Architecture
When to Use Multi-Region
| Requirement | Multi-Region Needed? |
|---|---|
| Survive a full AWS Region outage | Yes |
| Sub-100ms latency for global users | Yes |
| Data sovereignty across several countries (each country's data must stay in-country) | Yes (one Region per jurisdiction) |
| RPO < 1 hour / RTO < 1 hour during a Regional disaster | Usually yes |
| Cost-sensitive application | Usually no (significant cost increase) |
Multi-Region by Service
| Service | Multi-Region Capability |
|---|---|
| S3 | Cross-Region Replication (CRR), asynchronous |
| RDS | Cross-Region read replicas, asynchronous |
| Aurora | Aurora Global Database, typically <1 second replication lag |
| DynamoDB | Global Tables, multi-active and multi-Region |
| CloudFront | Global by design (content is served from edge locations worldwide) |
| Route 53 | Global DNS with health checks and failover routing |
| API Gateway | Regional or edge-optimized endpoints |
Aurora Global Database
| Feature | Detail |
|---|---|
| Replication | Typically <1 second lag across Regions (physical-level replication) |
| Failover | Promote a secondary Region, typically in <1 minute |
| Read scaling | Up to 16 read replicas per secondary Region |
| Write | Single primary Region handles all writes |
| Managed failover | Switchover for planned events; failover for unplanned outages |
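To connect these figures to RPO and RTO, the sketch below treats replication lag as the data-loss window and promotion time as the recovery time, then compares them with a target. This is a simplification: real RTO also includes failure detection and application reconnection, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryProfile:
    replication_lag_s: float  # data not yet replicated at failure time (bounds RPO)
    failover_time_s: float    # time to promote the secondary Region (bounds RTO)


def meets_targets(profile: RecoveryProfile, rpo_target_s: float, rto_target_s: float) -> bool:
    """True if both the data-loss window and the recovery time fit the targets."""
    return (profile.replication_lag_s <= rpo_target_s
            and profile.failover_time_s <= rto_target_s)


if __name__ == "__main__":
    # Upper bounds taken from the table above: 1 s of lag, 60 s to promote.
    aurora_global = RecoveryProfile(replication_lag_s=1.0, failover_time_s=60.0)
    print(meets_targets(aurora_global, rpo_target_s=5.0, rto_target_s=300.0))  # True
    print(meets_targets(aurora_global, rpo_target_s=5.0, rto_target_s=30.0))   # False
```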
DynamoDB Global Tables
| Feature | Detail |
|---|---|
| Replication | Active-active across multiple Regions |
| Writes | Can write in ANY Region (multi-active) |
| Conflict resolution | Last writer wins |
| Latency | Replication typically <1 second |
| Use case | Global applications needing local read/write in each Region |
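The "last writer wins" rule can be pictured with a small simulation: when two Regions update the same item, the write with the later timestamp is kept. This is an illustrative model only, not DynamoDB's implementation; the `Write` type is hypothetical, and breaking timestamp ties by Region name is an assumption made so the sketch is deterministic.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Write:
    region: str
    timestamp: float  # hypothetical write time in seconds
    value: str


def resolve(writes: Iterable[Write]) -> Write:
    """Return the winning write: the latest timestamp wins."""
    candidates = list(writes)
    if not candidates:
        raise ValueError("no writes to resolve")
    # Tie-break on region name is an assumption of this sketch only.
    return max(candidates, key=lambda w: (w.timestamp, w.region))


if __name__ == "__main__":
    concurrent = [
        Write("us-east-1", 10.0, "cart=A"),
        Write("eu-west-1", 10.4, "cart=B"),
    ]
    print(resolve(concurrent).value)  # cart=B
```

The earlier write is silently discarded, so applications that cannot tolerate lost updates usually avoid writing the same item from several Regions at once.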
Route 53 Routing for Multi-Region
| Routing Policy | Use Case |
|---|---|
| Failover | Active-passive: primary Region fails → Route 53 sends traffic to secondary |
| Latency-based | Routes to the Region with lowest latency for the user |
| Geolocation | Routes based on user's geographic location |
| Geoproximity | Routes based on geographic distance with traffic biasing |
| Weighted | Split traffic by percentage (e.g., 90/10 for canary deployments) |
On the Exam: "Global users experience high latency" → Multi-Region with Route 53 latency-based routing + CloudFront. "The application must survive a Regional outage" → Multi-Region with Route 53 failover routing.
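The routing policies above can be modeled as small selection functions. The sketch below simulates failover, latency-based, and weighted routing in plain Python; it makes no AWS calls, and the function names, Region choices, and latency values are hypothetical. Returning `None` when no Region is healthy is a simplification of this sketch, not a description of Route 53's behavior.

```python
import random
from typing import Dict, Optional


def failover_route(primary: str, secondary: str, healthy: Dict[str, bool]) -> Optional[str]:
    """Active-passive: prefer the primary, fall back to the secondary."""
    if healthy.get(primary, False):
        return primary
    if healthy.get(secondary, False):
        return secondary
    return None  # sketch-only simplification


def latency_route(latency_ms: Dict[str, float]) -> str:
    """Latency-based: pick the Region with the lowest measured latency."""
    return min(latency_ms, key=latency_ms.get)


def weighted_route(weights: Dict[str, int], rng: random.Random) -> str:
    """Weighted: pick a target with probability proportional to its weight."""
    targets = list(weights)
    return rng.choices(targets, weights=[weights[t] for t in targets], k=1)[0]


if __name__ == "__main__":
    health = {"us-east-1": False, "eu-west-1": True}
    print(failover_route("us-east-1", "eu-west-1", health))        # eu-west-1
    print(latency_route({"us-east-1": 180.0, "eu-west-1": 35.0}))  # eu-west-1
    rng = random.Random(0)
    picks = [weighted_route({"stable": 90, "canary": 10}, rng) for _ in range(10_000)]
    print(picks.count("canary") / len(picks))  # roughly 0.10
```

Failover routing depends on health status, while latency-based and weighted routing choose among targets that are all assumed to be serving traffic.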
An application requires a database that can survive a full AWS Region outage with less than 1 second of data loss. Which solution meets this requirement?
A global e-commerce company needs a database that supports writes from multiple Regions simultaneously. Which service should they use?
Which Route 53 routing policy should you use to automatically redirect traffic to a healthy secondary Region when the primary Region fails?