2.3 Multi-AZ and Multi-Region Architectures
Key Takeaways
- Multi-AZ protects against the failure of a single Availability Zone within one Region; Multi-Region protects against an entire Region outage and serves global users with low latency.
- RDS Multi-AZ (single-standby) uses synchronous replication with 60–120s failover; RDS Multi-AZ DB cluster (two readable standbys) failover is ~35s; Aurora stores data 6 ways across 3 AZs.
- Aurora Global Database replicates cross-Region with sub-second lag and promotes a secondary in under a minute; DynamoDB Global Tables are active-active multi-Region with last-writer-wins.
- S3 (except the One Zone storage classes) and DynamoDB automatically replicate data across at least three AZs in a Region; cross-Region replication (S3 CRR, RDS cross-Region read replicas) must be explicitly configured and is asynchronous.
- Route 53 routing policies (failover, latency-based, geolocation, geoproximity, weighted) steer traffic across Regions; attaching health checks lets them route around an unhealthy Region.
Quick Answer: Multi-AZ = high availability inside one Region (survives an AZ outage). Multi-Region = disaster recovery plus global performance (survives a Region outage). Choose Multi-AZ as the production baseline; add Multi-Region when RPO/RTO or data-sovereignty requirements demand it.
Multi-AZ: High Availability Within a Region
An Availability Zone (AZ) is one or more discrete data centers with independent power, cooling, and networking. Spreading resources across two or more AZs means a single-AZ outage does not take down the application. The exam expects you to know each service's built-in behavior.
| Service | Multi-AZ behavior |
|---|---|
| EC2 + ALB | Place instances in 2+ AZs; ALB routes around the failed AZ |
| RDS Multi-AZ | Synchronous standby in another AZ; automatic failover 60–120s |
| Aurora | Data replicated 6 ways across 3 AZs; up to 15 read replicas |
| ElastiCache (Redis) | Multi-AZ with automatic failover (Memcached has none) |
| S3 / DynamoDB / EFS | Data automatically stored across multiple AZs in the Region (S3 Standard: 3+ AZs); One Zone storage classes stay in a single AZ |
| NAT Gateway | One per AZ for true Multi-AZ resilience |
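To make "ALB routes around the failed AZ" concrete, here is a toy model in Python, not the load balancer's real algorithm. Targets are tagged with their AZ, anything in a failed AZ or failing health checks is dropped from the pool, and requests are spread round-robin over what is left. The instance IDs and AZ names are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Target:
    target_id: str        # hypothetical instance ID
    az: str               # Availability Zone the instance runs in
    healthy: bool = True  # result of the target's own health check


def healthy_targets(targets, failed_azs):
    """Keep targets that pass health checks and are not in a failed AZ."""
    return [t for t in targets if t.healthy and t.az not in failed_azs]


def route(targets, failed_azs, n_requests):
    """Round-robin n_requests over the surviving targets (simplified model)."""
    pool = healthy_targets(targets, failed_azs)
    if not pool:
        raise RuntimeError("no healthy targets: the application is down")
    rr = cycle(pool)
    return [next(rr).target_id for _ in range(n_requests)]


fleet = [
    Target("i-a1", "us-east-1a"), Target("i-a2", "us-east-1a"),
    Target("i-b1", "us-east-1b"), Target("i-b2", "us-east-1b"),
]

# us-east-1a goes dark: all traffic shifts to the us-east-1b instances.
print(route(fleet, {"us-east-1a"}, 4))  # ['i-b1', 'i-b2', 'i-b1', 'i-b2']
```

If every target sat in us-east-1a, the pool would be empty after that AZ failed, which is exactly the outage Multi-AZ placement exists to prevent.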
RDS Failover Detail
| Feature | Multi-AZ DB instance | Multi-AZ DB cluster |
|---|---|---|
| Standbys | 1 (not readable) | 2 (readable) |
| Replication | Synchronous | Semi-synchronous |
| Failover | 60–120 seconds | ~35 seconds |
| Read scaling | No | Yes |
On the Exam: "Automatic database failover with minimal downtime" → RDS Multi-AZ. "Failover and read scaling" → Aurora or the RDS Multi-AZ DB cluster.
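The replication rows of the table differ in when a commit is acknowledged. The sketch below is a simplified latency model with hypothetical millisecond timings and no network or engine details. A Multi-AZ DB instance must wait for its single synchronous standby, while a Multi-AZ DB cluster's semi-synchronous replication needs an acknowledgment from at least one of its two readers.

```python
def commit_latency_ms(local_write_ms, standby_ack_ms, acks_required):
    """Time until a commit is acknowledged to the client (simplified model).

    local_write_ms: time for the writer to persist the change locally.
    standby_ack_ms: acknowledgment time of each standby/reader (hypothetical).
    acks_required:  how many standby acks must arrive before the commit returns.
    """
    if acks_required > len(standby_ack_ms):
        raise ValueError("not enough standbys to satisfy the ack requirement")
    # The commit returns once the local write AND the k fastest acks are in.
    kth_fastest_ack = sorted(standby_ack_ms)[acks_required - 1]
    return max(local_write_ms, kth_fastest_ack)


# Multi-AZ DB instance: one synchronous standby, so a slow standby stalls commits.
print(commit_latency_ms(2.0, [40.0], acks_required=1))        # 40.0
# Multi-AZ DB cluster: one ack out of two readers suffices (semi-synchronous).
print(commit_latency_ms(2.0, [40.0, 5.0], acks_required=1))   # 5.0
```

In this model a lagging standby holds up every commit on the instance deployment, whereas the cluster commits as soon as its faster reader responds.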
Multi-Region: Disaster Recovery and Global Reach
Multi-Region is needed when you must survive a full Region outage, deliver sub-100 ms latency to users on other continents, or keep data inside a specific country for data sovereignty. Running a full second Region can approach double the infrastructure cost (less for a lighter standby that mostly replicates data), so it is not the default.
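The decision rule from the Quick Answer can be written down as a small function. This is a study aid built only on this section's rules of thumb, not an AWS tool; the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    must_survive_region_outage: bool = False     # RPO/RTO defined at Region scope
    remote_users_need_low_latency: bool = False  # users on other continents
    data_must_stay_in_country: bool = False      # data-sovereignty rule


def recommend(req: Requirements) -> dict:
    """Multi-AZ is always the baseline; Multi-Region only with a concrete reason."""
    reasons = []
    if req.must_survive_region_outage:
        reasons.append("survive a Region outage")
    if req.remote_users_need_low_latency:
        reasons.append("low latency for remote users")
    if req.data_must_stay_in_country:
        reasons.append("keep data in-country")
    return {"multi_az": True, "multi_region": bool(reasons), "reasons": reasons}


print(recommend(Requirements()))
# {'multi_az': True, 'multi_region': False, 'reasons': []}
print(recommend(Requirements(must_survive_region_outage=True))["multi_region"])
# True
```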
| Service | Multi-Region capability |
|---|---|
| S3 | Cross-Region Replication (CRR), asynchronous |
| RDS | Cross-Region read replica, asynchronous |
| Aurora | Global Database — sub-second cross-Region lag |
| DynamoDB | Global Tables — active-active, multi-Region writes |
| CloudFront | Global edge network by design |
| Route 53 | Global DNS with health checks and failover |
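Because these cross-Region paths are asynchronous, the replication lag at the moment a Region fails is effectively the RPO: writes committed during that window may never reach the other Region. A back-of-the-envelope calculation, using a hypothetical write rate and illustrative lag values (not AWS guarantees):

```python
def writes_at_risk(writes_per_second: float, replication_lag_seconds: float) -> float:
    """Approximate committed writes not yet copied to the other Region.

    With asynchronous replication, anything written during the last
    `replication_lag_seconds` before a Region failure may be lost.
    """
    return writes_per_second * replication_lag_seconds


WRITE_RATE = 200  # hypothetical writes per second

# Lag values below are illustrative assumptions.
for name, lag in [("Aurora Global Database (sub-second)", 0.5),
                  ("RDS cross-Region read replica (lagging)", 30.0)]:
    print(f"{name}: ~{writes_at_risk(WRITE_RATE, lag):.0f} writes at risk")
# Aurora Global Database (sub-second): ~100 writes at risk
# RDS cross-Region read replica (lagging): ~6000 writes at risk
```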
Aurora Global Database vs. DynamoDB Global Tables
- Aurora Global Database: one primary Region for writes, up to five secondary read-only Regions, sub-second (asynchronous, storage-level) replication lag, and promotion of a secondary in under a minute during a Regional disaster. Best when you need a relational engine with an RPO of about a second and an RTO under a minute.
- DynamoDB Global Tables: active-active — every Region accepts reads and writes, replicated typically in under a second with last-writer-wins conflict resolution. Best for globally distributed apps needing local low-latency writes everywhere.
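Last-writer-wins means concurrent updates to the same item are resolved by keeping the most recent write, not by merging them. The conceptual sketch below models that rule with explicit timestamps; the tie-break on Region name is a hypothetical choice for determinism and does not describe DynamoDB's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    region: str       # Region that accepted the write
    value: str        # new item value (hypothetical shopping cart)
    timestamp: float  # when the write happened, in seconds


def last_writer_wins(*writes: Write) -> Write:
    """Pick the surviving version of an item written in several Regions.

    Latest timestamp wins; the Region-name tie-break is an assumption.
    """
    return max(writes, key=lambda w: (w.timestamp, w.region))


us = Write("us-east-1", "cart=[book]", timestamp=1000.20)
eu = Write("eu-west-1", "cart=[book, pen]", timestamp=1000.35)

winner = last_writer_wins(us, eu)
print(winner.region, winner.value)  # eu-west-1 cart=[book, pen]
```

The us-east-1 update is discarded rather than merged, so an application that cannot tolerate losing a concurrent update should avoid writing the same item from several Regions at once.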
Route 53 Routing Policies
| Policy | Use case |
|---|---|
| Failover | Active-passive: send to secondary when primary health check fails |
| Latency-based | Route each user to the lowest-latency Region |
| Geolocation | Route by the user's continent/country (sovereignty, licensing) |
| Geoproximity | Route by distance with adjustable traffic bias |
| Weighted | Split by percentage (e.g., 90/10 canary) |
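Each policy boils down to a different rule for choosing which record answers a DNS query. These toy Python functions model three of them as simplified illustrations: Route 53 measures latency between user networks and Regions itself, and geolocation also matches continents and a default record. The Region names, latencies, and weights are hypothetical.

```python
import random


def latency_based(latency_ms_by_region: dict) -> str:
    """Latency-based: answer with the Region that has the lowest latency."""
    return min(latency_ms_by_region, key=latency_ms_by_region.get)


def weighted(weights: dict, rng: random.Random) -> str:
    """Weighted: pick a record with probability weight / sum(weights)."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def geolocation(country: str, records: dict, default: str) -> str:
    """Geolocation: route by the user's country, else fall back to a default."""
    return records.get(country, default)


print(latency_based({"us-east-1": 180.0, "eu-west-1": 25.0, "ap-northeast-1": 240.0}))
# eu-west-1

rng = random.Random(42)
picks = [weighted({"v1": 90, "v2": 10}, rng) for _ in range(10_000)]
print(picks.count("v2") / len(picks))  # roughly 0.10, a 90/10 canary split

print(geolocation("DE", {"DE": "eu-central-1"}, default="us-east-1"))
# eu-central-1
```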
On the Exam: "Global users see high latency" → latency-based routing + CloudFront. "Application must survive a Region outage" → failover routing with health checks to a standby Region. A Route 53 health check can also monitor a CloudWatch alarm, so DNS failover can trigger on application metrics, not only on endpoint reachability. Remember that S3 CRR and RDS cross-Region replicas are asynchronous and can lag by seconds to minutes, so they usually carry a larger RPO than Aurora Global Database, whose replication is also asynchronous but typically stays under a second.
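Failover routing hinges on when a health check flips state. The sketch below assumes Route 53's default failure threshold of 3 consecutive results and ignores details such as aggregating results from many health checkers. The primary keeps receiving traffic until three probes in a row fail, and stays out of rotation until three probes in a row succeed. The hostnames are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class HealthCheck:
    """Simplified health check that flips after `failure_threshold` consecutive results."""
    failure_threshold: int = 3  # Route 53's default threshold
    healthy: bool = True
    _streak: int = field(default=0, repr=False)

    def record(self, probe_ok: bool) -> None:
        # Count consecutive probe results that disagree with the current state.
        if probe_ok == self.healthy:
            self._streak = 0
            return
        self._streak += 1
        if self._streak >= self.failure_threshold:
            self.healthy = probe_ok
            self._streak = 0


def failover_answer(check: HealthCheck, primary: str, secondary: str) -> str:
    """Failover routing: answer with the primary while healthy, else the secondary."""
    return primary if check.healthy else secondary


check = HealthCheck()
answers = []
for probe_ok in [True, False, False, False, True]:
    check.record(probe_ok)
    answers.append(failover_answer(check, "primary.example.com", "standby.example.com"))
print(answers)
# ['primary.example.com', 'primary.example.com', 'primary.example.com',
#  'standby.example.com', 'standby.example.com']
```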
Edge, Global Acceleration, and Subtle Distinctions
Two global services round out the Multi-Region picture. Amazon CloudFront caches content at hundreds of edge locations and is the answer for delivering static and dynamic content with low latency worldwide; it also shields the origin and integrates with AWS WAF and AWS Shield. AWS Global Accelerator gives you two static anycast IP addresses and routes user traffic over the AWS backbone to the nearest healthy Regional endpoint. Because those client-facing IPs never change, its failover is near-instant and does not depend on clients' DNS caches expiring. That makes it the fit for non-HTTP (TCP/UDP) workloads such as gaming or VoIP that need fast Regional failover, where CloudFront, an HTTP(S) cache, does not apply.
| Need | Service |
|---|---|
| Cache HTTP content globally | CloudFront |
| Static anycast IPs + fast Regional failover for TCP/UDP | Global Accelerator |
| DNS-level Region routing | Route 53 |
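Part of Global Accelerator's advantage is that there is nothing for clients to re-resolve. DNS-based failover, by contrast, must first detect the failure and then wait for cached answers to expire. A rough worst-case estimate, assuming detection takes the health check's request interval times its failure threshold, plus one record TTL (the 60 s TTL is a hypothetical value), and assuming resolvers honor that TTL:

```python
def dns_failover_seconds(request_interval_s: int, failure_threshold: int, ttl_s: int) -> int:
    """Rough worst case for DNS-based failover.

    Detection: the check must fail `failure_threshold` times in a row,
    about request_interval_s * failure_threshold seconds.
    Propagation: clients may keep the old answer until the TTL expires.
    """
    return request_interval_s * failure_threshold + ttl_s


# Route 53 health checks probe every 30 s (standard) or every 10 s (fast).
print(dns_failover_seconds(30, 3, ttl_s=60))  # 150
print(dns_failover_seconds(10, 3, ttl_s=60))  # 90
```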
High-Stakes Exam Distinctions
- Multi-AZ vs. read replica: Multi-AZ is for availability (synchronous standby, automatic failover, standby not readable on single-instance Multi-AZ). A read replica is for read scaling (asynchronous, readable, no automatic promotion unless you build it). Do not confuse them: a question asking for failover wants Multi-AZ; one asking to offload read queries wants a read replica. Aurora blurs the line, because its replicas are readable and also act as automatic failover targets.
- S3 durability and AZ spread: S3 stores objects redundantly across at least three AZs in a Region (except the One Zone storage classes), with eleven nines of durability, but that is not Multi-Region; surviving a Region outage still requires Cross-Region Replication.
- Single point of failure hunting: a single NAT Gateway, all resources placed in one subnet (every subnet lives in exactly one AZ), or a lone EC2 instance with no Auto Scaling group are classic SPOFs the exam plants; the fix is to spread resources across AZs, run instances in a Multi-AZ Auto Scaling group, and front them with an Elastic Load Balancer.
- Sovereignty: when data must remain in a country, use Route 53 geolocation routing plus Regional resources in that country; never use a global active-active setup that would replicate the data to Regions outside that country.
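SPOF hunting can be treated as a mechanical check: for each component, count the AZs it covers. The inventory format below is hypothetical (in practice you would build it from your own tagging or inventory tooling), and the NAT gateway entry lists the AZs that have their own NAT gateway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str        # hypothetical resource name
    kind: str        # e.g. "nat-gateway", "ec2", "auto-scaling-group"
    azs: frozenset   # Availability Zones the resource (or its group) covers


def find_single_az_spofs(resources, min_azs: int = 2):
    """Return names of resources whose footprint covers fewer than `min_azs` AZs."""
    return sorted(r.name for r in resources if len(r.azs) < min_azs)


inventory = [
    # NAT gateways exist only in us-east-1a, so 1b loses outbound access if 1a fails.
    Resource("nat-gateways", "nat-gateway", frozenset({"us-east-1a"})),
    Resource("web-asg", "auto-scaling-group", frozenset({"us-east-1a", "us-east-1b"})),
    Resource("batch-host", "ec2", frozenset({"us-east-1b"})),
]

print(find_single_az_spofs(inventory))  # ['batch-host', 'nat-gateways']
```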
The mental model for the whole section: AZ failure is handled inside a Region with Multi-AZ; Region failure and global latency are handled across Regions with replication (Aurora Global Database, DynamoDB Global Tables, S3 CRR) and traffic steering (Route 53, CloudFront, Global Accelerator).
Practice Questions
A relational workload must survive a complete AWS Region outage with less than one second of potential data loss and recover in under a minute. Which solution fits best?
A global application must accept low-latency writes in North America, Europe, and Asia simultaneously, with each Region serving its local users. Which database meets this requirement?
An active-passive deployment must automatically send users to a standby Region only when the primary Region's endpoint becomes unhealthy. Which Route 53 routing policy should be used?