2.3 Multi-AZ and Multi-Region Architectures

Key Takeaways

  • Multi-AZ deploys resources across Availability Zones within a single Region for high availability against AZ-level failures.
  • Multi-Region deploys resources across AWS Regions for disaster recovery, low latency globally, and compliance with data sovereignty requirements.
  • RDS Multi-AZ provides automatic failover to a synchronous standby replica; Aurora supports up to 15 read replicas, any of which can be promoted automatically on failover.
  • S3 automatically stores data across at least 3 AZs (except the One Zone storage classes); Cross-Region Replication requires explicit configuration.
  • Route 53 health checks and failover routing enable automatic traffic redirection during Regional outages.
Last updated: March 2026


Quick Answer: Multi-AZ = high availability within a Region (protects against AZ failure). Multi-Region = disaster recovery + global performance (protects against Region failure). Most production workloads need Multi-AZ at minimum. Use Multi-Region for mission-critical applications whose RPO/RTO targets must hold even during a full Region outage.

Multi-AZ Architecture

What Multi-AZ Means

Deploying resources across multiple Availability Zones within a single Region ensures that if one AZ experiences an outage, your application continues running in other AZs.
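The benefit of spreading across AZs can be estimated with a simple probability model. The sketch below assumes AZ failures are independent and uses a hypothetical 99% per-AZ availability; real outages can be correlated, and the surviving AZs must have enough capacity to absorb the load, so treat the result as an optimistic upper bound rather than an AWS figure.

```python
def combined_availability(per_az_availability: float, az_count: int) -> float:
    """Probability that at least one AZ is serving, assuming independent AZ failures."""
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("per_az_availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    # The workload is down only when every AZ is down at the same time.
    probability_all_down = (1.0 - per_az_availability) ** az_count
    return 1.0 - probability_all_down


if __name__ == "__main__":
    # 0.99 is a hypothetical per-AZ availability chosen for illustration.
    for azs in (1, 2, 3):
        print(f"{azs} AZ(s): {combined_availability(0.99, azs):.6f}")
```

Under these assumptions it prints 0.990000, 0.999900 and 0.999999: each extra AZ cuts the chance of a full outage by a factor of 100.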

Multi-AZ by Service

Service | Multi-AZ behavior
EC2 + ALB | Deploy instances across 2+ AZs; the ALB routes only to healthy instances
RDS Multi-AZ | Synchronous standby replica in another AZ; automatic failover (typically 60-120 seconds)
Aurora | Data replicated 6 ways across 3 AZs; up to 15 read replicas with automatic failover
ElastiCache | Multi-AZ with automatic failover for Redis (not available for Memcached clusters)
EFS | Standard file systems store data across multiple AZs automatically (One Zone file systems use a single AZ)
S3 | Automatically replicates across 3+ AZs within a Region (except the One Zone storage classes)
DynamoDB | Automatically replicates across 3 AZs within a Region
NAT Gateway | A NAT Gateway lives in one AZ; deploy one per AZ for true Multi-AZ resilience
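Two rows in the table are about placement rather than replication: EC2 capacity spread across AZs, and one NAT Gateway per AZ. The sketch below models both with plain data. The AZ names are examples, the round-robin placement is an assumption (an Auto Scaling group also tries to balance instances across AZs, but this is not its algorithm), and all three functions are hypothetical helpers, not an AWS API.

```python
from collections import Counter


def spread_across_azs(instance_count: int, azs: list[str]) -> dict[str, int]:
    """Place instances round-robin so per-AZ counts differ by at most one."""
    if not azs:
        raise ValueError("need at least one AZ")
    placement = Counter({az: 0 for az in azs})
    for i in range(instance_count):
        placement[azs[i % len(azs)]] += 1
    return dict(placement)


def capacity_after_az_loss(placement: dict[str, int], failed_az: str) -> int:
    """Instances still running after every instance in failed_az is lost."""
    return sum(count for az, count in placement.items() if az != failed_az)


def azs_losing_egress(nat_az_for_subnet: dict[str, str], failed_az: str) -> list[str]:
    """AZs whose private subnets route through a NAT Gateway located in failed_az."""
    return sorted(az for az, nat_az in nat_az_for_subnet.items() if nat_az == failed_az)


if __name__ == "__main__":
    azs = ["us-east-1a", "us-east-1b", "us-east-1c"]  # example AZ names
    placement = spread_across_azs(6, azs)
    print(placement)                                        # 2 instances per AZ
    print(capacity_after_az_loss(placement, "us-east-1a"))  # 4 instances survive

    # One shared NAT Gateway in us-east-1a: losing that AZ cuts egress everywhere.
    shared_nat = {az: "us-east-1a" for az in azs}
    # One NAT Gateway per AZ: losing an AZ only affects that AZ.
    per_az_nat = {az: az for az in azs}
    print(azs_losing_egress(shared_nat, "us-east-1a"))  # all three AZs
    print(azs_losing_egress(per_az_nat, "us-east-1a"))  # only us-east-1a
```

The NAT comparison shows why a single NAT Gateway quietly turns a Multi-AZ workload back into a single-AZ dependency for outbound traffic.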

RDS Multi-AZ Deep Dive

Feature | RDS Multi-AZ Instance | RDS Multi-AZ Cluster
Replicas | 1 standby (not readable) | 2 readable standbys
Replication | Synchronous | Semi-synchronous
Failover time | 60-120 seconds | ~35 seconds
Read scaling | No (standby not readable) | Yes (standbys serve reads)
Engines | All RDS engines | MySQL, PostgreSQL

On the Exam: "The database must have automatic failover with minimal downtime" → RDS Multi-AZ. "The database must support read scaling AND automatic failover" → Aurora or RDS Multi-AZ Cluster.
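The comparison table can be read as a filter over deployment options. The sketch below encodes only the two RDS rows shown above (Aurora is left out because its failover time is not listed here). The engine identifiers, the function name, and the data layout are illustrative assumptions, and the failover figures are the typical upper values from the table, not guarantees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RdsMultiAzOption:
    name: str
    readable_standbys: bool
    typical_failover_seconds: int  # upper end of the typical value in the table
    engines: Optional[frozenset]   # None means "all RDS engines"


OPTIONS = [
    RdsMultiAzOption("RDS Multi-AZ Instance", False, 120, None),
    RdsMultiAzOption("RDS Multi-AZ Cluster", True, 35, frozenset({"mysql", "postgres"})),
]


def matching_options(engine: str, need_read_scaling: bool, max_failover_seconds: int) -> list[str]:
    """Return the names of options that satisfy every stated requirement."""
    matches = []
    for option in OPTIONS:
        if option.engines is not None and engine.lower() not in option.engines:
            continue  # engine not supported by this deployment type
        if need_read_scaling and not option.readable_standbys:
            continue  # the standby cannot serve reads
        if option.typical_failover_seconds > max_failover_seconds:
            continue  # typical failover is slower than the target
        matches.append(option.name)
    return matches


if __name__ == "__main__":
    print(matching_options("oracle", need_read_scaling=False, max_failover_seconds=120))
    print(matching_options("postgres", need_read_scaling=True, max_failover_seconds=60))
    print(matching_options("oracle", need_read_scaling=True, max_failover_seconds=120))
```

An empty list, as in the last call, means none of the rows above fits and the requirements need revisiting.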

Multi-Region Architecture

When to Use Multi-Region

Requirement | Multi-Region needed?
Survive a full AWS Region outage | Yes
Sub-100 ms latency for global users | Yes
Data sovereignty (users in several countries whose data must stay in-country) | Yes
RPO < 1 hour / RTO < 1 hour for a Regional disaster | Usually yes
Cost-sensitive application | Usually no (significant cost increase)
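The decision table can be encoded as a small rule function. This is a hypothetical helper, not an AWS tool: it treats the "Usually yes" row as a yes, and falls back to Multi-AZ, the baseline the Quick Answer recommends, when no Multi-Region requirement applies.

```python
from typing import List, Optional, Tuple


def multi_region_recommendation(
    survive_region_outage: bool = False,
    global_sub_100ms_latency: bool = False,
    data_residency_in_several_countries: bool = False,
    regional_dr_rpo_rto_hours: Optional[float] = None,
    cost_sensitive: bool = False,
) -> Tuple[str, List[str]]:
    """Map the table's requirements to a rough recommendation plus the reasons."""
    reasons = []
    if survive_region_outage:
        reasons.append("must survive a full Region outage")
    if global_sub_100ms_latency:
        reasons.append("global users need sub-100 ms latency")
    if data_residency_in_several_countries:
        reasons.append("data must stay in-country for users in several countries")
    if regional_dr_rpo_rto_hours is not None and regional_dr_rpo_rto_hours < 1:
        reasons.append("RPO/RTO under 1 hour for a Regional disaster (usually)")
    if reasons:
        return "Multi-Region", reasons
    if cost_sensitive:
        return "Multi-AZ", ["cost-sensitive and no hard Multi-Region requirement"]
    return "Multi-AZ", ["Multi-AZ is the production baseline"]


if __name__ == "__main__":
    print(multi_region_recommendation(global_sub_100ms_latency=True))
    print(multi_region_recommendation(regional_dr_rpo_rto_hours=4, cost_sensitive=True))
```

Returning the reasons alongside the verdict keeps the trade-off visible, which matters most for the "usually" rows where cost may still win.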

Multi-Region by Service

Service | Multi-Region capability
S3 | Cross-Region Replication (CRR), asynchronous
RDS | Cross-Region read replicas, asynchronous
Aurora | Aurora Global Database, <1 second replication lag
DynamoDB | Global Tables, multi-active across Regions
CloudFront | Global by design (hundreds of edge locations worldwide)
Route 53 | Global DNS with health checks and failover routing
API Gateway | Regional or edge-optimized endpoints
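Unlike in-Region durability, S3 Cross-Region Replication has to be configured. The sketch below builds the `ReplicationConfiguration` dictionary that boto3's `put_bucket_replication` accepts. The role ARN, bucket names, and rule ID are placeholders, both buckets must already have versioning enabled, and the API call itself is left commented out so the snippet runs without AWS credentials.

```python
def build_crr_configuration(
    role_arn: str, destination_bucket_arn: str, rule_id: str = "replicate-everything"
) -> dict:
    """Replication rule that copies new objects to a bucket in another Region."""
    return {
        # IAM role that S3 assumes to read the source and write the destination.
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix: apply to every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


if __name__ == "__main__":
    config = build_crr_configuration(
        role_arn="arn:aws:iam::123456789012:role/example-s3-replication-role",  # placeholder
        destination_bucket_arn="arn:aws:s3:::example-dr-bucket-eu-west-1",      # placeholder
    )
    print(config["Rules"][0]["Destination"]["Bucket"])

    # With credentials and versioned buckets, the call would be:
    # import boto3
    # boto3.client("s3").put_bucket_replication(
    #     Bucket="example-source-bucket-us-east-1", ReplicationConfiguration=config
    # )
```

Replication is asynchronous, as the table notes, and a rule like this applies to objects written after it is enabled; existing objects need a separate backfill such as S3 Batch Replication.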

Aurora Global Database

Feature | Detail
Replication | <1 second lag across Regions (physical-level replication)
Failover | Promote a secondary Region in <1 minute
Read scaling | Up to 16 read replicas per secondary Region
Writes | A single primary Region handles all writes
Managed failover | Switchover (planned) and failover (unplanned) are both supported
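The single-writer design is easiest to see as routing rules. The toy model below is not the Aurora API; the class and method names are hypothetical, and in this sketch an unhealthy old primary is simply dropped from the topology after an unplanned failover.

```python
class SingleWriterGlobalDatabase:
    """Toy model of a single-writer global database (not the Aurora API)."""

    def __init__(self, primary: str, secondaries: list[str]) -> None:
        self.primary = primary
        self.secondaries = set(secondaries)

    def write_endpoint_for(self, client_region: str) -> str:
        # Every write goes to the primary Region, wherever the client is.
        return self.primary

    def read_endpoint_for(self, client_region: str) -> str:
        # Reads are served locally when the client's Region hosts a cluster.
        if client_region == self.primary or client_region in self.secondaries:
            return client_region
        return self.primary

    def promote(self, new_primary: str, old_primary_healthy: bool) -> None:
        """Make a secondary the primary (switchover if healthy, failover if not)."""
        if new_primary not in self.secondaries:
            raise ValueError(f"{new_primary} is not a secondary Region")
        self.secondaries.remove(new_primary)
        if old_primary_healthy:
            # Planned switchover: the old primary stays on as a secondary.
            self.secondaries.add(self.primary)
        self.primary = new_primary


if __name__ == "__main__":
    db = SingleWriterGlobalDatabase("us-east-1", ["eu-west-1"])
    print(db.write_endpoint_for("eu-west-1"))  # us-east-1
    print(db.read_endpoint_for("eu-west-1"))   # eu-west-1
    db.promote("eu-west-1", old_primary_healthy=False)  # unplanned failover
    print(db.write_endpoint_for("eu-west-1"))  # eu-west-1
```

Clients in a secondary Region get local reads but still pay cross-Region latency on every write, which is the main difference from DynamoDB Global Tables.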

DynamoDB Global Tables

Feature | Detail
Replication | Active-active across multiple Regions
Writes | Can write in ANY replica Region (multi-active)
Conflict resolution | Last writer wins
Latency | Replication typically <1 second
Use case | Global applications needing local reads and writes in each Region
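"Last writer wins" means that when two Regions update the same item concurrently, the version with the later write time survives and the other is discarded. The toy reconciliation below illustrates that rule; the `ItemVersion` type, the timestamps, and the tie-breaker by Region name are assumptions for the sketch, not DynamoDB internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemVersion:
    value: str
    written_at: float  # hypothetical write timestamp in seconds
    region: str


def last_writer_wins(versions: list[ItemVersion]) -> ItemVersion:
    """Pick the version with the latest timestamp (ties broken by Region name)."""
    if not versions:
        raise ValueError("no versions to reconcile")
    # The tie-breaker is an arbitrary choice in this sketch, not DynamoDB's rule.
    return max(versions, key=lambda v: (v.written_at, v.region))


if __name__ == "__main__":
    us = ItemVersion(value="cart=[book]", written_at=100.0, region="us-east-1")
    eu = ItemVersion(value="cart=[book, pen]", written_at=100.4, region="eu-west-1")
    winner = last_writer_wins([us, eu])
    print(winner.region, winner.value)  # eu-west-1 cart=[book, pen]
```

The us-east-1 update is silently lost, so items that several Regions may modify at the same moment need an access pattern that tolerates this.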

Route 53 Routing for Multi-Region

Routing policy | Use case
Failover | Active-passive: when the primary Region fails its health check, Route 53 sends traffic to the secondary
Latency-based | Routes to the Region with the lowest latency for the user
Geolocation | Routes based on the user's geographic location
Geoproximity | Routes based on geographic distance between users and resources, with traffic biasing
Weighted | Splits traffic by percentage (e.g., 90/10 for canary deployments)
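The decision each routing policy makes can be simulated in a few lines. This is a toy model of the selection logic only, not how Route 53 is implemented: real answers also depend on health-check evaluation, DNS caching, and the latency data Route 53 collects. The Region latencies and record names are hypothetical.

```python
import random


def failover_route(primary: str, secondary: str, primary_healthy: bool) -> str:
    """Failover policy: answer with the primary unless its health check fails."""
    return primary if primary_healthy else secondary


def latency_route(latency_ms_by_region: dict[str, float]) -> str:
    """Latency-based policy: answer with the Region that has the lowest latency."""
    return min(latency_ms_by_region, key=latency_ms_by_region.get)


def weighted_route(weights: dict[str, int], rng: random.Random) -> str:
    """Weighted policy: pick a record with probability proportional to its weight."""
    targets = list(weights)
    return rng.choices(targets, weights=[weights[t] for t in targets], k=1)[0]


if __name__ == "__main__":
    print(failover_route("us-east-1", "us-west-2", primary_healthy=False))  # us-west-2
    print(latency_route({"us-east-1": 180.0, "ap-southeast-1": 35.0}))     # ap-southeast-1
    rng = random.Random(0)
    picks = [weighted_route({"stable": 90, "canary": 10}, rng) for _ in range(10_000)]
    print(picks.count("canary") / len(picks))  # close to 0.10
```

Failover and latency-based routing are deterministic for a given input, while weighted routing is probabilistic, which is why it suits gradual canary rollouts.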

On the Exam: "Global users experience high latency" → Multi-Region with Route 53 latency-based routing + CloudFront. "The application must survive a Regional outage" → Multi-Region with Route 53 failover routing.
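For the Regional-outage case, failover routing is set up as two records with the same name: a PRIMARY record tied to a health check and a SECONDARY record. The sketch below builds the `ChangeBatch` that boto3's `change_resource_record_sets` accepts. The domain, the IP addresses (from the documentation ranges 203.0.113.0/24 and 198.51.100.0/24), the hosted zone ID, and the health check ID are placeholders, and the API call is commented out so the snippet runs offline.

```python
from typing import Optional


def failover_record(name: str, ip: str, role: str, set_id: str,
                    health_check_id: Optional[str] = None, ttl: int = 60) -> dict:
    """One half of a failover pair; role is "PRIMARY" or "SECONDARY"."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,  # must be unique among records with this name
        "Failover": role,
        "TTL": ttl,  # a short TTL lets resolvers pick up a failover sooner
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        # While this health check fails, Route 53 answers with the SECONDARY record.
        record["HealthCheckId"] = health_check_id
    return record


def failover_change_batch(name: str, primary_ip: str, secondary_ip: str,
                          primary_health_check_id: str) -> dict:
    return {
        "Comment": "Active-passive failover between two Regions",
        "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": failover_record(
                name, primary_ip, "PRIMARY", "primary-region", primary_health_check_id)},
            {"Action": "UPSERT", "ResourceRecordSet": failover_record(
                name, secondary_ip, "SECONDARY", "secondary-region")},
        ],
    }


if __name__ == "__main__":
    batch = failover_change_batch(
        name="app.example.com",
        primary_ip="203.0.113.10",     # documentation address, placeholder
        secondary_ip="198.51.100.10",  # documentation address, placeholder
        primary_health_check_id="11111111-2222-3333-4444-555555555555",  # placeholder
    )
    print([c["ResourceRecordSet"]["Failover"] for c in batch["Changes"]])
    # With credentials, the call would be:
    # import boto3
    # boto3.client("route53").change_resource_record_sets(
    #     HostedZoneId="Z0000000EXAMPLE", ChangeBatch=batch
    # )
```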

Test Your Knowledge

An application requires a database that can survive a full AWS Region outage with less than 1 second of data loss. Which solution meets this requirement?

A global e-commerce company needs a database that supports writes from multiple Regions simultaneously. Which service should they use?

Which Route 53 routing policy should you use to automatically redirect traffic to a healthy secondary Region when the primary Region fails?
