2.4 Disaster Recovery Strategies — RTO and RPO

Key Takeaways

RPO (Recovery Point Objective) defines how much data loss is acceptable; RTO (Recovery Time Objective) defines how quickly the system must be restored.
The four DR strategies, ordered from lowest cost and slowest recovery to highest cost and fastest recovery, are: Backup & Restore, Pilot Light, Warm Standby, and Multi-Site Active-Active.
Backup & Restore has the highest RTO/RPO (hours) but the lowest cost; Multi-Site Active-Active has near-zero RTO/RPO but the highest cost.
Pilot Light keeps core systems (database) running in the DR Region but requires scaling up compute during failover.
Warm Standby runs a scaled-down version of the full environment in the DR Region, reducing failover time to minutes.

Last updated: March 2026

Quick Answer: RPO = max acceptable data loss (time). RTO = max acceptable downtime (time). Four DR strategies: Backup & Restore (cheapest, hours RTO), Pilot Light (core running, minutes-hours RTO), Warm Standby (scaled-down environment, minutes RTO), Multi-Site Active-Active (most expensive, near-zero RTO).

RPO and RTO Defined

Metric	Definition	Example
RPO (Recovery Point Objective)	Maximum amount of data you can afford to lose, measured in time	RPO of 1 hour = you can lose up to 1 hour of data
RTO (Recovery Time Objective)	Maximum time your system can be down after a disaster	RTO of 4 hours = system must be back online within 4 hours
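
As a worked example of the two definitions, the sketch below measures the data-loss window and the downtime for a single incident and compares them with the example targets from the table (RPO of 1 hour, RTO of 4 hours). The function names and the incident timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo_rto(last_recovery_point, disaster_at, service_restored_at):
    """Return (data_loss_window, downtime) for one incident as timedeltas."""
    # Data written after the last recovery point is lost.
    data_loss_window = disaster_at - last_recovery_point
    # Downtime runs from the disaster until service is back online.
    downtime = service_restored_at - disaster_at
    return data_loss_window, downtime


def meets_objectives(data_loss_window, downtime, rpo, rto):
    """True only if both measured values are within their objectives."""
    return data_loss_window <= rpo and downtime <= rto


# Hypothetical incident: last backup 09:00, outage 09:45, service back 13:15.
loss, down = achieved_rpo_rto(
    datetime(2026, 3, 1, 9, 0),
    datetime(2026, 3, 1, 9, 45),
    datetime(2026, 3, 1, 13, 15),
)
ok = meets_objectives(loss, down, rpo=timedelta(hours=1), rto=timedelta(hours=4))
print(loss, down, ok)  # 0:45:00 3:30:00 True
```

Here 45 minutes of data is lost and the system is down for 3.5 hours, so both objectives are met. With an RPO of 30 minutes the same incident would miss the target.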

The RPO/RTO Trade-off

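Lower RPO and RTO targets cost more and are more complex to achieve: a lower RPO requires more frequent backups or continuous replication, and a lower RTO requires more infrastructure already running in the DR Region.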

The Four DR Strategies

1. Backup and Restore

Concept: Regularly back up data to another Region. In a disaster, restore from backups and rebuild the environment.

Aspect	Detail
RTO	Hours (time to restore + rebuild)
RPO	Hours (depends on backup frequency)
Cost	Lowest (pay only for backup storage)
Complexity	Lowest
AWS Services	AWS Backup, S3 Cross-Region Replication, EBS Snapshots, RDS Automated Backups

Best for: Non-critical workloads, development environments, workloads where hours of downtime are acceptable.

2. Pilot Light

Concept: Keep the core of your system (typically the database) running in the DR Region at all times. Other components (compute, app servers) are provisioned only during failover.

Aspect	Detail
RTO	Minutes to hours (time to scale up compute)
RPO	Minutes (near-real-time data replication)
Cost	Low-Medium (database running, compute off)
Complexity	Medium
AWS Services	RDS Cross-Region replica, Aurora Global DB, pre-configured AMIs, CloudFormation templates

Best for: Workloads that can tolerate some downtime but need minimal data loss.
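
To see why Pilot Light RTO depends mostly on how long compute takes to come up, the failover can be sketched as a list of sequential steps whose durations add up. The step names and durations below are hypothetical assumptions for illustration, not measured AWS figures.

```python
from datetime import timedelta

# Hypothetical failover runbook for a Pilot Light setup. Step names and
# durations are illustrative assumptions, not measured AWS figures.
FAILOVER_STEPS = [
    ("Promote the cross-Region database replica", timedelta(minutes=10)),
    ("Launch app servers from pre-configured AMIs", timedelta(minutes=25)),
    ("Repoint DNS to the DR Region", timedelta(minutes=5)),
]


def estimated_rto(steps):
    """Sum step durations, assuming the steps run one after another."""
    return sum((duration for _, duration in steps), timedelta())


total = estimated_rto(FAILOVER_STEPS)
print(total)  # 0:40:00
```

In this sketch the compute step dominates the total, which is the part Warm Standby shortens by keeping a scaled-down environment already running.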

3. Warm Standby

Concept: Run a scaled-down but fully functional version of the production environment in the DR Region at all times. Scale up to full production capacity during failover.

Aspect	Detail
RTO	Minutes (scale up existing resources)
RPO	Seconds to minutes (continuous replication)
Cost	Medium-High (scaled-down environment always running)
Complexity	Medium-High
AWS Services	Auto Scaling, Route 53 failover, Aurora Global DB, reduced-size EC2 instances

Best for: Business-critical applications that need rapid recovery.

4. Multi-Site Active-Active

Concept: Full production environment running in two or more Regions simultaneously. Traffic is served from all Regions at all times.

Aspect	Detail
RTO	Near-zero (traffic automatically shifts)
RPO	Near-zero (continuous cross-Region replication, typically asynchronous with sub-second lag)
Cost	Highest (full production in multiple Regions)
Complexity	Highest
AWS Services	DynamoDB Global Tables, Aurora Global DB, Route 53 latency/weighted routing, CloudFront

Best for: Mission-critical, zero-downtime applications (financial services, healthcare, global e-commerce).
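
The "traffic automatically shifts" behavior can be illustrated with a simplified model of weighted routing combined with health checks: traffic is split across healthy Regions in proportion to their weights, and an unhealthy Region drops out of the split. This is a sketch of the idea only, not Route 53's actual implementation; the function name, Region choices, and weights are assumptions.

```python
def traffic_shares(weights, healthy):
    """Split traffic across healthy Regions in proportion to their weights."""
    # Keep only Regions that are healthy and carry a non-zero weight.
    active = {
        region: weight
        for region, weight in weights.items()
        if region in healthy and weight > 0
    }
    total = sum(active.values())
    if total == 0:
        # Simplification: this model just refuses to answer in that case.
        raise ValueError("no healthy Region with a non-zero weight")
    return {region: weight / total for region, weight in active.items()}


weights = {"us-east-1": 50, "eu-west-1": 50}  # hypothetical Regions and weights

# Normal operation: both Regions serve traffic.
print(traffic_shares(weights, healthy={"us-east-1", "eu-west-1"}))
# {'us-east-1': 0.5, 'eu-west-1': 0.5}

# us-east-1 fails its health check: all traffic goes to eu-west-1.
print(traffic_shares(weights, healthy={"eu-west-1"}))
# {'eu-west-1': 1.0}
```

Because the surviving Region must absorb the full load, each Region in an active-active design needs enough capacity (or fast enough scaling) to handle the traffic of the others.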

Strategy Comparison

Strategy	RTO	RPO	Cost	Complexity
Backup & Restore	Hours	Hours	$	Low
Pilot Light	Mins-Hours	Minutes	$$	Medium
Warm Standby	Minutes	Seconds-Mins	$$$	Medium-High
Multi-Site	Near-zero	Near-zero	$$$$	High
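
The comparison table can be read as a selection rule: pick the cheapest strategy whose achievable RTO and RPO still fit the requirement. The sketch below encodes that rule. The numeric floors assigned to each strategy are illustrative assumptions that turn the table's "hours", "minutes", and "seconds" into concrete values; they are not official AWS figures.

```python
from datetime import timedelta
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str
    min_rto: timedelta  # assumed fastest recovery the strategy can achieve
    min_rpo: timedelta  # assumed smallest data-loss window it can achieve


# Ordered from lowest cost to highest cost, as in the table above.
# All numeric floors are hypothetical.
STRATEGIES = [
    Strategy("Backup & Restore", timedelta(hours=1), timedelta(hours=1)),
    Strategy("Pilot Light", timedelta(minutes=30), timedelta(minutes=5)),
    Strategy("Warm Standby", timedelta(minutes=5), timedelta(seconds=30)),
    Strategy("Multi-Site Active-Active", timedelta(0), timedelta(0)),
]


def cheapest_strategy(rto, rpo):
    """Return the first (cheapest) strategy that can meet both objectives."""
    for strategy in STRATEGIES:
        if strategy.min_rto <= rto and strategy.min_rpo <= rpo:
            return strategy.name
    raise ValueError("objectives must be non-negative durations")


# Requirement: back online within 2 hours, lose at most 10 minutes of data.
print(cheapest_strategy(timedelta(hours=2), timedelta(minutes=10)))  # Pilot Light
```

Backup & Restore is rejected here because its assumed RPO floor of 1 hour exceeds the 10-minute requirement, even though its RTO would fit.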

AWS Backup

AWS Backup is a centralized service to automate and manage backups across AWS services.

Feature	Detail
Supported services	EC2, EBS, RDS, Aurora, DynamoDB, EFS, FSx, S3, and more
Backup plans	Define frequency, retention, and lifecycle (warm storage → cold storage → delete)
Cross-Region	Copy backups to another Region for DR
Cross-account	Copy backups to another account for isolation
Vault Lock	WORM (Write Once Read Many) — prevents backup deletion (compliance)
Point-in-time	Continuous backups with point-in-time recovery for supported services (e.g., RDS, S3)

On the Exam: "Centralized backup management across multiple services and accounts" → AWS Backup. "Prevent backup deletion for compliance" → AWS Backup Vault Lock.
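
To make the backup plan features concrete, the sketch below builds a plan definition with a schedule, a lifecycle, and a cross-Region copy. The dictionary keys follow the request shape that boto3's `create_backup_plan` is understood to accept; verify them against the current AWS Backup documentation before relying on them. The helper name, plan name, vault name, account ID, and ARN are hypothetical. The helper only builds and validates the dictionary and makes no AWS calls.

```python
def build_backup_plan(name, vault, schedule, cold_after_days,
                      delete_after_days, copy_to_vault_arn):
    """Build a backup plan definition with one rule and a cross-Region copy."""
    # AWS Backup keeps backups in cold storage for at least 90 days, so
    # retention must exceed the cold-storage transition by 90 days or more.
    if delete_after_days < cold_after_days + 90:
        raise ValueError("delete_after_days must be >= cold_after_days + 90")
    lifecycle = {
        "MoveToColdStorageAfterDays": cold_after_days,
        "DeleteAfterDays": delete_after_days,
    }
    return {
        "BackupPlanName": name,
        "Rules": [
            {
                "RuleName": f"{name}-rule",
                "TargetBackupVaultName": vault,
                "ScheduleExpression": schedule,  # frequency
                "Lifecycle": lifecycle,          # retention and cold storage
                "CopyActions": [                 # copy to the DR Region vault
                    {
                        "DestinationBackupVaultArn": copy_to_vault_arn,
                        "Lifecycle": dict(lifecycle),
                    }
                ],
            }
        ],
    }


# Hypothetical values: daily backup at 05:00 UTC, cold after 30 days,
# deleted after 120 days, copied to a vault in another Region.
plan = build_backup_plan(
    name="daily-dr",
    vault="primary-vault",
    schedule="cron(0 5 ? * * *)",
    cold_after_days=30,
    delete_after_days=120,
    copy_to_vault_arn="arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault",
)
print(plan["Rules"][0]["Lifecycle"])
# {'MoveToColdStorageAfterDays': 30, 'DeleteAfterDays': 120}
```

With credentials configured, a dictionary like this would typically be passed as `boto3.client("backup").create_backup_plan(BackupPlan=plan)`, with resources then assigned to the plan in a separate step.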

Test Your Knowledge

A company requires an RTO of 15 minutes and an RPO of 1 minute for their critical application. Which DR strategy should they implement?

Backup and Restore

Pilot Light

Warm Standby

Multi-Site Active-Active

Test Your Knowledge

In a Pilot Light disaster recovery strategy, which components typically run continuously in the DR Region?

Full-scale web servers, application servers, and databases

Only the database (core data layer)

Scaled-down versions of all application tiers

Nothing — everything is restored from backups

Test Your Knowledge: Ordering

Order the disaster recovery strategies from LOWEST cost to HIGHEST cost:

Warm Standby

Multi-Site Active-Active

Backup and Restore

Pilot Light
