2.4 Disaster Recovery Strategies — RTO and RPO

Key Takeaways

  • RPO (Recovery Point Objective) defines how much data loss is acceptable; RTO (Recovery Time Objective) defines how quickly the system must be restored.
  • The four DR strategies, ordered from lowest cost and slowest recovery to highest cost and fastest recovery, are: Backup & Restore, Pilot Light, Warm Standby, and Multi-Site Active-Active.
  • Backup & Restore has the highest RTO/RPO (hours) but lowest cost; Multi-Site Active-Active has near-zero RTO/RPO but highest cost.
  • Pilot Light keeps core systems (database) running in the DR Region but requires scaling up compute during failover.
  • Warm Standby runs a scaled-down version of the full environment in the DR Region, reducing failover time to minutes.
Last updated: March 2026

Quick Answer: RPO = max acceptable data loss (time). RTO = max acceptable downtime (time). Four DR strategies: Backup & Restore (cheapest, hours RTO), Pilot Light (core running, minutes-hours RTO), Warm Standby (scaled-down environment, minutes RTO), Multi-Site Active-Active (most expensive, near-zero RTO).

RPO and RTO Defined

| Metric | Definition | Example |
|---|---|---|
| RPO (Recovery Point Objective) | Maximum amount of data you can afford to lose, measured in time | RPO of 1 hour = you can lose up to 1 hour of data |
| RTO (Recovery Time Objective) | Maximum time your system can be down after a disaster | RTO of 4 hours = system must be back online within 4 hours |
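
As a rough illustration of the two definitions, the sketch below checks a backup schedule against an RPO and a recovery runbook against an RTO. It assumes periodic backups (so the worst-case data loss equals the backup interval) and strictly sequential recovery steps; the step names and durations are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """With periodic backups, a failure just before the next backup
    loses everything written since the previous one."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


def meets_rto(recovery_steps: dict[str, timedelta], rto: timedelta) -> bool:
    """Model recovery time as the sum of sequential steps (an assumption)."""
    return sum(recovery_steps.values(), timedelta()) <= rto


# Hypothetical numbers, for illustration only.
steps = {
    "detect and declare disaster": timedelta(minutes=30),
    "restore database from backup": timedelta(hours=2),
    "redeploy application servers": timedelta(hours=1),
}

print(meets_rpo(timedelta(hours=6), rpo=timedelta(hours=1)))     # False
print(meets_rpo(timedelta(minutes=30), rpo=timedelta(hours=1)))  # True
print(meets_rto(steps, rto=timedelta(hours=4)))                  # True (3 h 30 min)
```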

The RPO/RTO Trade-off

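Lower RPO and RTO targets are more expensive and complex to achieve: a lower RPO requires more frequent (or continuous) replication, and a lower RTO requires more infrastructure to be kept running in the DR Region.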

The Four DR Strategies

1. Backup and Restore

Concept: Regularly back up data to another Region. In a disaster, restore from backups and rebuild the environment.

| Aspect | Detail |
|---|---|
| RTO | Hours (time to restore + rebuild) |
| RPO | Hours (depends on backup frequency) |
| Cost | Lowest (pay only for backup storage) |
| Complexity | Lowest |
| AWS Services | AWS Backup, S3 Cross-Region Replication, EBS Snapshots, RDS Automated Backups |

Best for: Non-critical workloads, development environments, workloads where hours of downtime are acceptable.
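
One building block of Backup and Restore is copying snapshots into the DR Region. The sketch below wraps the EC2 `copy_snapshot` call; it assumes `dr_ec2_client` is a boto3 EC2 client created for the DR Region (for example `boto3.client("ec2", region_name="us-west-2")`). The function name, snapshot ID, and Region names are hypothetical.

```python
def copy_snapshot_to_dr(dr_ec2_client, snapshot_id: str, source_region: str) -> str:
    """Copy an EBS snapshot into the Region that `dr_ec2_client` is bound to.

    The copy is requested from the destination (DR) Region and pulls the
    snapshot from `source_region`. Returns the ID of the new snapshot.
    """
    response = dr_ec2_client.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]
```

Running this on a schedule bounds the RPO by the snapshot interval; in practice AWS Backup copy rules can replace hand-written copy logic like this.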

2. Pilot Light

Concept: Keep the core of your system (typically the database) running in the DR Region at all times. Other components (compute, app servers) are provisioned only during failover.

| Aspect | Detail |
|---|---|
| RTO | Minutes to hours (time to scale up compute) |
| RPO | Minutes (near-real-time data replication) |
| Cost | Low-Medium (database running, compute off) |
| Complexity | Medium |
| AWS Services | RDS Cross-Region replica, Aurora Global DB, pre-configured AMIs, CloudFormation templates |

Best for: Workloads that can tolerate some downtime but need minimal data loss.
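
A Pilot Light failover can be sketched as two steps: make the replicated database writable, then bring compute up from zero. The example assumes an RDS cross-Region read replica and an Auto Scaling group that already exists in the DR Region with zero instances; both clients are assumed to be boto3 clients (`rds` and `autoscaling`) bound to the DR Region, and all identifiers are hypothetical.

```python
def pilot_light_failover(rds_client, autoscaling_client,
                         replica_id: str, asg_name: str,
                         production_capacity: int, max_capacity: int) -> None:
    """Promote the DR database and scale compute from zero to production size."""
    # Step 1: promote the cross-Region read replica to a standalone,
    # writable DB instance. Replication from the old primary stops here.
    rds_client.promote_read_replica(DBInstanceIdentifier=replica_id)

    # Step 2: the Auto Scaling group is assumed to idle at zero instances;
    # raise it to production capacity so application servers launch.
    autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=production_capacity,
        MaxSize=max_capacity,
        DesiredCapacity=production_capacity,
    )
```

The time these two steps take (promotion plus instance launch and warm-up) is what places Pilot Light in the minutes-to-hours RTO range.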

3. Warm Standby

Concept: Run a scaled-down but fully functional version of the production environment in the DR Region at all times. Scale up to full production capacity during failover.

| Aspect | Detail |
|---|---|
| RTO | Minutes (scale up existing resources) |
| RPO | Seconds to minutes (continuous replication) |
| Cost | Medium-High (scaled-down environment always running) |
| Complexity | Medium-High |
| AWS Services | Auto Scaling, Route 53 failover, Aurora Global DB, reduced-size EC2 instances |

Best for: Business-critical applications that need rapid recovery.
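
Because the standby environment is already serving-capable, Warm Standby failover is largely a DNS decision. The sketch below builds a Route 53 failover record pair as a plain dictionary in the shape expected by the `ChangeBatch` parameter of `change_resource_record_sets`. The helper names, domain, IP addresses, and health check ID are hypothetical.

```python
from typing import Optional


def failover_record(name: str, set_id: str, role: str, ip_address: str,
                    health_check_id: Optional[str] = None, ttl: int = 60) -> dict:
    """Build one UPSERT change for a Route 53 failover A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be 'PRIMARY' or 'SECONDARY'")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,
        "TTL": ttl,  # a short TTL lets clients pick up the failover quickly
        "ResourceRecords": [{"Value": ip_address}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(name: str, primary_ip: str, standby_ip: str,
                          primary_health_check_id: str) -> dict:
    """Primary record is health-checked; the secondary answers when it fails."""
    return {
        "Comment": "Warm standby failover pair",
        "Changes": [
            failover_record(name, "primary", "PRIMARY", primary_ip,
                            primary_health_check_id),
            failover_record(name, "standby", "SECONDARY", standby_ip),
        ],
    }


batch = failover_change_batch("app.example.com.", "203.0.113.10",
                              "198.51.100.10", "hc-primary-placeholder")
```

The resulting `batch` could then be passed as `ChangeBatch=batch` to a boto3 Route 53 client together with a `HostedZoneId`.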

4. Multi-Site Active-Active

Concept: Full production environment running in two or more Regions simultaneously. Traffic is served from all Regions at all times.

| Aspect | Detail |
|---|---|
| RTO | Near-zero (traffic automatically shifts) |
| RPO | Near-zero (synchronous or near-synchronous replication) |
| Cost | Highest (full production in multiple Regions) |
| Complexity | Highest |
| AWS Services | DynamoDB Global Tables, Aurora Global DB, Route 53 latency/weighted routing, CloudFront |

Best for: Mission-critical, zero-downtime applications (financial services, healthcare, global e-commerce).
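
To see why traffic "automatically shifts" in an active-active setup, consider weighted routing with health checks: each healthy Region receives its weight divided by the sum of the weights of all healthy Regions. The following is a simplified model of that behavior, not Route 53 code; the function name, Region names, and weights are hypothetical.

```python
def traffic_shares(weights: dict[str, int], healthy: set[str]) -> dict[str, float]:
    """Fraction of traffic each healthy Region receives under weighted routing."""
    # Unhealthy Regions (and zero-weight records) are left out of the pool.
    live = {region: w for region, w in weights.items()
            if region in healthy and w > 0}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy Region available to serve traffic")
    return {region: w / total for region, w in live.items()}


weights = {"us-east-1": 50, "eu-west-1": 50}

# Normal operation: both Regions serve traffic.
print(traffic_shares(weights, healthy={"us-east-1", "eu-west-1"}))

# us-east-1 fails its health check: eu-west-1 absorbs all traffic.
print(traffic_shares(weights, healthy={"eu-west-1"}))
```

The surviving Region must have enough capacity to absorb the full load, which is one reason this strategy carries the highest cost.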

Strategy Comparison

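| Strategy | RTO | RPO | Cost | Complexity |
|---|---|---|---|---|
| Backup & Restore | Hours | Hours | $ | Low |
| Pilot Light | Mins-Hours | Minutes | $$ | Medium |
| Warm Standby | Minutes | Seconds-Mins | $$$ | Medium-High |
| Multi-Site | Near-zero | Near-zero | $$$$ | High |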
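
The comparison can be read as a selection rule: pick the cheapest strategy whose typical RTO and RPO both fit the requirement. The sketch below encodes that rule. The concrete durations are assumptions chosen only to stand in for "hours", "minutes", and "near-zero"; real figures depend on the workload.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    typical_rto: timedelta
    typical_rpo: timedelta


# Ordered cheapest first. Durations are illustrative assumptions.
STRATEGIES = [
    Strategy("Backup & Restore", timedelta(hours=8), timedelta(hours=8)),
    Strategy("Pilot Light", timedelta(hours=1), timedelta(minutes=5)),
    Strategy("Warm Standby", timedelta(minutes=10), timedelta(minutes=1)),
    Strategy("Multi-Site Active-Active", timedelta(seconds=30), timedelta(seconds=5)),
]


def cheapest_strategy(rto: timedelta, rpo: timedelta) -> Optional[str]:
    """Return the first (cheapest) strategy that meets both targets, if any."""
    for strategy in STRATEGIES:
        if strategy.typical_rto <= rto and strategy.typical_rpo <= rpo:
            return strategy.name
    return None


print(cheapest_strategy(timedelta(hours=24), timedelta(hours=24)))   # Backup & Restore
print(cheapest_strategy(timedelta(minutes=15), timedelta(minutes=1)))  # Warm Standby
```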

AWS Backup

AWS Backup is a centralized service to automate and manage backups across AWS services.

| Feature | Detail |
|---|---|
| Supported services | EC2, EBS, RDS, Aurora, DynamoDB, EFS, FSx, S3, and more |
| Backup plans | Define frequency, retention, and lifecycle (transition to cold storage, then delete) |
| Cross-Region | Copy backups to another Region for DR |
| Cross-account | Copy backups to another account for isolation |
| Vault Lock | WORM (Write Once Read Many); prevents backup deletion (compliance) |
| Point-in-time | Continuous backups for supported services (e.g., RDS, Aurora, S3) |

On the Exam: "Centralized backup management across multiple services and accounts" → AWS Backup. "Prevent backup deletion for compliance" → AWS Backup Vault Lock.
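
A backup plan combining a schedule, a lifecycle, and a cross-Region copy might be assembled as below. This is a sketch of the dictionary shape accepted by the `BackupPlan` parameter of the AWS Backup `create_backup_plan` API; the helper name, plan name, vault names, and ARN are hypothetical. The validation reflects the AWS Backup requirement that retention be at least 90 days longer than the cold-storage transition.

```python
def build_backup_plan(plan_name: str, vault_name: str, dr_vault_arn: str,
                      cold_after_days: int = 30,
                      delete_after_days: int = 365) -> dict:
    """Build a daily backup plan with a lifecycle and a cross-Region copy."""
    # Backups moved to cold storage must stay there for at least 90 days.
    if delete_after_days < cold_after_days + 90:
        raise ValueError(
            "delete_after_days must be at least cold_after_days + 90")
    lifecycle = {
        "MoveToColdStorageAfterDays": cold_after_days,
        "DeleteAfterDays": delete_after_days,
    }
    return {
        "BackupPlanName": plan_name,
        "Rules": [{
            "RuleName": "daily",
            "TargetBackupVaultName": vault_name,
            "ScheduleExpression": "cron(0 5 ? * * *)",  # every day at 05:00 UTC
            "Lifecycle": lifecycle,
            # Copy each recovery point to a vault in the DR Region.
            "CopyActions": [{
                "DestinationBackupVaultArn": dr_vault_arn,
                "Lifecycle": dict(lifecycle),
            }],
        }],
    }


plan = build_backup_plan(
    "dr-daily",
    "primary-vault",
    "arn:aws:backup:us-west-2:123456789012:backup-vault:dr-vault",
)
```

With boto3, the result would be passed as `BackupPlan=plan` to a `backup` client; resources are then attached to the plan through a separate backup selection.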

Test Your Knowledge

A company requires an RTO of 15 minutes and an RPO of 1 minute for their critical application. Which DR strategy should they implement?

Test Your Knowledge

In a Pilot Light disaster recovery strategy, which components typically run continuously in the DR Region?

Test Your Knowledge: Ordering

Order the disaster recovery strategies from LOWEST cost to HIGHEST cost:

Arrange the items in the correct order

  • Warm Standby
  • Multi-Site Active-Active
  • Backup and Restore
  • Pilot Light