2.4 Disaster Recovery Strategies — RTO and RPO

Key Takeaways

  • RPO (Recovery Point Objective) is the maximum acceptable data loss measured in time; RTO (Recovery Time Objective) is the maximum acceptable downtime measured in time.
  • The four AWS DR strategies, cheapest/slowest to costliest/fastest, are Backup & Restore, Pilot Light, Warm Standby, and Multi-Site Active-Active.
  • Pilot Light keeps the data layer running in the DR Region but provisions compute only on failover; Warm Standby runs a scaled-down full stack ready to scale up.
  • Multi-Site Active-Active serves live traffic from two or more Regions for near-zero RTO/RPO at the highest cost and complexity.
  • AWS Backup centralizes backups across services and accounts, supports cross-Region/cross-account copy, and offers Vault Lock (WORM) to block deletion for compliance.

Last updated: June 2026

Quick Answer: RPO = how much data (in time) you can lose; RTO = how long (in time) you can be down. Match them to four strategies: Backup & Restore (cheapest, hours), Pilot Light (data layer hot, minutes–hours), Warm Standby (scaled-down stack, minutes), Multi-Site Active-Active (live everywhere, near-zero). Tighter objectives cost more.

Defining RPO and RTO

These two terms anchor almost every DR exam question. Read the scenario, extract the numbers, then pick the cheapest strategy that still meets both.

Metric | Definition | Example
RPO (Recovery Point Objective) | Max data loss tolerated, in time | RPO 1 hour → may lose up to 1 hour of writes
RTO (Recovery Time Objective) | Max downtime tolerated, in time | RTO 4 hours → must be back within 4 hours

Lower RPO/RTO generally means more standing infrastructure and higher cost; that trade-off is the heart of the domain. A backup-only design typically has an RPO of hours, while an active-active design approaches zero.
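To make the two definitions concrete, here is a minimal sketch that measures what one incident actually cost and compares it with the targets. The function names and the incident timestamps are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def achieved_objectives(last_recoverable, failure, restored):
    """Return (data-loss window, downtime) for a single incident."""
    data_loss = failure - last_recoverable  # writes after this point are lost
    downtime = restored - failure           # time the service was unavailable
    return data_loss, downtime


def meets_targets(data_loss, downtime, rpo, rto):
    """True only if BOTH objectives are met."""
    return data_loss <= rpo and downtime <= rto


# Hypothetical incident: last backup at 02:00, failure at 02:45, back at 05:30.
loss, down = achieved_objectives(
    datetime(2026, 6, 1, 2, 0),
    datetime(2026, 6, 1, 2, 45),
    datetime(2026, 6, 1, 5, 30),
)
ok = meets_targets(loss, down, rpo=timedelta(hours=1), rto=timedelta(hours=4))
print(loss, down, ok)
```

Here the incident loses 45 minutes of writes and causes 2 hours 45 minutes of downtime, so it meets the RPO 1 hour / RTO 4 hours targets from the table. Tightening the RPO to 30 minutes would make the same incident a miss.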

The Four Strategies

1. Backup & Restore

Regularly back up data (and AMIs/templates) to another Region; on disaster, restore and rebuild. RTO hours, RPO hours, lowest cost. Services: AWS Backup, S3 Cross-Region Replication, EBS snapshots, RDS automated backups. Best for dev/test and tolerant workloads.

2. Pilot Light

Keep only the core data layer (database) replicating and running in the DR Region; compute is off until failover, then provisioned from pre-baked AMIs or CloudFormation. RTO minutes–hours, RPO minutes, low-medium cost.

3. Warm Standby

Run a scaled-down but fully functional copy of production in the DR Region at all times. On disaster you scale it up to full size and shift traffic. RTO minutes, RPO seconds–minutes, medium-high cost. Services: Auto Scaling, Aurora Global Database, Route 53 failover. Best for business-critical apps needing rapid recovery.

4. Multi-Site Active-Active

Full production runs in two or more Regions simultaneously, all serving live traffic. On failure, traffic simply shifts to the healthy Region(s). RTO and RPO near-zero, highest cost and complexity. Services: DynamoDB Global Tables, Aurora Global Database (one writer Region; secondary Regions serve reads), Route 53 latency/weighted routing, CloudFront. Best for financial, healthcare, and global commerce workloads.

Strategy Comparison

Strategy | RTO | RPO | Cost | What runs in DR
Backup & Restore | Hours | Hours | $ | Nothing (just backups)
Pilot Light | Mins–Hours | Minutes | $$ | Data layer only
Warm Standby | Minutes | Secs–Mins | $$$ | Scaled-down full stack
Multi-Site | Near-zero | Near-zero | $$$$ | Full production

A reliable shortcut: "database always on, compute off" = Pilot Light; "small version of everything always running" = Warm Standby; "both Regions live" = Multi-Site.
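The "cheapest strategy that still meets both numbers" rule can be sketched as a simple lookup. The numeric floors below are illustrative assumptions that turn the table's qualitative ranges (hours, minutes, seconds) into comparable values. They are not AWS-published limits, and the names `Strategy` and `cheapest_strategy` are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Strategy:
    name: str
    rto_floor: timedelta  # assumed tightest RTO this tier can reasonably meet
    rpo_floor: timedelta  # assumed tightest RPO this tier can reasonably meet


# Ordered cheapest first. All floors are assumptions for illustration only.
STRATEGIES = [
    Strategy("Backup & Restore", timedelta(hours=1), timedelta(hours=1)),
    Strategy("Pilot Light", timedelta(minutes=30), timedelta(minutes=5)),
    Strategy("Warm Standby", timedelta(minutes=5), timedelta(seconds=5)),
    Strategy("Multi-Site Active-Active", timedelta(0), timedelta(0)),
]


def cheapest_strategy(rto, rpo):
    """Return the first (cheapest) tier whose floors fit within both targets."""
    if rto < timedelta(0) or rpo < timedelta(0):
        raise ValueError("objectives must be non-negative durations")
    for strategy in STRATEGIES:
        if strategy.rto_floor <= rto and strategy.rpo_floor <= rpo:
            return strategy.name
    return STRATEGIES[-1].name  # not reached: the last tier has zero floors


print(cheapest_strategy(timedelta(minutes=10), timedelta(seconds=10)))
```

Under these assumed floors, an RTO of 10 minutes with an RPO of 10 seconds rules out the first two tiers and lands on Warm Standby, while an RTO of 4 hours with an RPO of 1 hour stays at Backup & Restore.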

AWS Backup

AWS Backup centralizes and automates backups across services so you do not script each one separately.

Feature | Detail
Supported services | EC2, EBS, RDS, Aurora, DynamoDB, EFS, FSx, S3, Storage Gateway
Backup plans | Frequency, retention, lifecycle to cold storage
Cross-Region copy | Replicate backups to a DR Region
Cross-account copy | Isolate backups in a separate account
Vault Lock | WORM (Write Once Read Many); blocks deletion for compliance

On the Exam: "Centralized backup across many services/accounts" → AWS Backup. "Prevent anyone, even admins, from deleting backups for a retention period" → AWS Backup Vault Lock in compliance mode. "Lowest-cost DR that tolerates hours of downtime" → Backup & Restore.
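As a rough sketch of how a backup plan with cross-Region copy and Vault Lock fits together, the Python below builds the request structure that boto3's `create_backup_plan` accepts and then locks the vault. The plan name, vault names, ARN, schedule, and retention values are all hypothetical, and `apply_plan` needs the third-party boto3 package plus AWS credentials, so treat it as an outline rather than a tested deployment script.

```python
def build_backup_plan(name, vault, dr_vault_arn,
                      cold_after_days=30, delete_after_days=365):
    """Build the BackupPlan dict for one daily rule with a cross-Region copy."""
    # AWS Backup keeps cold-storage backups for at least 90 days, so the
    # retention must exceed the cold-storage transition by 90 days or more.
    if delete_after_days < cold_after_days + 90:
        raise ValueError("retention must be >= cold transition + 90 days")
    lifecycle = {
        "MoveToColdStorageAfterDays": cold_after_days,
        "DeleteAfterDays": delete_after_days,
    }
    return {
        "BackupPlanName": name,
        "Rules": [{
            "RuleName": "daily",
            "TargetBackupVaultName": vault,
            "ScheduleExpression": "cron(0 5 ? * * *)",  # 05:00 UTC every day
            "Lifecycle": lifecycle,
            # Copy each recovery point to a vault in the DR Region.
            "CopyActions": [{
                "DestinationBackupVaultArn": dr_vault_arn,
                "Lifecycle": dict(lifecycle),
            }],
        }],
    }


def apply_plan(plan, vault, min_retention_days):
    """Create the plan and lock the vault (requires boto3 and credentials)."""
    import boto3  # imported here so the builder above runs without it
    client = boto3.client("backup")
    client.create_backup_plan(BackupPlan=plan)
    # Setting ChangeableForDays starts a grace period (minimum 3 days), after
    # which the lock becomes immutable (compliance mode).
    client.put_backup_vault_lock_configuration(
        BackupVaultName=vault,
        MinRetentionDays=min_retention_days,
        ChangeableForDays=3,
    )


plan = build_backup_plan(
    "prod-daily",   # hypothetical plan name
    "prod-vault",   # hypothetical source vault
    "arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault",  # placeholder
)
```

Omitting `ChangeableForDays` leaves the lock in governance mode, where sufficiently privileged users can still remove it. That is why the exam phrasing "even admins" points at compliance mode.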

Building Blocks That Drive RPO and RTO

Your achievable RPO is set by how often data is captured or replicated, and your RTO by how fast you can stand the stack back up. Knowing which AWS feature delivers which window is what the exam tests.

Data-protection feature | RPO it enables
Nightly snapshots only (restore to the last snapshot) | Hours (time since the last snapshot)
RDS / DynamoDB point-in-time recovery (PITR; on RDS this relies on automated backups) | ~5 minutes
Aurora Global Database replication | Typically sub-second
DynamoDB Global Tables | Typically about a second or less (active-active)
S3 Cross-Region Replication | Minutes (async)

To shrink RTO, pre-bake everything: store AMIs and CloudFormation/CDK templates so the DR environment rebuilds with one deploy, keep AMIs copied to the DR Region, and automate DNS cutover with Route 53 health checks + failover. Infrastructure as code is the difference between an hours-long manual rebuild and a minutes-long automated one, so a Backup & Restore design with templated infrastructure can beat a sloppy Pilot Light.
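A back-of-the-envelope RTO model helps show why automation matters. The sketch below assumes recovery is strictly serial: Route 53 health-check detection (request interval times failure threshold), then the rebuild, then the DNS record TTL expiring for clients. The function names and the rebuild durations are hypothetical. The 30-second interval and threshold of 3 are Route 53's standard health-check defaults.

```python
def detection_seconds(request_interval=30, failure_threshold=3):
    """Approximate time for a Route 53 health check to mark an endpoint unhealthy."""
    return request_interval * failure_threshold


def estimated_rto_seconds(rebuild_seconds, dns_ttl_seconds=60,
                          request_interval=30, failure_threshold=3):
    """Rough serial estimate: detect the failure, rebuild, wait out the DNS TTL."""
    detect = detection_seconds(request_interval, failure_threshold)
    return detect + rebuild_seconds + dns_ttl_seconds


# Hypothetical rebuild times: 10 minutes from templates vs. 4 hours by hand.
automated = estimated_rto_seconds(rebuild_seconds=10 * 60)
manual = estimated_rto_seconds(rebuild_seconds=4 * 60 * 60)
print(automated, manual)
```

With these assumed inputs the templated rebuild comes to 750 seconds and the manual one to 14,550 seconds. Detection and DNS propagation are the same in both cases, so the rebuild step dominates the difference.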

Testing, Pitfalls, and AWS Elastic Disaster Recovery

  • Test the plan. A DR strategy that has never been failover-tested should be assumed broken; the exam favors answers that include regular game-day failover drills and automated runbooks.
  • AWS Elastic Disaster Recovery (DRS) continuously replicates on-premises or cross-Region servers at the block level into a low-cost staging area and launches full instances on demand — an exam answer for low-RPO recovery of existing servers (including lift-and-shift from on-prem) without re-architecting.
  • Don't over-buy. If a scenario tolerates hours of downtime, Multi-Site Active-Active is the wrong (over-engineered, over-cost) answer; pick the cheapest tier that meets both RPO and RTO. Conversely, never answer Backup & Restore when the scenario states near-zero downtime.
  • Cross-account isolation (a separate backup account, often with AWS Backup cross-account copy and SCP guardrails) protects backups from ransomware or a compromised production account — choose it when the threat is malicious deletion rather than infrastructure failure.
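The SCP guardrail mentioned in the last bullet might look something like the following sketch, which builds a deny policy as a Python dict and prints it as JSON. The statement ID and the choice of actions are assumptions for illustration; a real policy would usually add conditions that exempt a designated backup-admin role.

```python
import json

# Assumed set of AWS Backup actions to block in member accounts.
PROTECTED_ACTIONS = [
    "backup:DeleteBackupVault",
    "backup:DeleteRecoveryPoint",
    "backup:UpdateRecoveryPointLifecycle",
    "backup:PutBackupVaultAccessPolicy",
    "backup:DeleteBackupVaultAccessPolicy",
]


def build_backup_guardrail_scp():
    """Return an SCP document that denies deleting or tampering with backups."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyBackupTampering",  # hypothetical statement ID
            "Effect": "Deny",
            "Action": PROTECTED_ACTIONS,
            "Resource": "*",
        }],
    }


policy = build_backup_guardrail_scp()
print(json.dumps(policy, indent=2))
```

An SCP like this only restricts principals in the member accounts it is attached to, so it complements rather than replaces Vault Lock and a separate backup account.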

In short, decode the two numbers, map each to the feature that achieves it, automate the rebuild, and select the lowest-cost strategy that still clears both bars.

Test Your Knowledge

A business-critical application requires an RTO of about 10 minutes and an RPO of seconds, but the team wants to avoid the cost of running full production in two Regions. Which strategy fits best?


In a Pilot Light strategy, which components are kept running continuously in the DR Region?


A compliance mandate requires that backups cannot be deleted or altered by anyone, including administrators, for the full retention period. Which AWS Backup feature satisfies this?
