Disaster Recovery Plan

Key Takeaways

  • The DRP restores IT systems, data, and infrastructure to meet the RTO and RPO that the BIA defined.
  • Site strategy (hot, warm, cold, mobile, cloud) is selected by weighing the RTO against cost; shorter RTOs cost more.
  • RPO dictates backup/replication frequency; RTO dictates how fast the recovery process must complete.
  • DRP testing escalates from checklist and walkthrough to simulation, parallel, and full-interruption tests.
Last updated: June 2026

What the Disaster Recovery Plan Does

The Disaster Recovery Plan (DRP) is the technology-focused plan for restoring IT systems, applications, networks, and data after a disruptive event. It is the IT execution arm of the broader Business Continuity Plan and must satisfy the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) that the BIA established. If the BIA says the ERP system has a 4-hour RTO and a 30-minute RPO, the DRP must demonstrate it can rebuild that system within four hours using data no more than 30 minutes old.
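
The ERP example can be read as a pass/fail check on a recovery exercise. The sketch below is illustrative only: the function name `meets_objectives`, its parameters, and the sample timestamps are assumptions made for this example, not part of any ISACA material.

```python
from datetime import datetime, timedelta


def meets_objectives(outage_start, service_restored, last_recovered_data,
                     rto, rpo):
    """Return (rto_met, rpo_met) for one recovery exercise."""
    downtime = service_restored - outage_start      # compared against the RTO
    data_loss = outage_start - last_recovered_data  # compared against the RPO
    return downtime <= rto, data_loss <= rpo


# Hypothetical ERP exercise: 4-hour RTO, 30-minute RPO.
outage = datetime(2026, 6, 1, 9, 0)
result = meets_objectives(
    outage_start=outage,
    service_restored=outage + timedelta(hours=3, minutes=40),
    last_recovered_data=outage - timedelta(minutes=45),
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=30),
)
print(result)  # (True, False)
```

In this made-up run the system came back inside four hours, but the restored data was 45 minutes old, so the exercise meets the RTO and misses the RPO.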

The DRP documents recovery teams and their tasks, the priority order for restoring systems (driven by BIA criticality), the location and method of backups, the alternate processing site, and the step-by-step technical procedures. It also defines success criteria so a recovery can be declared complete and validated, not merely "the server is up."

Backup Strategy Driven by RPO

RPO sets how often data must be captured. The smaller the tolerable data loss, the more frequent and more expensive the protection.

Strategy                        | Typical RPO it supports  | Relative cost
Daily full backup to tape       | Up to ~24 hours of loss  | Low
Periodic incremental + offsite  | Hours                    | Moderate
Snapshots every few minutes     | Minutes                  | Higher
Synchronous replication         | Near zero                | Highest

A classic exam pairing: an RPO measured in minutes cannot be met by nightly backups; replication or frequent snapshots are required. Conversely, paying for synchronous replication on a system with a 24-hour RPO is spending the BIA does not justify.
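
The pairing logic amounts to picking the cheapest strategy whose worst-case data loss still fits the RPO. A minimal sketch follows, assuming numeric bounds for each row of the table; the table gives only qualitative ranges, so the figures, the `STRATEGIES` list, and the `cheapest_strategy` helper are all hypothetical.

```python
from datetime import timedelta

# (strategy, assumed worst-case data loss, relative cost rank: 1 = lowest)
# The numeric bounds are illustrative assumptions, not ISACA figures.
STRATEGIES = [
    ("Daily full backup to tape", timedelta(hours=24), 1),
    ("Periodic incremental + offsite", timedelta(hours=4), 2),
    ("Snapshots every few minutes", timedelta(minutes=5), 3),
    ("Synchronous replication", timedelta(0), 4),
]


def cheapest_strategy(rpo):
    """Return the lowest-cost strategy whose worst-case loss fits the RPO."""
    candidates = [s for s in STRATEGIES if s[1] <= rpo]
    return min(candidates, key=lambda s: s[2])[0]


print(cheapest_strategy(timedelta(minutes=15)))  # Snapshots every few minutes
print(cheapest_strategy(timedelta(hours=24)))    # Daily full backup to tape
```

Filtering first and minimizing cost second mirrors both halves of the pairing: nightly tape never appears as a candidate for a 15-minute RPO, and synchronous replication is never chosen when a cheaper option already satisfies a 24-hour RPO.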

Recovery Sites and DRP Testing

Matching Site Strategy to RTO

The recovery site is selected to meet the RTO at acceptable cost. There is a direct trade-off: faster recovery costs more.

  • Hot site: Live, fully configured duplicate; recovery in minutes to hours. Most expensive, for the shortest RTOs.
  • Warm site: Hardware and connectivity in place; software and data must be loaded. Moderate cost, hours to a day.
  • Cold site: Power, space, and HVAC only; everything else must be installed. Cheapest, days to recover.
  • Mobile site: Transportable, prefitted unit for field or regional recovery.
  • Cloud / DRaaS: On-demand failover with pay-as-you-go economics, increasingly common.
  • Reciprocal agreement: Mutual-aid pact with another organization; low cost but unreliable capacity and confidentiality risk.
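
The hot, warm, and cold tiers above can be sketched as a lookup from RTO to site type. The cutoffs below (4 hours and 72 hours) and the function name `suggest_site` are assumptions chosen for illustration; real cutoffs would come from vendor commitments and test results, and the sketch ignores the mobile, cloud, and reciprocal options.

```python
from datetime import timedelta


def suggest_site(rto: timedelta) -> str:
    """Map an RTO to the cheapest site tier assumed able to meet it."""
    # Assumed cutoffs, not standard values.
    if rto < timedelta(hours=4):
        return "hot site"
    if rto < timedelta(hours=72):
        return "warm site"
    return "cold site"


print(suggest_site(timedelta(minutes=30)))  # hot site
print(suggest_site(timedelta(hours=12)))    # warm site
print(suggest_site(timedelta(days=5)))      # cold site
```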

Levels of DRP Testing

ISACA expects testing rigor to escalate. Know the order from least to most disruptive:

Test type                          | Description                                            | Disruption
Checklist / desk review            | Verify plan completeness and resources on paper        | None
Structured walkthrough / tabletop  | Team talks through roles and scenarios                 | None
Simulation                         | Acting out a scenario without touching production      | Low
Parallel                           | Recovery systems run alongside production to validate  | Moderate
Full interruption                  | Production is failed over to recovery; highest realism | High, risk to operations

Worked judgment: A new DRP has never been validated and the business cannot risk an outage. The appropriate first test is a structured walkthrough or simulation, not a full-interruption test, because confidence should be built before production is stressed. Jumping straight to full interruption on an unproven plan is the trap answer. After each test, findings feed plan updates, mirroring the lessons-learned discipline of incident management.
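
The same judgment can be sketched as a filter over the test table: rank each test by disruption and keep only those within what the business can tolerate. The numeric ranks (None = 0, Low = 1, Moderate = 2, High = 3) and the helper name `eligible_tests` are assumptions introduced for this example.

```python
# Disruption ranks taken from the table: None=0, Low=1, Moderate=2, High=3.
# The numeric encoding is an assumption made for illustration.
DISRUPTION = {
    "checklist": 0,
    "structured walkthrough": 0,
    "simulation": 1,
    "parallel": 2,
    "full interruption": 3,
}


def eligible_tests(max_disruption):
    """Return tests at or below the tolerated disruption, least disruptive first."""
    return [test for test, rank in DISRUPTION.items() if rank <= max_disruption]


# An unproven plan where the business accepts at most low disruption.
print(eligible_tests(1))
# ['checklist', 'structured walkthrough', 'simulation']
```

Full interruption never appears in the result until the tolerated disruption is raised to the highest rank, which is the point of the escalation order.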

Recovery Strategy Trade-offs and Common Exam Pitfalls

The DRP turns BIA numbers into engineering and spending decisions, and CISM tests whether candidates can match the strategy to the requirement without over- or under-investing.

Cost Versus Recovery Speed

The governing relationship is that shorter RTOs and RPOs cost more. A near-zero RPO requires synchronous replication and redundant storage; a near-zero RTO requires a live hot site or active-active architecture. The security manager's role is to ensure the chosen solution is proportional to documented business impact. Spending hot-site money on a system the BIA rated low criticality wastes budget; relying on nightly tape for a system with a 15-minute RPO guarantees a failed recovery. The defensible answer always traces back to the BIA.

Backups Are Not a Recovery Plan by Themselves

Having backups is necessary but not sufficient. ISACA expects:

  • Offsite or geographically separated copies so a single site disaster does not destroy both production and backups.
  • Regular restore testing: an untested backup is an assumption, not a recovery capability. Backups that cannot be restored are a frequent root cause of failed recoveries.
  • Protection of backups against ransomware, including immutable or air-gapped copies, since attackers target backups to force payment.
  • Documented recovery procedures, so restoration does not depend on one person's memory.
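
The four expectations above can be tracked as a simple gap check per system. This is a sketch under stated assumptions: the `BackupPosture` fields and the `gaps` helper are hypothetical names, and a real assessment would rest on evidence such as restore logs rather than yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class BackupPosture:
    """Yes/no answers for one system (hypothetical structure)."""
    offsite_copy: bool
    restore_tested: bool
    immutable_or_air_gapped: bool
    procedures_documented: bool


# Message reported when the corresponding expectation is not met.
GAP_MESSAGES = {
    "offsite_copy": "no offsite or geographically separated copy",
    "restore_tested": "restores have not been tested",
    "immutable_or_air_gapped": "no immutable or air-gapped copy",
    "procedures_documented": "recovery procedures are not documented",
}


def gaps(posture):
    """List the unmet expectations for a given backup posture."""
    return [msg for field, msg in GAP_MESSAGES.items()
            if not getattr(posture, field)]


erp = BackupPosture(offsite_copy=True, restore_tested=False,
                    immutable_or_air_gapped=True, procedures_documented=True)
print(gaps(erp))  # ['restores have not been tested']
```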

Frequent CISM Pitfalls

  1. Choosing a recovery site before the BIA exists; this is solving without requirements and is consistently the wrong answer.
  2. Confusing RTO and RPO when sizing the solution; RTO drives site choice, RPO drives backup frequency.
  3. Skipping straight to full-interruption testing on an unproven plan, risking the very outage the plan is meant to prevent.
  4. Treating the DRP as separate from the BCP; the DRP must support the continuity priorities the BCP sets, not run on its own logic.
  5. Forgetting people; recovery procedures assume trained staff are available, and cross-training and documentation reduce key-person risk.

The synthesis CISM wants: the DRP is a business-driven, tested IT recovery capability sized to BIA objectives, with proven backups, an appropriately costed recovery site, and an escalating test program whose findings continuously improve the plan.

Test Your Knowledge

A critical application has a Recovery Time Objective (RTO) of 30 minutes. Which recovery site strategy is most appropriate?

Test Your Knowledge

An organization has never validated its newly written DRP and cannot tolerate any production outage. Which test should it run first?
