16.4 Availability, Backup, and Recovery Controls

Key Takeaways

  • Availability controls support service commitments by keeping systems accessible for authorized use when needed, including during cyber events and disasters.
  • A business impact analysis identifies critical processes, dependencies, tolerable downtime, recovery priorities, and the resources needed to resume operations.
  • Recovery time objective measures how quickly service must resume; recovery point objective measures how much data loss is tolerable.
  • Full, incremental, and differential backups have different restore-speed, storage, and data-loss tradeoffs, so the best design depends on the recovery objectives.
  • Operating controls such as mirroring, replication, monitoring, incident escalation, documented procedures, and successfully tested restores are far stronger evidence of availability than an untested plan.
Last updated: June 2026

Availability as a Control Objective

Availability means information and systems are accessible for operation and use as committed or agreed. In ISC, availability is not merely an IT uptime statistic; it drives payroll processing, billing, cash receipts, inventory movement, reporting deadlines, and the availability category of SOC 2 engagements (one of the five trust services categories alongside security, confidentiality, processing integrity, and privacy).

A strong availability program includes business continuity planning (BCP), disaster recovery planning (DRP), backup and restoration procedures, monitoring, incident escalation, capacity management, and periodic testing. A BCP keeps the business operating (people, facilities, processes); a DRP restores the technology (systems, data, infrastructure). They overlap but are distinct documents, and the exam often forces you to pick which one a fact pattern describes.

Recovery sites are a frequent test topic. A hot site is a fully equipped facility with current data, ready to take over within minutes to hours (very short RTO, high cost). A warm site has hardware and connectivity but needs data restoration and configuration before use (moderate RTO and cost). A cold site is essentially empty space with power and connectivity and takes days to equip (longest RTO, lowest cost). A mirrored/redundant site runs in parallel with production for immediate cutover (shortest RTO, highest cost). Match the site to the RTO: a payroll system that must recover in hours cannot rely on a cold site.
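The matching rule can be sketched as a lookup that picks the least expensive site type whose typical time to become operational fits within the RTO. The hour bands and cost ranks below are hypothetical placeholders chosen only for illustration; real figures would come from the organization's BIA, testing, and vendor contracts.

```python
# Hypothetical (site type, assumed hours to become operational, relative cost rank).
# These bands are illustrative assumptions, not industry standards.
SITE_OPTIONS = [
    ("cold", 72.0, 1),
    ("warm", 24.0, 2),
    ("hot", 4.0, 3),
    ("mirrored", 0.0, 4),
]


def cheapest_site_for_rto(rto_hours: float) -> str:
    """Return the lowest-cost site type whose assumed recovery time fits the RTO."""
    candidates = [
        (cost, name)
        for name, hours, cost in SITE_OPTIONS
        if hours <= rto_hours
    ]
    if not candidates:
        raise ValueError("no site type meets this RTO under the assumed bands")
    # min() compares the cost rank first, so the cheapest qualifying site wins.
    return min(candidates)[1]


print(cheapest_site_for_rto(4))    # payroll example: a 4-hour RTO
print(cheapest_site_for_rto(100))  # a non-critical system that can wait days
```

Under these assumed bands, the 4-hour payroll RTO rules out cold and warm sites and lands on a hot site, while a system that can tolerate several days of downtime falls back to the cheapest option.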

Cloud-based recovery and "disaster recovery as a service" increasingly replace physical alternate sites but follow the same RTO/cost logic.

Business Impact Analysis and Recovery Objectives

A business impact analysis (BIA) identifies critical processes and estimates the effect of disruption. It determines dependencies, peak processing periods, legal or contractual obligations, manual workarounds, recovery priorities, and resources required to resume operations. Its main outputs are summarized below. Two of them, RTO and RPO, drive recovery design and are routinely tested:

  • Critical process: an activity that must recover quickly. Example: payroll direct-deposit processing.
  • Recovery time objective (RTO): maximum acceptable time to restore service. Example: payroll system restored within 4 hours.
  • Recovery point objective (RPO): maximum acceptable data loss, measured in time. Example: no more than 15 minutes of transactions lost.
  • Dependency: a resource needed for recovery. Example: bank-file interface, cloud database, VPN.
  • Workaround: a temporary procedure used while the system is down. Example: manual emergency-check process.

The distinction is high-yield: RTO looks forward (how fast can we be running again?); RPO looks backward (how much recent data can we afford to lose?). A short RPO demands frequent data capture; a short RTO demands fast failover and tested restores.
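The forward/backward distinction can be checked with simple arithmetic. The sketch below assumes periodic data capture, so the worst case is a failure just before the next capture, which loses a full interval of changes. It also assumes that worst-case downtime is detection time plus restore time. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryDesign:
    capture_interval_min: float  # minutes between backups/snapshots (drives achievable RPO)
    detect_min: float            # assumed minutes to detect and escalate the outage
    restore_min: float           # minutes to restore or fail over, ideally from a tested run

    def worst_case_data_loss_min(self) -> float:
        # Looking backward: a failure just before the next capture loses a whole interval.
        return self.capture_interval_min

    def worst_case_downtime_min(self) -> float:
        # Looking forward: time until the service is running again.
        return self.detect_min + self.restore_min


def meets_objectives(design: RecoveryDesign, rpo_min: float, rto_min: float) -> dict:
    return {
        "rpo_met": design.worst_case_data_loss_min() <= rpo_min,
        "rto_met": design.worst_case_downtime_min() <= rto_min,
    }


# Payroll example: RPO of 15 minutes, RTO of 4 hours (240 minutes).
nightly = RecoveryDesign(capture_interval_min=1440, detect_min=30, restore_min=120)
frequent = RecoveryDesign(capture_interval_min=10, detect_min=30, restore_min=180)
print(meets_objectives(nightly, rpo_min=15, rto_min=240))
print(meets_objectives(frequent, rpo_min=15, rto_min=240))
```

A nightly backup with a fast restore meets the RTO but misses the 15-minute RPO. Capturing data every 10 minutes meets both objectives, provided the 180-minute restore time has actually been demonstrated in a test.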

Mirroring, Replication, and Backup Types

Mirroring maintains an exact or near-exact copy of data or systems, often for rapid failover. Replication copies data between locations. It may be synchronous, where each write is committed at both locations, supporting a near-zero RPO at higher cost. It may also be asynchronous, where the copy lags behind the primary, so changes made during the lag can be lost, at lower cost. These tools cut downtime and data loss, but they will faithfully copy corrupted or unauthorized changes if recovery points and monitoring are weak. That is why point-in-time backups remain necessary.
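A toy simulation makes the point concrete. It assumes an in-memory dict stands in for a payroll table and a plain function stands in for synchronous replication (both hypothetical). The bad change reaches the replica immediately, while a separately retained point-in-time backup still holds the good value.

```python
import copy


def replicate(primary: dict, replica: dict) -> None:
    """Toy synchronous replication: make the replica an exact copy of the primary."""
    replica.clear()
    replica.update(copy.deepcopy(primary))


# Hypothetical payroll table: employee ID -> net pay.
primary = {"emp_001": 5200.00, "emp_002": 4800.00}
replica: dict = {}
replicate(primary, replica)

# Point-in-time backup taken before the bad change and kept separately.
backup_snapshot = copy.deepcopy(primary)

# A corrupting or unauthorized change hits the primary...
primary["emp_002"] = 0.00
replicate(primary, replica)  # ...and replication copies it faithfully.

print("replica:", replica["emp_002"])         # prints "replica: 0.0"
print("backup: ", backup_snapshot["emp_002"])  # prints "backup:  4800.0"
```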

Backup types and their tradeoffs:

  • Full backup: copies all selected data; simplest restore, highest storage and time cost.
  • Incremental backup: copies changes since the last backup of any type; smallest and fastest to create, but restore needs the full backup plus every incremental in the chain.
  • Differential backup: copies changes since the last full backup; restore needs only the full backup plus the latest differential, but each differential grows until the next full.

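Restore worked example. A full backup runs Sunday night, and a differential runs each weeknight (Monday through Friday). After a failure Thursday morning, restore the Sunday full plus the Wednesday-night differential (the latest one). If the schedule used incrementals instead, you would need the Sunday full plus the Monday, Tuesday, and Wednesday incrementals, applied in order. Either way, changes made after Wednesday night's backup are lost unless another mechanism, such as replication, has captured them.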
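The restore rules for the three backup types can be sketched as a function that picks the required backups from a schedule. It assumes each backup runs at night, so a backup dated before the failure day is available, and it uses day indices (0 = Sunday) as a hypothetical encoding. Incrementals are treated as changes since the previous backup of any type, and differentials as changes since the last full backup, matching the definitions in the list above.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def restore_chain(backups: list[tuple[int, str]], failure_day: int) -> list[tuple[str, str]]:
    """Return the backups needed to restore to the last nightly backup before failure_day.

    backups: (day_index, kind) pairs; kind is "full", "differential", or "incremental".
    """
    usable = sorted(b for b in backups if b[0] < failure_day)
    full_positions = [i for i, (_, kind) in enumerate(usable) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup available; nothing to restore from")
    start = full_positions[-1]
    chain = [usable[start]]  # every restore begins from the latest full backup
    base = start
    diffs = [i for i in range(start + 1, len(usable)) if usable[i][1] == "differential"]
    if diffs:
        # The latest differential already holds every change since the full.
        base = diffs[-1]
        chain.append(usable[base])
    # Each incremental holds only changes since the previous backup of any type,
    # so every incremental after the base must be applied, in order.
    chain += [b for b in usable[base + 1:] if b[1] == "incremental"]
    return [(DAYS[d], kind) for d, kind in chain]


diff_schedule = [(0, "full")] + [(d, "differential") for d in range(1, 6)]
incr_schedule = [(0, "full")] + [(d, "incremental") for d in range(1, 6)]
print(restore_chain(diff_schedule, failure_day=4))  # Thursday-morning failure
print(restore_chain(incr_schedule, failure_day=4))
```

The differential schedule needs two restores (Sunday full plus Wednesday differential), while the incremental schedule needs four, which is the restore-speed tradeoff described in the list.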

Availability Metrics and SOC 2 Thinking

Common measures include agreed service time, downtime, uptime percentage, incident duration, mean time to detect (MTTD), and mean time to restore (MTTR). In a SOC 2 availability examination, a CPA evaluates whether controls are suitably designed to meet availability commitments and system requirements. A Type 1 report addresses design as of a point in time. A Type 2 report also evaluates whether the controls operated effectively over a period.
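The metrics above reduce to simple arithmetic over incident records. The sketch below assumes a hypothetical 30-day month of agreed service time (43,200 minutes) and two made-up incidents. It also assumes MTTR is measured from the start of the outage to restoration; some organizations measure it from detection instead, so the definition should match the service commitment.

```python
def availability_metrics(agreed_minutes: float, incidents: list[tuple[float, float, float]]) -> dict:
    """incidents: (start, detected, restored) minute offsets within the period."""
    n = len(incidents)
    downtime = sum(restored - start for start, _, restored in incidents)
    return {
        "downtime_min": downtime,
        "uptime_pct": 100.0 * (agreed_minutes - downtime) / agreed_minutes,
        # Mean time to detect: outage start to detection.
        "mttd_min": sum(det - start for start, det, _ in incidents) / n if n else 0.0,
        # Mean time to restore (assumed here): outage start to restoration.
        "mttr_min": sum(res - start for start, _, res in incidents) / n if n else 0.0,
    }


# Hypothetical month: two outages of 90 and 120 minutes.
month = availability_metrics(
    agreed_minutes=30 * 24 * 60,
    incidents=[(1000, 1010, 1090), (20000, 20030, 20120)],
)
print(month)
```

Here 210 minutes of downtime yields roughly 99.51% uptime. A hypothetical 99.9% commitment would allow only about 43 minutes of downtime in that month, so this period would fall short.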

Strong evidence includes approved BCP/DRP documents, backup schedules, successful restore-test logs, incident tickets, monitoring alerts, capacity reports, failover-test results, and documented remediation of failed backups. Weak evidence is management asserting that backups exist without proof that data can actually be restored, an untested plan, or a contact list that has never been exercised.
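The gap between "backups exist" and "backups restore" can be closed with a restore test that leaves a record. The sketch below assumes a hypothetical payroll.csv file and uses plain file copies as a stand-in for a real backup tool. It restores the backup to a separate location and compares SHA-256 checksums with the source. A real restore test would also confirm that the application can read and use the restored data.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def restore_test(source: Path, backup_dir: Path, restore_dir: Path) -> dict:
    """Back up a file, restore it to a separate location, and compare checksums."""
    backup_copy = Path(shutil.copy2(source, backup_dir))   # stand-in for the backup job
    restored = Path(shutil.copy2(backup_copy, restore_dir))  # stand-in for the restore
    return {
        "file": source.name,
        "restored_to": str(restored),
        "checksum_match": sha256_of(restored) == sha256_of(source),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for sub in ("prod", "backup", "restore"):
        (root / sub).mkdir()
    src = root / "prod" / "payroll.csv"  # hypothetical production file
    src.write_text("emp_id,net_pay\nemp_001,5200.00\n")
    result = restore_test(src, root / "backup", root / "restore")
    print(result["checksum_match"])
```

Retaining the returned record (file, restore location, checksum result, and date) produces the kind of restore-test log that an auditor can inspect.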

Exam Focus

Start from the business requirement. If downtime tolerance is low (short RTO), choose redundancy, failover, monitoring, and tested recovery. If data-loss tolerance is low (short RPO), choose frequent backups or replication plus restoration testing. If the fact pattern mentions failed backups, the strongest response is never "buy more storage"; it is to investigate the failures, remediate the root cause, and re-test the restore.

Availability also depends on protecting backups themselves. The widely cited 3-2-1 rule keeps three copies of data on two different media with at least one copy offsite (or offline/immutable). Offline or immutable copies matter against ransomware, which deliberately encrypts both production data and any reachable backups; replication alone would faithfully copy the encrypted files. Pair recovery design with redundancy components such as uninterruptible power supplies, redundant array of independent disks (RAID) storage, and clustered servers so that a single hardware failure does not cause an outage in the first place.

Prevention reduces how often the recovery plan must be invoked.
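The 3-2-1 rule described above can be expressed as a checklist over a backup inventory. This sketch assumes the production copy counts toward the three copies, and it checks offline/immutable storage separately because of the ransomware concern. The BackupCopy fields and location labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    location: str               # hypothetical label, e.g. "primary-dc"
    media: str                  # e.g. "disk", "tape", "object-storage"
    offsite: bool
    offline_or_immutable: bool  # unreachable by ransomware on the network


def check_3_2_1(copies: list[BackupCopy]) -> dict:
    # Assumption: the production data itself counts as one of the three copies.
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
        # Ransomware resilience: at least one copy that malware cannot reach or alter.
        "one_offline_or_immutable": any(c.offline_or_immutable for c in copies),
    }


inventory = [
    BackupCopy("primary-dc", "disk", offsite=False, offline_or_immutable=False),   # production
    BackupCopy("cloud-replica", "disk", offsite=True, offline_or_immutable=False), # replica
    BackupCopy("primary-dc", "disk", offsite=False, offline_or_immutable=False),   # nightly backup
]
print(check_3_2_1(inventory))

inventory.append(BackupCopy("vault", "tape", offsite=True, offline_or_immutable=True))
print(check_3_2_1(inventory))
```

The first inventory has three copies and an offsite replica, but everything sits on reachable disk. A ransomware attack could therefore encrypt every copy. Adding an offline tape copy satisfies all four checks.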

Test Your Knowledge

A payroll system has a recovery point objective of 15 minutes and a recovery time objective of 4 hours. Which control design best supports those objectives?

A company performs a full backup every Sunday and a differential backup every weeknight. A database fails Thursday morning before that night's backup runs. Which backups are generally needed to restore through Wednesday night?

During a SOC 2 availability examination, management states that nightly backups run but cannot produce any record of a successful restore. How should the CPA view this?
