5.1 Information Systems Operations and Business Resilience Overview

Key Takeaways

  • Domain 4 carries roughly 26% of the CISA exam, tied with Domain 5 for the heaviest weighting, and is split between day-to-day IT operations and business resilience.
  • RTO is the maximum acceptable downtime to restore a process; RPO is the maximum acceptable data loss measured in time before the disruption.
  • MTD (Maximum Tolerable Downtime) equals RTO plus WRT (Work Recovery Time); SDO is the reduced service level run at the alternate site until normal operations resume.
  • The business impact analysis (BIA) drives recovery priorities and sets RTO, RPO, and MTD for each critical process.
  • An IS auditor evaluates whether operations and resilience controls are documented, tested, and aligned to business-defined recovery objectives.
Last updated: June 2026

What Domain 4 Covers

Information Systems Operations and Business Resilience is weighted at approximately 26% of the CISA exam, which ties it with Domain 5 (Protection of Information Assets) for the heaviest weighting. It joins two related ideas: keeping IT services running reliably every day (operations), and recovering them when something goes wrong (resilience). On exam day, expect roughly a quarter of the questions to come from this domain, so a strong score here moves the overall result substantially.

The domain splits into two task areas. The first, IT operations management, covers job and batch scheduling, incident and problem management, change/release/configuration management, service-level management, performance and capacity monitoring, hardware and software asset management, and network, database, and storage operations. The second, business resilience, covers the business impact analysis (BIA), the business continuity plan (BCP), the disaster recovery plan (DRP), recovery objectives, alternate processing sites, high-availability design, and plan testing.

Quick Answer: Domain 4 (~26%) tests whether IT services run reliably and recover predictably. The auditor confirms operations are controlled and resilience plans are documented, owned, current, and tested against business-defined recovery objectives.

The Recovery Metrics You Must Know Cold

The exam tests four interlocking time metrics. Getting these exactly right earns several easy points, and confusing RTO with RPO is one of the most common ways candidates lose them.

| Metric | Definition | Measures |
| --- | --- | --- |
| RPO (Recovery Point Objective) | Maximum acceptable data loss, expressed in time before the disruption | How far back you can lose data → backup frequency |
| RTO (Recovery Time Objective) | Maximum acceptable downtime to restore a process or system | How fast you must recover → recovery strategy cost |
| WRT (Work Recovery Time) | Time to validate data and process the transaction backlog after systems are restored | Post-restore catch-up work |
| MTD (Maximum Tolerable Downtime) | Total time a process can be down before causing severe or irreparable harm | The absolute outer limit (MTD = RTO + WRT) |
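
The MTD relationship in the last row is simple arithmetic, and a short Python sketch can make it concrete. The class name `RecoveryObjectives` and the sample values are illustrative assumptions, not ISACA terminology or figures from a real BIA.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Hypothetical container for one process's recovery metrics."""
    rpo: timedelta  # maximum tolerable data loss, measured back from the disruption
    rto: timedelta  # maximum time allowed to restore the system
    wrt: timedelta  # time to validate data and clear the backlog after restore

    @property
    def mtd(self) -> timedelta:
        # MTD = RTO + WRT, as stated in the metrics table
        return self.rto + self.wrt


# Illustrative values only
payroll = RecoveryObjectives(
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
    wrt=timedelta(hours=2),
)
print(payroll.mtd)  # 6:00:00
```

Note that RPO does not enter the sum: it constrains data loss, not elapsed downtime.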

A fifth metric, the Service Delivery Objective (SDO), is the reduced level of service the organization runs at the alternate site during recovery, before normal operations resume. SDO is tied to business need, not to a clock.

The relationships matter. A lower RTO (faster recovery requirement) generally demands a more expensive recovery strategy, such as a hot site. A lower RPO (less tolerable data loss) demands more frequent backups or replication. RPO looks backward from the incident (data); RTO looks forward from the incident (time to restore).
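
The backward/forward distinction can be sketched by measuring a single incident against both objectives. The function name `measure_incident`, the objectives, and the timestamps below are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def measure_incident(last_good_backup: datetime,
                     disruption: datetime,
                     service_restored: datetime) -> tuple[timedelta, timedelta]:
    """Return (data_loss_window, downtime) for one incident."""
    data_loss_window = disruption - last_good_backup  # looks backward: compare to RPO
    downtime = service_restored - disruption          # looks forward: compare to RTO
    return data_loss_window, downtime


# Illustrative objectives and timestamps (assumptions, not real data)
rpo = timedelta(hours=1)
rto = timedelta(hours=4)

loss, down = measure_incident(
    last_good_backup=datetime(2026, 6, 1, 8, 0),
    disruption=datetime(2026, 6, 1, 9, 30),
    service_restored=datetime(2026, 6, 1, 12, 30),
)

rpo_met = loss <= rpo  # False: 1.5 hours of data lost against a 1-hour RPO
rto_met = down <= rto  # True: 3 hours of downtime against a 4-hour RTO
print(rpo_met, rto_met)
```

In this made-up incident the recovery was fast enough, but the backup schedule was not frequent enough, so the control that fell short is the one tied to RPO.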

The BIA Drives Everything

The business impact analysis (BIA) is the foundation of resilience and the most-tested starting point in the domain. The BIA identifies critical business processes, the resources they depend on, and the financial and operational impact of losing them over time. From the BIA, the organization sets each process's RTO, RPO, and MTD, then prioritizes recovery so the most critical processes are restored first.

Key BIA principles the exam expects:

  • The business owns criticality decisions; IT supports them. Recovery priorities come from business impact, not from which system is technically easiest to restore.
  • Criticality is time-based: a process may be tolerable for two hours but catastrophic at twelve. The BIA captures impact as it grows over the outage window.
  • The BIA precedes strategy selection. You cannot rationally choose a hot vs. cold site, or a backup frequency, until the BIA has set RTO and RPO.
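
As a rough sketch of the time-based principle, the snippet below ranks processes by how quickly their estimated impact crosses a business-set tolerance. The process names, loss figures, the `MAX_TOLERABLE_LOSS` threshold, and the helper `hours_until_intolerable` are all invented for illustration; a real BIA would also weigh non-financial impact.

```python
# Hypothetical BIA data: estimated cumulative loss (in currency units) at
# each assessed outage duration in hours. All figures are invented.
bia_impact = {
    "order entry":   {2: 5_000, 4: 40_000, 12: 400_000, 24: 900_000},
    "payroll":       {2: 0, 4: 1_000, 12: 10_000, 24: 150_000},
    "internal wiki": {2: 0, 4: 0, 12: 500, 24: 2_000},
}

# Assumed tolerance, set by the business rather than by IT
MAX_TOLERABLE_LOSS = 100_000


def hours_until_intolerable(impact_by_hour: dict[int, int], limit: int) -> float:
    """Return the first assessed outage duration whose impact exceeds the limit."""
    for hours in sorted(impact_by_hour):
        if impact_by_hour[hours] > limit:
            return hours
    return float("inf")  # never exceeds the limit within the assessed window


# Shortest tolerable outage first means highest recovery priority
recovery_order = sorted(
    bia_impact,
    key=lambda name: hours_until_intolerable(bia_impact[name], MAX_TOLERABLE_LOSS),
)
print(recovery_order)  # ['order entry', 'payroll', 'internal wiki']
```

The ordering comes entirely from business impact over time, which mirrors the first principle above: nothing in the ranking depends on how easy a system is to restore.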

The Auditor's Lens

Across this domain, an IS auditor does not run operations or recovery; the auditor provides assurance that controls exist, are owned, and work. The recurring audit questions are: Are objectives derived from a current BIA? Is the plan documented and approved? Is it kept current as the environment changes? And, most importantly, is it tested, with results acted on? An untested plan offers little assurance no matter how detailed it looks.
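
The recurring audit questions can be pictured as a simple checklist. The sketch below is one possible encoding, not an ISACA procedure: the `PlanRecord` fields, the finding wording, and the 365-day freshness threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PlanRecord:
    """Hypothetical evidence collected about one recovery plan."""
    bia_completed: Optional[date]
    documented_and_approved: bool
    last_updated: Optional[date]
    last_tested: Optional[date]
    open_test_findings: int


def audit_findings(plan: PlanRecord, as_of: date, max_age_days: int = 365) -> list[str]:
    """Return a finding for each recurring audit question the plan fails."""
    def stale(d: Optional[date]) -> bool:
        # Missing evidence is treated the same as out-of-date evidence
        return d is None or (as_of - d).days > max_age_days

    findings = []
    if stale(plan.bia_completed):
        findings.append("Objectives are not derived from a current BIA")
    if not plan.documented_and_approved:
        findings.append("Plan is not documented and approved")
    if stale(plan.last_updated):
        findings.append("Plan has not been kept current")
    if stale(plan.last_tested):
        findings.append("Plan has not been tested recently")
    elif plan.open_test_findings > 0:
        findings.append("Test results have not been acted on")
    return findings


# Illustrative plan: detailed and approved, but never tested
drp = PlanRecord(
    bia_completed=date(2025, 9, 1),
    documented_and_approved=True,
    last_updated=date(2026, 1, 15),
    last_tested=None,
    open_test_findings=0,
)
print(audit_findings(drp, as_of=date(2026, 6, 1)))
# ['Plan has not been tested recently']
```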

Where Operations and Resilience Meet

The two halves of Domain 4 are not separate worlds; strong daily operations are what make resilience achievable. A clean change and configuration record means the recovery team knows exactly what to rebuild. Accurate asset management — knowing every hardware and software item, its version, and its license — means the alternate site can be provisioned with matching components. Capacity and performance monitoring spots the saturation that often precedes an outage, turning a would-be disaster into a managed incident.

Several operational disciplines feed resilience directly:

  • Network, database, and storage operations — replication, log shipping, and clustering are configured here and become the mechanisms that actually meet the RPO and RTO.
  • Hardware and software asset management — an accurate inventory and license position is what lets the recovery site be stood up legally and correctly.
  • End-user computing (EUC) — spreadsheets and shadow databases built outside IT often hold critical data with no backup or change control, creating a resilience gap the formal DRP never sees.

For the exam, remember that the auditor evaluates the whole chain: a brilliant DRP cannot recover a system whose configuration is undocumented or whose data lives in an ungoverned spreadsheet. Resilience is only as strong as the operational hygiene underneath it, which is precisely why ISACA folds both into one heavily weighted domain. When a stem describes a recovery failure, look upstream — the root cause is frequently a weak operational control such as an unmaintained CMDB, a missing off-site backup, or an asset list that no longer matches production.

This integrated view is also why Domain 4 questions so often reward the answer that strengthens the underlying control rather than the one that simply reacts to the symptom.
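
One of the upstream checks mentioned above, whether the asset list still matches production, can be sketched as a comparison of two inventories. The host names, version strings, and the function `inventory_drift` are hypothetical; in practice the "discovered" side would come from whatever discovery or scanning tool the organization uses.

```python
# Hypothetical data: what the asset register says vs. what is actually running
recorded = {"db01": "PostgreSQL 15", "app01": "v2.3", "web01": "nginx 1.24"}
discovered = {"db01": "PostgreSQL 16", "app01": "v2.3", "cache01": "Redis 7"}


def inventory_drift(recorded: dict[str, str],
                    discovered: dict[str, str]) -> dict[str, list[str]]:
    """Compare the recorded inventory with what was found in production."""
    return {
        # Running in production but absent from the register
        "unrecorded": sorted(discovered.keys() - recorded.keys()),
        # Listed in the register but not found in production
        "not_found": sorted(recorded.keys() - discovered.keys()),
        # Present in both, but the recorded version is out of date
        "version_mismatch": sorted(
            name for name in recorded.keys() & discovered.keys()
            if recorded[name] != discovered[name]
        ),
    }


drift = inventory_drift(recorded, discovered)
print(drift)
# {'unrecorded': ['cache01'], 'not_found': ['web01'], 'version_mismatch': ['db01']}
```

Any non-empty list here is the kind of operational gap that would surface later as a recovery failure, since the alternate site would be provisioned from the recorded inventory rather than from what production actually runs.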

Test Your Knowledge

Which metric defines the maximum acceptable amount of data loss, measured in time, that an organization can tolerate?

Test Your Knowledge

An organization sets RTO at 4 hours and WRT at 2 hours for a critical process. What is the Maximum Tolerable Downtime (MTD)?

Test Your Knowledge

Before selecting between a hot site and a cold site, which activity should the organization complete first?
