An organization requires recovery of a vital, customer-facing application within two hours of any disruption. Which alternate-site strategy is most appropriate?

Hot site. A hot site mirrors production with hardware, software, and near-current data, making it operational in minutes to hours and suitable for vital, critical applications with a low RTO. Cold and warm sites take longer to activate, and reciprocal agreements offer the least reliable readiness.

Core Workflows and Decision Points — Free Study Guide 2026

Key Takeaways

Incident management restores service fast (often with a workaround); problem management finds and removes the root cause to stop recurrence.
Changes flow through a Request for Change (RFC), assessment, approval, testing, scheduled implementation, and post-implementation review.
Configuration management maintains the CMDB of configuration items (CIs) so change impact can be assessed; release management deploys bundled changes to production.
Recovery-site cost rises with readiness: cold (cheapest dedicated site, slowest) → warm → mobile → hot (most expensive, fastest); reciprocal arrangements cost less than any dedicated site but are the least reliable.
RAID and clustering provide high availability against component failure but are not a substitute for off-site backups against site-level disasters.

Service Management Workflows (ITIL)

CISA leans on ITIL service-management vocabulary. The exam repeatedly tests the distinction between four processes that candidates blur together.

Incident management — An incident is an unplanned interruption or reduction in quality of an IT service. The goal is to restore service as quickly as possible, often with a temporary workaround. Speed matters more than root cause here.
Problem management — A problem is the underlying cause of one or more incidents. Problem management is proactive and analytical: it performs root-cause analysis to eliminate recurring incidents permanently. A known error with a documented workaround lives here.
Change management — A change is the addition, modification, or removal of anything that could affect IT services. Changes flow through a Request for Change (RFC), risk/impact assessment, authorization (often by a Change Advisory Board), testing, scheduled implementation, and a post-implementation review.
Configuration management — Maintains the Configuration Management Database (CMDB), a record of all configuration items (CIs) and their relationships, so change impact can be assessed accurately.

Release management packages approved changes into a controlled deployment to production while protecting the integrity of the existing environment.

Memory hook: Incident = restore now. Problem = find the cause. Change = control the modification. Configuration = know what you have. Release = deploy it safely.
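To make the configuration-management point concrete, here is a minimal sketch of change-impact assessment over CI relationships. The CI names and the `DEPENDS_ON` mapping are hypothetical, and a real CMDB is a managed database rather than a dictionary; the sketch only shows why recorded relationships make impact assessment possible.

```python
from collections import deque

# Hypothetical CI relationships: each key depends on the CIs in its list.
DEPENDS_ON = {
    "web-portal": ["app-server"],
    "app-server": ["db-server", "auth-service"],
    "reporting": ["db-server"],
    "db-server": [],
    "auth-service": [],
}

def impacted_by(changed_ci, depends_on):
    """Return every CI that directly or indirectly depends on changed_ci."""
    # Invert the mapping so we can walk from a CI to the CIs that rely on it.
    dependents = {ci: [] for ci in depends_on}
    for ci, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(ci)

    # Breadth-first walk outward from the changed CI.
    impacted, queue = set(), deque([changed_ci])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return sorted(impacted)

print(impacted_by("db-server", DEPENDS_ON))
# ['app-server', 'reporting', 'web-portal']
```

If the CMDB is missing a relationship (say, `reporting` → `db-server`), the assessment silently understates the impact, which is why auditors care about CMDB accuracy.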

Why the Distinctions Get Tested

The exam exploits the incident/problem boundary constantly. If a stem describes a server crash and asks for the immediate priority, the answer is to restore service (incident management), even if the cause is unknown. If the stem says the same outage keeps happening and asks for the best long-term action, the answer is root-cause analysis (problem management). Watch the timing cue: now versus recurring.

For changes, emergency fixes still require control: an emergency change must be authorized (sometimes after the fact by an emergency CAB) and documented; urgency never excuses a missing record. The biggest operational red flag on the exam is a developer or operator implementing a change directly in production without approval, testing, or segregation of duties. Any answer choice that defends this practice is wrong.

Batch and Job Scheduling

Day-to-day operations also include job/batch scheduling. Auditors check for an automated scheduler with documented dependencies, exception alerts for failed jobs, restart/rerun procedures, and review of console/scheduler logs. Unmonitored failed batch jobs (for example, an overnight posting that silently aborts) are a classic operations finding.
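The controls listed above (documented dependencies, exception alerts, a rerun trigger) can be sketched in a few lines. This is an illustrative toy, not a real scheduler product: the job names, the `run_batch` function, and the alert wording are all assumptions. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to order jobs by dependency.

```python
from graphlib import TopologicalSorter

def run_batch(jobs, dependencies):
    """Run jobs in dependency order and return (status, alerts).

    jobs: name -> callable returning True on success.
    dependencies: name -> set of job names that must succeed first.
    """
    status, alerts = {}, []
    graph = {name: dependencies.get(name, set()) for name in jobs}
    for name in TopologicalSorter(graph).static_order():
        # Do not run a job whose upstream jobs did not finish cleanly.
        blocked = sorted(d for d in dependencies.get(name, ()) if status.get(d) != "ok")
        if blocked:
            status[name] = "skipped"
            alerts.append(f"{name} skipped: upstream not ok ({', '.join(blocked)})")
            continue
        try:
            succeeded = bool(jobs[name]())
            detail = "returned failure"
        except Exception as exc:  # a crash must raise an alert, not vanish
            succeeded, detail = False, f"raised {exc!r}"
        if succeeded:
            status[name] = "ok"
        else:
            status[name] = "failed"
            alerts.append(f"{name} failed ({detail}); follow the rerun procedure")
    return status, alerts

# Hypothetical overnight run in which the posting job aborts.
jobs = {
    "extract": lambda: True,
    "post_ledger": lambda: False,
    "report": lambda: True,
}
dependencies = {"post_ledger": {"extract"}, "report": {"post_ledger"}}

status, alerts = run_batch(jobs, dependencies)
print(status)  # {'extract': 'ok', 'post_ledger': 'failed', 'report': 'skipped'}
for line in alerts:
    print(line)
```

The point for the exam is the alert list: a failed posting job produces a visible exception and blocks the downstream report, instead of aborting silently.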

Recovery-Site Decision Spectrum

When a stem asks which alternate processing site is most appropriate, map the RTO and cost tolerance in the stem to this spectrum:

Site	Equipment & data	Time to operational	Relative cost	Best for
Cold	Space, power, HVAC only; little/no hardware	Longest (days/weeks)	Lowest	Non-critical apps, long-term contracts
Warm	Hardware and connectivity present; software/data not fully current	Medium	Medium	Sensitive but not time-critical apps
Mobile	Pre-configured portable unit moved to the site	Variable	Medium	Regional outages, field deployment
Hot	Mirrors production; hardware, software, near-current data	Shortest (minutes/hours)	Highest	Vital, critical applications with low RTO
Reciprocal	Two organizations agree to host each other	Uncertain	Lowest of all (below cold)	Same-region offices; least reliable

The core trade-off: shorter RTO → hotter site → higher cost. A hot site is justified when recovery must be fast and the application is critical. A reciprocal (mutual aid) agreement is the least expensive but the least dependable, because capacity, configuration compatibility, and the partner's own availability during a regional event are all uncertain.
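The trade-off can be expressed as "pick the cheapest site that still meets the RTO." The sketch below assumes illustrative activation times in hours; these numbers are placeholders chosen for the example, not values from this guide or any standard. Mobile and reciprocal options are left out because the table gives their activation time as variable or uncertain.

```python
# Dedicated sites ordered from cheapest to most expensive.
# Activation times (hours) are assumed values for illustration only.
SITES = [
    ("cold", 168),  # assumed: about a week to equip and load
    ("warm", 24),   # assumed: about a day to bring software/data current
    ("hot", 1),     # assumed: about an hour to switch over
]

def cheapest_site_meeting_rto(rto_hours):
    """Return the least expensive site type whose activation time fits the RTO."""
    for name, activation_hours in SITES:
        if activation_hours <= rto_hours:
            return name
    return None  # no listed site type is fast enough for this RTO

print(cheapest_site_meeting_rto(2))    # hot
print(cheapest_site_meeting_rto(48))   # warm
print(cheapest_site_meeting_rto(336))  # cold
```

With these assumed figures, a two-hour RTO leaves the hot site as the only option, which matches the reasoning exam stems expect.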

High-availability techniques sit alongside sites: RAID (redundant disks), server clustering, and failover protect against component failure and reduce RTO for localized faults. They are not a substitute for off-site backups, which protect against site-level disasters, ransomware, and corruption that replication would simply copy to the standby.

RAID and High-Availability Building Blocks

CISA expects you to recognize the common RAID (Redundant Array of Independent Disks) levels and what each buys you, because they appear in availability and storage-operations questions.

RAID level	Technique	Fault tolerance
RAID 0	Striping only	None — any disk failure loses all data (performance only)
RAID 1	Mirroring	Survives loss of one disk in a mirrored pair
RAID 5	Block striping + distributed parity	Survives one disk failure; needs ≥3 disks
RAID 6	Striping + double distributed parity	Survives two simultaneous failures; needs ≥4 disks
RAID 10	Mirrored stripes	High performance and fault tolerance; needs ≥4 disks

The trap is RAID 0: it improves throughput but provides no redundancy, so it never satisfies an availability requirement. Beyond RAID, server clustering lets a standby node take over automatically (failover), and redundant power, network paths, and load balancers remove single points of failure. All of these reduce RTO for component-level faults but, again, do not replace geographically separated backups.
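A quick way to internalize the table is to compute usable capacity and guaranteed fault tolerance for each level. The sketch below assumes equal-size disks; the function name `raid_summary` is made up for this example. For RAID 10 it reports the worst case (one failure), since a second failure is survivable only if it hits a different mirrored pair.

```python
# level: (minimum disks, usable-disk count as a function of n, failures always survived)
RAID_RULES = {
    "0": (2, lambda n: n, 0),        # striping only
    "1": (2, lambda n: 1, 1),        # mirrored pair
    "5": (3, lambda n: n - 1, 1),    # one disk's worth of parity
    "6": (4, lambda n: n - 2, 2),    # two disks' worth of parity
    "10": (4, lambda n: n // 2, 1),  # mirrored stripes, worst case
}

def raid_summary(level, disks, disk_tb):
    """Return (usable_tb, failures_always_survived) for equal-size disks."""
    min_disks, usable_disks, survives = RAID_RULES[level]
    if disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    if level == "1" and disks != 2:
        raise ValueError("this sketch models RAID 1 as a single mirrored pair")
    if level == "10" and disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return usable_disks(disks) * disk_tb, survives

for level in ("0", "5", "6", "10"):
    print(level, raid_summary(level, disks=4, disk_tb=2))
# 0 (8, 0)
# 5 (6, 1)
# 6 (4, 2)
# 10 (4, 1)
```

Note how RAID 0 offers the most capacity and zero tolerance: exactly the trap described above.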

The Change Lifecycle in Detail

Walk a normal change through its stages until the exam's process questions become automatic: RFC raised → categorized and risk-assessed → authorized (Change Advisory Board for significant changes) → built and tested in a non-production environment → scheduled and implemented with a back-out plan → post-implementation review → CMDB updated. A back-out (rollback) plan is the control auditors look for before a change is promoted to production; its absence is a finding. Emergency changes compress this flow but still require authorization and documentation, typically reviewed retroactively by an emergency CAB.
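The lifecycle lends itself to a simple audit check: given the order in which stages were recorded for a change, list the control gaps. The stage labels and the `audit_change` function below are hypothetical, and the rules are a simplified reading of the flow described here, not an official ITIL or ISACA test procedure.

```python
def audit_change(events, has_backout_plan, emergency=False):
    """Return a list of findings for one change.

    events: stage names in the order they were recorded, e.g.
            ["rfc_raised", "assessed", "authorized", "tested", "implemented"].
    """
    def position(stage):
        return events.index(stage) if stage in events else None

    findings = []
    authorized = position("authorized")
    tested = position("tested")
    implemented = position("implemented")

    # Every change needs authorization; only emergencies may get it afterwards.
    if authorized is None:
        findings.append("no authorization recorded")
    elif implemented is not None and authorized > implemented and not emergency:
        findings.append("authorized only after implementation")

    # Normal changes must be tested before they reach production.
    if not emergency and implemented is not None:
        if tested is None or tested > implemented:
            findings.append("not tested before implementation")

    # A back-out plan is expected for anything that went to production.
    if implemented is not None and not has_backout_plan:
        findings.append("no back-out plan")
    return findings

normal = ["rfc_raised", "assessed", "authorized", "tested",
          "implemented", "reviewed", "cmdb_updated"]
print(audit_change(normal, has_backout_plan=True))            # []
print(audit_change(["implemented"], has_backout_plan=False))  # three findings
print(audit_change(["rfc_raised", "implemented", "authorized"],
                   has_backout_plan=True, emergency=True))    # []
```

The second call models the classic red flag (a change pushed straight to production), and the third shows an emergency change that was authorized retroactively and therefore passes.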

Test Your Knowledge

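A critical application server fails during business hours and the cause is unknown. What is the immediate priority under ITIL?

A. Conduct root-cause analysis to identify the underlying defect
B. Restore service as quickly as possible, using a workaround if needed
C. Open a Request for Change to modify the server configuration
D. Update the CMDB to reflect the failed configuration item

Answer: B. Under incident management the immediate priority is to restore service, even when the cause is unknown; root-cause analysis belongs to problem management and comes afterward.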

Test Your Knowledge

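The same network outage has recurred three times this month, each restored by a manual workaround. What is the best long-term action?

A. Continue applying the workaround whenever the outage recurs
B. Escalate each occurrence through incident management more quickly
C. Initiate problem management to perform root-cause analysis
D. Move the application to a hot site

Answer: C. A recurring outage signals an underlying problem; problem management performs root-cause analysis to eliminate the recurrence, whereas the workaround only restores service each time.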

CISA Study Guide

CISA

5.2 Core Workflows and Decision Points