5.2 Core Workflows and Decision Points

Key Takeaways

  • Incident management restores service fast (often with a workaround); problem management finds and removes the root cause to stop recurrence.
  • Changes flow through a Request for Change (RFC), assessment, approval, testing, scheduled implementation, and post-implementation review.
  • Configuration management maintains the CMDB of configuration items (CIs) so change impact can be assessed; release management deploys bundled changes to production.
  • Recovery-site cost rises with readiness: cold (cheapest, slowest) → warm → mobile → hot (most expensive, fastest); reciprocal arrangements are cheapest of all but least reliable.
  • RAID and clustering provide high availability against component failure but are not a substitute for off-site backups against site-level disasters.

Last updated: June 2026

Service Management Workflows (ITIL)

CISA leans on ITIL service-management vocabulary. The exam repeatedly tests the distinction between four processes that candidates blur together.

  • Incident management — An incident is an unplanned interruption or reduction in quality of an IT service. The goal is to restore service as quickly as possible, often with a temporary workaround. Speed matters more than root cause here.
  • Problem management — A problem is the underlying cause of one or more incidents. Problem management is proactive and analytical: it performs root-cause analysis to eliminate recurring incidents permanently. A known error with a documented workaround lives here.
  • Change management — A change is the addition, modification, or removal of anything that could affect IT services. Changes flow through a Request for Change (RFC), risk/impact assessment, authorization (often by a Change Advisory Board), testing, scheduled implementation, and a post-implementation review.
  • Configuration management — Maintains the Configuration Management Database (CMDB), a record of all configuration items (CIs) and their relationships, so change impact can be assessed accurately.

Release management packages approved changes into a controlled deployment to production while protecting the integrity of the existing environment.

Memory hook: Incident = restore now. Problem = find the cause. Change = control the modification. Configuration = know what you have. Release = deploy it safely.
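To make the configuration-management point concrete, here is a minimal sketch of how CI relationships support impact assessment. The CI names and the `DEPENDENTS` mapping are hypothetical; a real CMDB stores far richer relationship types than this simple "depends on" link.

```python
from collections import deque

# Hypothetical CMDB fragment: each CI maps to the CIs that depend on it.
DEPENDENTS = {
    "db-server-01": ["orders-db"],
    "orders-db": ["orders-api", "reporting-job"],
    "orders-api": ["web-storefront"],
    "reporting-job": [],
    "web-storefront": [],
}


def impacted_cis(changed_ci, dependents):
    """Return every CI downstream of changed_ci, found breadth-first."""
    seen = set()
    queue = deque([changed_ci])
    while queue:
        ci = queue.popleft()
        for dep in dependents.get(ci, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


print(impacted_cis("orders-db", DEPENDENTS))
# ['orders-api', 'reporting-job', 'web-storefront']
```

If the relationships are missing or stale, the traversal silently under-reports impact, which is why auditors care about CMDB accuracy and not just its existence.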

Why the Distinctions Get Tested

The exam exploits the incident/problem boundary constantly. If a stem describes a server crash and asks the immediate priority, the answer is to restore service (incident management), even if the cause is unknown. If the stem says the same outage keeps happening and asks for the best long-term action, the answer is root-cause analysis (problem management). Watch the timing cue: now versus recurring.
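The "now versus recurring" cue can be sketched as a simple hand-off rule: every incident is restored first, and a problem record is raised only once the same symptom keeps coming back. The threshold of three and the ticket data below are assumptions for illustration, not ITIL or CISA figures.

```python
from collections import Counter

RECURRENCE_THRESHOLD = 3  # assumed policy value, not an ITIL or CISA figure


def problems_to_raise(incidents, threshold=RECURRENCE_THRESHOLD):
    """Return symptom signatures seen at least `threshold` times.

    Each incident is an (incident_id, signature) pair. Every incident is
    assumed to have been restored already by incident management.
    """
    counts = Counter(signature for _, signature in incidents)
    return sorted(sig for sig, n in counts.items() if n >= threshold)


# Hypothetical ticket log for one month.
incidents = [
    ("INC-101", "core-switch link flap"),
    ("INC-107", "mail queue full"),
    ("INC-112", "core-switch link flap"),
    ("INC-120", "core-switch link flap"),
]
print(problems_to_raise(incidents))
# ['core-switch link flap']
```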

Emergency changes are still controlled changes: each one must be authorized (sometimes after the fact, by an emergency CAB) and documented, and urgency never excuses a missing record. The operational red flag the exam returns to most often is a developer or operator implementing a change directly in production without approval, testing, or segregation of duties. Any answer choice that defends that practice is wrong.

Batch and Job Scheduling

Day-to-day operations also include job/batch scheduling. Auditors check for an automated scheduler with documented dependencies, exception alerts for failed jobs, restart/rerun procedures, and review of console/scheduler logs. Unmonitored failed batch jobs (for example, an overnight posting that silently aborts) are a classic operations finding.
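The controls above (documented dependencies, exception alerts, no silent failures) can be illustrated with a small sketch. It uses Python's standard `graphlib.TopologicalSorter`; the job names and the `run_batch` helper are hypothetical, and the sketch assumes every dependency is itself a key in `jobs`. A production scheduler would add retries, restart points, and persistent logs.

```python
from graphlib import TopologicalSorter


def run_batch(jobs, dependencies):
    """Run jobs in dependency order and skip anything downstream of a failure.

    jobs: dict of job name -> zero-argument callable
    dependencies: dict of job name -> set of job names it must wait for
    Returns (completed, failed, skipped, alerts).
    """
    graph = {name: set(dependencies.get(name, ())) for name in jobs}
    order = list(TopologicalSorter(graph).static_order())
    completed, failed, skipped, alerts = [], [], [], []
    for name in order:
        blocked_by = sorted(d for d in graph[name] if d in failed or d in skipped)
        if blocked_by:
            skipped.append(name)
            alerts.append(f"SKIPPED {name}: upstream {blocked_by} did not complete")
            continue
        try:
            jobs[name]()
            completed.append(name)
        except Exception as exc:
            # Record the failure as an alert instead of letting it pass silently.
            failed.append(name)
            alerts.append(f"FAILED {name}: {exc}")
    return completed, failed, skipped, alerts


def extract():
    pass


def post_ledger():
    raise RuntimeError("input file missing")


def report():
    pass


jobs = {"extract": extract, "post_ledger": post_ledger, "report": report}
deps = {"post_ledger": {"extract"}, "report": {"post_ledger"}}
completed, failed, skipped, alerts = run_batch(jobs, deps)
for line in alerts:
    print(line)
```

Here the overnight posting fails, the dependent report is held back, and both events produce an alert an operator can review, which is the evidence trail an auditor would look for.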

Recovery-Site Decision Spectrum

When a stem asks which alternate processing site is most appropriate, map the RTO and cost tolerance in the stem to this spectrum:

| Site | Equipment & data | Time to operational | Relative cost | Best for |
|---|---|---|---|---|
| Cold | Space, power, HVAC only; little/no hardware | Longest (days/weeks) | Lowest | Non-critical apps, long-term contracts |
| Warm | Hardware and connectivity present; software/data not fully current | Medium | Medium | Sensitive but not time-critical apps |
| Mobile | Pre-configured portable unit moved to the site | Variable | Medium | Regional outages, field deployment |
| Hot | Mirrors production; hardware, software, near-current data | Shortest (minutes/hours) | Highest | Vital, critical applications with low RTO |
| Reciprocal | Two organizations agree to host each other | Uncertain | Cheapest | Same-region offices; least reliable |

The core trade-off: shorter RTO → hotter site → higher cost. A hot site is justified when recovery must be fast and the application is critical. A reciprocal (mutual aid) agreement is the least expensive but the least dependable, because capacity, configuration compatibility, and the partner's own availability during a regional event are all uncertain.
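The trade-off can be expressed as "pick the cheapest site that still meets the RTO." The sketch below does exactly that. The hours and cost ranks are illustrative assumptions only (real figures come from the business impact analysis and vendor contracts), and mobile and reciprocal options are left out because their time to operational is variable or uncertain.

```python
# Illustrative figures only: typical time to operational (hours) and a
# relative cost rank where 1 is cheapest. These are assumptions, not
# published benchmarks.
SITES = {
    "cold": {"hours_to_operational": 168, "cost_rank": 1},
    "warm": {"hours_to_operational": 24, "cost_rank": 2},
    "hot": {"hours_to_operational": 1, "cost_rank": 3},
}


def cheapest_site_meeting_rto(rto_hours, sites=SITES):
    """Return the lowest-cost site whose recovery time fits the RTO, or None."""
    candidates = [
        (spec["cost_rank"], name)
        for name, spec in sites.items()
        if spec["hours_to_operational"] <= rto_hours
    ]
    if not candidates:
        return None  # no site type is fast enough
    return min(candidates)[1]


print(cheapest_site_meeting_rto(2))    # hot
print(cheapest_site_meeting_rto(48))   # warm
print(cheapest_site_meeting_rto(720))  # cold
```

A result of `None` signals that no alternate site alone can meet the target, so the requirement would need high-availability measures within production itself.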

High-availability techniques sit alongside sites: RAID (redundant disks), server clustering, and failover protect against component failure and reduce RTO for localized faults. They are not a substitute for off-site backups, which protect against site-level disasters, ransomware, and corruption that replication would simply copy to the standby.

RAID and High-Availability Building Blocks

CISA expects you to recognize the common RAID (Redundant Array of Independent Disks) levels and what each buys you, because they appear in availability and storage-operations questions.

| RAID level | Technique | Fault tolerance |
|---|---|---|
| RAID 0 | Striping only | None; any disk failure loses all data (performance only) |
| RAID 1 | Mirroring | Survives loss of one disk in a mirrored pair |
| RAID 5 | Block striping + distributed parity | Survives one disk failure; needs ≥3 disks |
| RAID 6 | Striping + double distributed parity | Survives two simultaneous failures; needs ≥4 disks |
| RAID 10 | Striping across mirrored pairs | High performance and fault tolerance; survives one failure per mirrored pair; needs ≥4 disks |

The trap is RAID 0: it improves throughput but provides no redundancy, so it never satisfies an availability requirement. Beyond RAID, server clustering lets a standby node take over automatically (failover), and redundant power, network paths, and load balancers remove single points of failure. All of these reduce RTO for component-level faults but, again, do not replace geographically separated backups.
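The capacity cost of each level follows directly from how many disks hold redundancy. The sketch below is a simplified model that assumes equal-size disks and treats RAID 1 as a single two-disk mirrored pair. `raid_summary` is a hypothetical helper, and "failures survived" means the guaranteed worst case.

```python
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def raid_summary(level, disks, disk_tb):
    """Return (usable_tb, guaranteed_failures_survived) for equal-size disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return disks * disk_tb, 0          # all capacity, no redundancy
    if level == 1:
        if disks != 2:
            raise ValueError("this sketch models RAID 1 as one two-disk pair")
        return disk_tb, 1                  # second disk is a full copy
    if level == 5:
        return (disks - 1) * disk_tb, 1    # one disk's worth of parity
    if level == 6:
        return (disks - 2) * disk_tb, 2    # two disks' worth of parity
    if disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    # RAID 10: half the disks are mirrors. Worst case, the second failure
    # hits the partner of the first failed disk, so only 1 is guaranteed.
    return disks // 2 * disk_tb, 1


for level, disks in [(0, 4), (1, 2), (5, 4), (6, 4), (10, 4)]:
    print(level, raid_summary(level, disks, 4))
```

With four 4 TB disks, RAID 0 yields 16 TB with no protection, RAID 5 yields 12 TB, and RAID 6 and RAID 10 both yield 8 TB, which shows the redundancy being bought with capacity.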

The Change Lifecycle in Detail

Walk a normal change through its stages so the exam's process questions are automatic: RFC raised → categorized and risk-assessed → authorized (Change Advisory Board for significant changes) → built and tested in a non-production environment → scheduled and implemented with a back-out plan → post-implementation review → CMDB updated. A back-out (rollback) plan is the control auditors look for before a change is promoted to production; its absence is a finding. Emergency changes compress this flow but still require authorization and documentation, typically reviewed retroactively by an emergency CAB.
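The stage order above can be sketched as a small state machine that refuses to skip a stage or to implement without a back-out plan. The stage names and the `ChangeRecord` class are hypothetical, and only the normal-change path is modelled; the emergency path is not.

```python
STAGES = [
    "raised", "assessed", "authorized", "tested",
    "implemented", "reviewed", "cmdb_updated",
]


class ChangeRecord:
    """Tracks one normal change and enforces the stage order."""

    def __init__(self, rfc_id, backout_plan=None):
        self.rfc_id = rfc_id
        self.backout_plan = backout_plan
        self.stage = "raised"
        self.history = ["raised"]

    def advance(self, next_stage):
        expected_index = STAGES.index(self.stage) + 1
        if expected_index >= len(STAGES):
            raise ValueError(f"{self.rfc_id} is already closed")
        expected = STAGES[expected_index]
        if next_stage != expected:
            raise ValueError(
                f"{self.rfc_id}: expected '{expected}', got '{next_stage}'"
            )
        if next_stage == "implemented" and not self.backout_plan:
            raise ValueError(f"{self.rfc_id}: no back-out plan, cannot implement")
        self.stage = next_stage
        self.history.append(next_stage)


change = ChangeRecord("RFC-1", backout_plan="restore pre-change snapshot")
for stage in STAGES[1:]:
    change.advance(stage)
print(change.history)

unapproved = ChangeRecord("RFC-2")
try:
    unapproved.advance("implemented")  # straight to production
except ValueError as err:
    print(err)
```

The second record models the red-flag scenario: a change pushed straight to production is rejected because assessment, authorization, and testing were skipped.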

Test Your Knowledge

A critical application server fails during business hours and the cause is unknown. What is the immediate priority under ITIL?

Test Your Knowledge

An organization requires recovery of a vital, customer-facing application within two hours of any disruption. Which alternate-site strategy is most appropriate?

Test Your Knowledge

The same network outage has recurred three times this month, each restored by a manual workaround. What is the best long-term action?
