Incident Investigation and Root Cause
Key Takeaways
- CSP11 places incident investigation and root causes in Program Management, but investigation also feeds Risk Management by updating hazard identification, risk ranking, and control verification.
- Incident investigation should preserve evidence, understand work as performed, identify system causes, and produce corrective actions that reduce recurrence risk.
- Root cause tools such as 5 Whys, fishbone, change analysis, barrier analysis, and event mapping are aids to reasoning, not proof by themselves.
- Corrective actions are stronger when they change design, engineering, barriers, maintenance, competency, staffing, procurement, or management controls rather than only retraining workers.
- Near misses and weak signals should be investigated in proportion to their potential severity because they reveal risk before a major loss occurs.
Investigation Is Risk Feedback
CSP11 Program Management asks candidates to determine appropriate incident investigation techniques, identify root causes, and apply corrective actions. That objective overlaps with Risk Management because every incident tests the accuracy of prior risk decisions. If a risk assessment predicted low likelihood but the event occurred, the organization needs to ask what assumption failed.
An incident can include injury, illness, property damage, environmental release, fire, process upset, security event, or near miss. A near miss deserves attention when credible severity is high. Waiting for injury before learning is poor risk management.
Stabilize, Preserve, Learn
The first priorities are life safety, emergency control, medical care, spill or fire response, and protection of people from continuing hazards. Once the scene is stable, preserve evidence that can explain what happened. Evidence may include equipment position, control settings, alarms, photos, video, permits, training records, maintenance history, inspection results, procedures, shift schedules, and environmental conditions.
Interview promptly and respectfully. Witness memory fades quickly, so do not delay, but an interview should never feel like a disciplinary hearing. Ask what the person saw, heard, expected, and did. Ask how the job normally works, what was different, what pressures existed, and what information was available at the time. On the exam, the preferred answer avoids leading questions and blame-laden language.
Describe Work as Performed
Many investigations fail because they compare the incident to the written procedure without learning how work is actually done. The written procedure may be outdated, too difficult, missing a step, or incompatible with production conditions. If employees routinely bypass a guard, skip a checklist, or improvise a tool, the investigation must ask why the system allowed or encouraged that behavior.
Build a timeline. Identify the initiating event, enabling conditions, failed barriers, successful barriers, decision points, and consequences. A timeline separates facts from assumptions and helps the team see where intervention could have changed the outcome.
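As a rough illustration of the timeline idea above, the following Python sketch sorts entries chronologically and lists the ones that are still assumptions rather than verified facts. The `TimelineEntry` class, its field names, and the sample entries are hypothetical and exist only for this example; they do not come from any CSP reference or standard form.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimelineEntry:
    """One line on an investigation timeline (hypothetical structure)."""
    when: datetime
    description: str
    kind: str       # e.g. "initiating event", "enabling condition", "failed barrier"
    verified: bool  # True if supported by evidence, False if still an assumption


def build_timeline(entries):
    """Return the entries in chronological order."""
    return sorted(entries, key=lambda e: e.when)


def open_assumptions(entries):
    """Return entries, in time order, that still lack supporting evidence."""
    return [e for e in build_timeline(entries) if not e.verified]


# Illustrative data only; not taken from a real event.
sample = [
    TimelineEntry(datetime(2024, 1, 1, 14, 5), "Conveyor jams", "initiating event", True),
    TimelineEntry(datetime(2024, 1, 1, 13, 0), "Guard interlock bypassed", "failed barrier", False),
    TimelineEntry(datetime(2024, 1, 1, 14, 7), "Operator reaches into jam point", "decision point", True),
]

for entry in build_timeline(sample):
    status = "FACT" if entry.verified else "ASSUMPTION"
    print(f"{entry.when:%H:%M} [{status}] {entry.kind}: {entry.description}")
```

Keeping the `verified` flag on each entry makes it easy to see which points on the timeline still need evidence before the team draws conclusions from them.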
Root Cause Tools
| Tool | Best use | CSP caution |
|---|---|---|
| 5 Whys | Simple cause chain when evidence is clear. | Stop only when a controllable system cause is reached, not when a person is blamed. |
| Fishbone diagram | Broad brainstorming across people, equipment, methods, materials, environment, and management. | Categories do not prove causes; evidence still matters. |
| Change analysis | Events where something differed from normal work. | Look at staffing, materials, equipment, software, weather, timing, layout, and supervision. |
| Barrier analysis | Events where safeguards failed, were missing, or were bypassed. | Test barrier independence, reliability, availability, and human dependence. |
| Event and causal factor charting | Complex events with multiple decisions and conditions. | Keep the chart fact-based and update it as evidence improves. |
The strongest method depends on complexity. A minor first-aid case may need a simple review. A fatality, major release, fire, or repeated serious near miss deserves a structured team investigation with technical expertise and management involvement.
Root Cause Is Not Worker Error
Worker error describes an action, not an adequate root cause. A CSP-level investigation asks why the action made sense at the time. Was the interface confusing? Was the alarm nuisance-prone? Was the tool unavailable? Was the procedure impractical? Was the worker fatigued? Were staffing levels too low? Did production pressure conflict with a safe sequence? Did training verify actual competency?
Human factors do not excuse unsafe acts; they explain how to prevent recurrence. If the only corrective action is "remind employees," "retrain everyone," or "enforce the rule," the exam will usually prefer a deeper system fix.
Corrective Action Quality
Corrective actions should match the hierarchy of controls. Eliminate a task, substitute a safer material, redesign equipment, add guarding, improve ventilation, automate a high-risk step, improve detection, strengthen maintenance, revise procurement, or change staffing when those actions address causes. Training and discipline may be necessary, but they are weaker when used alone for a design or system problem.
Each action needs an owner, due date, interim control, verification method, and effectiveness check. Completion means more than closing a work order. It means the cause has been addressed and the risk assessment, JHA, permit, procedure, training, or maintenance plan has been updated where needed.
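One way to make the five required elements concrete is a small record check. The sketch below is a minimal example, assuming a hypothetical `CorrectiveAction` record whose field names mirror the list above; it is not the format of any particular tracking system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CorrectiveAction:
    """Hypothetical corrective action record for illustration."""
    description: str
    owner: Optional[str] = None
    due_date: Optional[date] = None
    interim_control: Optional[str] = None
    verification_method: Optional[str] = None
    effectiveness_check: Optional[str] = None


# The five elements named in the text above.
REQUIRED_ELEMENTS = (
    "owner",
    "due_date",
    "interim_control",
    "verification_method",
    "effectiveness_check",
)


def missing_elements(action):
    """Return the names of required elements that are empty or unset."""
    return [name for name in REQUIRED_ELEMENTS if not getattr(action, name)]


# Illustrative record: an owner and due date exist, but nothing else.
draft = CorrectiveAction(
    description="Install fixed guard on conveyor nip point",
    owner="Maintenance supervisor",
    due_date=date(2024, 3, 1),
)

print(missing_elements(draft))
```

A record that passes this check is only complete on paper. The effectiveness check itself still has to be carried out before the action is treated as closed.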
Close the loop with management review when the finding affects resources, design standards, purchasing, staffing, or risk tolerance. A local fix may stop one recurrence, but a management-system fix prevents the same weakness from appearing elsewhere. CSP scenarios often favor the answer that spreads lessons to similar equipment, tasks, sites, contractors, or shifts. When a stronger control is considered and rejected, document the reason, especially when cost, schedule, or technical feasibility leaves meaningful residual exposure.
Trend and Communicate
One incident can reveal a local issue. Several similar events reveal a system pattern. Trend analysis can show repeated dropped objects in one area, repeated chemical splashes during cleaning, recurring contractor permit errors, or growing near misses during overtime. Those signals should change inspection focus, leading indicators, and management review.
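A simple count by event type and location is often enough to surface the kind of pattern described above. The following sketch assumes each event is a dictionary with hypothetical `type` and `area` keys, and uses an arbitrary threshold of three similar events; a real program would choose its own grouping fields and threshold.

```python
from collections import Counter


def recurring_patterns(events, threshold=3):
    """Count events by (type, area) and keep groups at or above the threshold.

    The threshold is an assumption for illustration, not a CSP rule.
    """
    counts = Counter((e["type"], e["area"]) for e in events)
    return {key: n for key, n in counts.items() if n >= threshold}


# Illustrative log only; not real incident data.
event_log = [
    {"type": "dropped object", "area": "Mezzanine"},
    {"type": "chemical splash", "area": "Wash bay"},
    {"type": "dropped object", "area": "Mezzanine"},
    {"type": "dropped object", "area": "Mezzanine"},
    {"type": "dropped object", "area": "Loading dock"},
]

print(recurring_patterns(event_log))
```

Groups returned by this count are prompts for a closer look, such as a focused inspection or a management review item, not conclusions about cause.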
Risk communication after investigation must be honest and useful. Workers need lessons they can apply. Leaders need risk significance, resource needs, and accountability. The organization needs a non-punitive reporting climate so weak signals keep surfacing before severe loss occurs.
An investigation concludes that an employee failed to follow a lockout step and recommends retraining the department. What is the best CSP critique?