Incident Investigation & Root Cause Analysis

Key Takeaways

  • Incident investigation looks for root causes (the underlying system failures), not just immediate causes or someone to blame; the goal is prevention, not fault-finding.
  • OSHA requires reporting a work-related fatality within 8 hours and any in-patient hospitalization, amputation, or loss of an eye within 24 hours (29 CFR 1904.39).
  • The 5 Whys repeatedly asks why until the systemic cause surfaces; the Fishbone (Ishikawa) diagram organizes causes into categories like People, Methods, Materials, Machines, Measurement, and Environment.
  • Heinrich's and Bird's safety pyramids hold that many near misses and minor injuries underlie each serious injury, so investigating near misses helps prevent future fatalities.
  • Effective corrective actions move UP the hierarchy of controls (elimination/engineering) rather than relying solely on PPE or 'retrain the worker,' which rarely fixes the root cause.

Last updated: June 2026

Purpose: Prevention, Not Blame

Incident investigation falls under the STSC's safety-program management content and is woven through leadership questions. The single most important concept the exam tests is mindset: an investigation exists to find why the system failed so the failure can be prevented, not to assign blame. OSHA itself reframed 'accident investigation' as 'incident investigation' to stress that incidents are caused and therefore preventable - the word 'accident' implies bad luck.

Distinguish three layers of cause:

  • Immediate (direct) cause - the unsafe act or condition closest to the event (e.g., a worker stepped on an unguarded floor opening).
  • Contributing cause - factors that made the incident more likely (poor lighting, rushed schedule).
  • Root cause - the underlying system or management failure (no fall-protection plan, no hazard inspection, no training program).

Fixing only the immediate cause lets the same root cause produce the next incident. The STSC rewards answers that target the root cause.

OSHA Reporting and Recordkeeping (29 CFR 1904)

Know the reporting clocks cold - they are frequently tested numerics:

Event                        Report to OSHA within   Citation
Work-related fatality        8 hours                 1904.39
In-patient hospitalization   24 hours                1904.39
Amputation                   24 hours                1904.39
Loss of an eye               24 hours                1904.39
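
As a memory aid, the reporting clocks above can be sketched as a small lookup. This is an illustrative sketch only: the function name, the event labels, and the sample date are hypothetical, and the caller is assumed to supply the moment the clock starts running (the sketch does not decide that for you).

```python
from datetime import datetime, timedelta

# Reporting windows in hours, taken from the table above (29 CFR 1904.39).
REPORTING_WINDOW_HOURS = {
    "fatality": 8,
    "in-patient hospitalization": 24,
    "amputation": 24,
    "loss of an eye": 24,
}


def osha_report_deadline(event_type: str, clock_start: datetime) -> datetime:
    """Return the latest time a report can be made for a reportable event.

    clock_start is an assumption of this sketch: the caller decides when
    the reporting clock begins.
    """
    hours = REPORTING_WINDOW_HOURS[event_type.lower()]
    return clock_start + timedelta(hours=hours)


# Hypothetical example: the clock starts at 2:30 p.m. on March 2, 2026.
start = datetime(2026, 3, 2, 14, 30)
print(osha_report_deadline("fatality", start))    # 2026-03-02 22:30:00
print(osha_report_deadline("amputation", start))  # 2026-03-03 14:30:00
```

An event type that is not in the lookup raises a KeyError, which mirrors the exam point that only these four event types carry a reporting clock under 1904.39.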

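Separately, recordable injuries and illnesses go on the OSHA 300 Log and are summarized on the 300A, which is posted from February 1 through April 30 of the following year; each case also gets a 301 incident report. A case is recordable if it involves death, days away from work, restricted work or job transfer, medical treatment beyond first aid, loss of consciousness, or a significant injury or illness diagnosed by a physician or other licensed health care professional.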

Root Cause Analysis Tools

Two RCA methods dominate STSC questions:

  • 5 Whys - ask 'why?' repeatedly (often about five times) until the systemic cause emerges. Each answer becomes the next question. Best for simpler, single-thread events; a near-miss 5 Whys can be done in well under an hour.
  • Fishbone / Ishikawa (Cause-and-Effect) diagram - the 'effect' is the head; 'bones' group potential causes into categories. A common construction set is the 6 Ms: Manpower (People), Methods, Materials, Machines, Measurement, and Mother Nature (Environment). Best when many factors interact.

Other recognized methods include Fault Tree Analysis (top-down logic tree) and TapRooT/event-and-causal-factor charting for complex events.

Worked Example: 5 Whys

  1. Why was the worker injured? He fell from a leading edge.
  2. Why? He had no fall protection.
  3. Why? No anchor point was installed on that elevation.
  4. Why? The fall-protection plan did not address that work area.
  5. Why? No competent person reviewed the task before it started.

Root cause: absence of a competent-person hazard review / planning process - a system fix, not 'the worker was careless.'

The Safety Pyramid and Near Misses

Heinrich's Triangle (and Bird's later version) holds that beneath each serious injury lie many minor injuries and a far larger base of near misses and at-risk behaviors. The practical lesson: investigate near misses because they are free lessons that reveal the same root causes as fatalities without the harm. An organization that applies rigorous RCA to near misses is dismantling the base of the pyramid before it produces a fatality.

Writing Effective Corrective Actions

Corrective actions should be SMART (Specific, Measurable, Assignable, Realistic, Time-bound) and should climb the hierarchy of controls:

  1. Elimination - remove the hazard (best).
  2. Substitution - replace it with something safer.
  3. Engineering controls - guardrails, barriers, ventilation.
  4. Administrative controls - procedures, training, signage.
  5. PPE - last line of defense (weakest).

'Retrain the worker' and 'be more careful' are administrative-only fixes and rarely correct a root cause - a frequent wrong answer on the exam.
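
One way to picture "climbing the hierarchy" is to rank proposed corrective actions by control level and pick the highest one. The sketch below is purely illustrative: the Control enum, the most_effective helper, and the sample actions are hypothetical names invented for this example, not part of any standard.

```python
from enum import IntEnum


class Control(IntEnum):
    """Hierarchy of controls; a lower number means a more effective control."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5


def most_effective(actions):
    """Return the (description, Control) pair highest on the hierarchy."""
    return min(actions, key=lambda action: action[1])


# Hypothetical corrective actions proposed after a leading-edge fall.
proposed = [
    ("Retrain the worker", Control.ADMINISTRATIVE),
    ("Issue harnesses", Control.PPE),
    ("Install guardrails at the leading edge", Control.ENGINEERING),
]

best = most_effective(proposed)
print(best[0])  # Install guardrails at the leading edge
```

The ranking only orders the options; whether a given action actually addresses the root cause still has to come from the investigation itself.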

Common Exam Traps

  • Blame as a conclusion. 'Worker error' is almost never an acceptable root cause.
  • Confusing the 8-hour and 24-hour clocks. Fatality = 8 hours; hospitalization/amputation/eye = 24 hours.
  • Stopping at the immediate cause. Push to the system level.
  • Skipping near misses. They expose the same root causes as serious injuries, without the harm.
  • Choosing PPE over engineering. Prefer higher controls when the question asks for the most effective corrective action.

The Investigation Process Step by Step

The STSC expects a logical sequence, not an ad hoc reaction. A defensible investigation follows these steps:

  1. Secure the scene and care for the injured. First priority is people - render aid and call emergency services; then preserve the scene so evidence is not lost.
  2. Preserve and collect evidence. Capture the 4 Ps: People (witnesses), Parts (failed equipment), Position (locations, photos, measurements), and Paper (permits, JSAs, training records).
  3. Interview witnesses promptly. Interview separately, soon after the event, with open-ended, non-leading questions and a non-blame tone so memory is fresh and honest.
  4. Analyze for root cause. Apply 5 Whys, Fishbone, or fault tree to move from immediate to systemic causes.
  5. Develop corrective actions. Write SMART actions high on the hierarchy of controls.
  6. Implement, verify, and follow up. Confirm each action was completed and actually reduced the risk, then share lessons learned.

Investigate promptly - ideally within 24 to 48 hours - because evidence degrades and memories fade quickly. The supervisor typically leads or supports the investigation as the person closest to the work.

Leading vs. Lagging Indicators

Investigation data feeds two kinds of metrics. Lagging indicators count harm that has already happened: TRIR (Total Recordable Incident Rate) and DART (Days Away, Restricted, or Transferred). Both are calculated as (number of cases x 200,000) / total hours worked. TRIR counts all recordable cases, while DART counts only the cases that involved days away, restricted work, or job transfer. The 200,000 constant is the number of hours 100 full-time employees work in a year (100 workers x 40 hours x 50 weeks). Leading indicators are proactive and predictive: near-miss reports submitted, inspections completed, training delivered, and corrective actions closed on time. A mature program drives leading indicators up to prevent future lagging-indicator harm.

The exam may ask you to classify a metric or to compute a TRIR with the 200,000-hour constant.
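
A short sketch of the rate arithmetic, for practice. The function name and the case counts and hours below are hypothetical; only the 200,000-hour constant and the formula come from the text above.

```python
HOURS_BASE = 200_000  # 100 full-time workers x 40 hours x 50 weeks


def incident_rate(cases: int, hours_worked: float) -> float:
    """Return cases per 100 full-time workers: (cases x 200,000) / hours."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return cases * HOURS_BASE / hours_worked


# Hypothetical year: 6 recordable cases, 3 of them DART cases,
# over 480,000 total hours worked.
trir = incident_rate(6, 480_000)
dart = incident_rate(3, 480_000)
print(trir)  # 2.5
print(dart)  # 1.25
```

The same function serves both rates because only the case count changes: all recordable cases for TRIR, and the days-away, restricted, or transferred subset for DART.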

First Aid vs. Recordable Treatment

A recurring distinction: a case is recordable only when treatment goes beyond first aid. First aid (not recordable) includes using non-prescription medication at non-prescription strength, cleaning/bandaging minor wounds, hot/cold therapy, and removing splinters with tweezers. Treatment such as sutures, prescription medication, or splinting a fracture is medical treatment beyond first aid and makes the case recordable. Knowing this line helps you answer both recordkeeping and investigation questions correctly.
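
The first-aid line can be drilled with a small lookup built only from the examples in the paragraph above. This is a study sketch, not a recordkeeping tool: the set names and the function are hypothetical, the lists are deliberately limited to the examples given here, and the check covers only the treatment criterion (a case can still be recordable for other reasons, such as days away from work).

```python
# Examples of first aid named in the text (not recordable on their own).
FIRST_AID = {
    "non-prescription medication at non-prescription strength",
    "cleaning or bandaging a minor wound",
    "hot or cold therapy",
    "removing a splinter with tweezers",
}

# Examples of medical treatment beyond first aid named in the text.
MEDICAL_TREATMENT = {
    "sutures",
    "prescription medication",
    "splinting a fracture",
}


def recordable_by_treatment(treatments) -> bool:
    """Return True if any listed treatment goes beyond first aid.

    Treatments outside the two example sets raise ValueError rather than
    being guessed at.
    """
    unknown = set(treatments) - FIRST_AID - MEDICAL_TREATMENT
    if unknown:
        raise ValueError(f"not classified in this sketch: {sorted(unknown)}")
    return any(t in MEDICAL_TREATMENT for t in treatments)


print(recordable_by_treatment(["hot or cold therapy"]))             # False
print(recordable_by_treatment(["hot or cold therapy", "sutures"]))  # True
```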

Test Your Knowledge

Under 29 CFR 1904.39, within how many hours must an employer report a work-related fatality to OSHA?


A 5 Whys analysis of a fall concludes that no competent person reviewed the task before work began. This represents which type of cause?


Which corrective action is the MOST effective according to the hierarchy of controls?
