System Safety and Human Factors
Key Takeaways
- CSP11 explicitly includes system safety analysis techniques such as fault tree analysis, failure modes and effects analysis (FMEA), Safety Case thinking, and risk summation.
- System safety treats incidents as combinations of hazards, barriers, interfaces, operating conditions, and management decisions.
- Human factors analysis asks how design, workload, fatigue, visibility, alarms, tools, procedures, and incentives shape performance.
- The strongest corrective actions change the system so the desired safe action is easier, clearer, and more reliable.
- Common-cause failures and dependent safeguards deserve special attention because they can defeat multiple barriers at once.
Think in Systems
CSP11 places system safety analysis in Program Management and human factors in Occupational Health and Applied Science. The two belong together in practice: a CSP is expected to evaluate how people, equipment, procedures, software, maintenance, production pressure, and safeguards interact. The exam may describe one visible error, but the better answer often identifies the system condition that made the error likely.
A system-safety view starts with hazardous energy and asks what barriers keep that energy from reaching people, assets, the public, or the environment. Barriers can prevent the event, detect an abnormal condition, control the event after it starts, or mitigate the consequence. The same barrier can look strong on paper and weak in use if it depends on attention during fatigue, unreadable displays, awkward body position, or a bypassed interlock.
Choosing the Analysis Method
| Method | Best use | Exam cue |
|---|---|---|
| Fault Tree Analysis | Work backward from a top event through combinations of causes. | Undesired event, logic gates, combinations of failures. |
| Event Tree Analysis | Work forward from an initiating event through safeguard success or failure. | Release, fire, spill, evacuation, consequence paths. |
| FMEA | Review component or process failure modes and effects. | Failure mode, severity, occurrence, detection, priority. |
| HAZOP | Examine process deviations from design intent. | Flow, pressure, temperature, level, guide words, process node. |
| Bow-tie | Show threats, the central event, consequences, and the barriers on each side of the event. | Communicating controls around a central event. |
| Safety Case | Build an argument that risk is controlled with evidence. | Claim, evidence, assurance, critical system. |
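The FMEA cue words in the table (severity, occurrence, detection, priority) are commonly combined into a risk priority number, RPN = severity x occurrence x detection, with each factor scored from 1 to 10. The sketch below ranks a few failure modes that way. The failure modes, scores, and names such as `FailureMode` and `rank_by_rpn` are invented for illustration, and some FMEA approaches use priority tables instead of RPN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    item: str
    mode: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (almost certain to detect) to 10 (undetectable)

    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError("scores must be between 1 and 10")
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Illustrative scores only; a real FMEA team assigns these from evidence.
modes = [
    FailureMode("pump seal", "external leak", severity=7, occurrence=5, detection=4),
    FailureMode("isolation valve", "fails to close", severity=9, occurrence=2, detection=7),
    FailureMode("level switch", "stuck low", severity=6, occurrence=3, detection=8),
]

for m in rank_by_rpn(modes):
    print(f"{m.item:16} {m.mode:16} RPN = {m.rpn()}")
```

In this made-up data the highest-severity mode (the isolation valve) ranks last by RPN, which is one reason severity is often reviewed on its own rather than left to the product alone.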
The method should match the problem. If the question asks how several failures combine to produce a fatal hoist drop, fault tree analysis fits. If it asks what outcomes follow a chemical release depending on detection, isolation, ventilation, ignition, and emergency response, event tree thinking fits. If it asks which pump seal or valve failure mode deserves redesign, FMEA fits.
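The hoist-drop case can be sketched as a small fault tree to show how gate logic combines basic events. This is a minimal sketch that assumes the basic events are independent; the tree structure and every probability are hypothetical values chosen only to show the arithmetic.

```python
from math import prod


def and_gate(*probs):
    """All inputs must occur: multiply probabilities (assumes independence)."""
    return prod(probs)


def or_gate(*probs):
    """Any input is enough: 1 minus the chance that none occur (assumes independence)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical per-lift probabilities, not data from any real hoist.
p_brake_fails = 0.01
p_overload = 0.02
p_limit_switch_fails = 0.05
p_rope_breaks = 0.001

# The overload branch needs an overload AND a failed limit switch.
p_unprotected_overload = and_gate(p_overload, p_limit_switch_fails)

# Top event: the load drops if ANY of the three branches occurs.
p_hoist_drop = or_gate(p_brake_fails, p_unprotected_overload, p_rope_breaks)

print(f"Unprotected overload:   {p_unprotected_overload:.4f}")
print(f"Hoist drop (top event): {p_hoist_drop:.4f}")
```

With these invented numbers the brake failure dominates the top event, while the overload branch contributes little because it needs two failures at once. The multiplication at the AND gate is only valid when the inputs really are independent.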
Barrier Quality
CSP-level analysis asks whether barriers are independent, effective, auditable, and maintained. A high alarm and a shutdown tied to the same transmitter are not independent, because one transmitter failure disables both. A procedure that requires a worker to remember a rare step during an emergency is weaker than a physical interlock. A relief valve without inspection, set-pressure control, and a safe discharge path is not a complete safeguard.
Common-cause failure is a frequent trap. A fire can disable power, communications, alarms, and pumps if they share the same vulnerable route. A single supervisor can approve, perform, and verify a critical isolation if roles are poorly separated. A software change can affect several safeguards at the same time. When safeguards are dependent, risk summation must treat them as linked rather than crediting each one as a separate, independent layer.
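A rough calculation shows why dependent safeguards cannot be credited as separate layers. The sketch below compares two identical safeguards treated as fully independent with the same pair under a simple beta-factor model, a common simplification in reliability work. The failure probability, the value of `beta`, and the function names are assumptions made for illustration.

```python
def pfd_independent(q1, q2):
    """Both safeguards fail on demand, assuming fully independent failures."""
    return q1 * q2


def pfd_beta_factor(q, beta):
    """Two identical safeguards under a simple beta-factor model.

    beta is the assumed fraction of each safeguard's failures that come
    from a shared cause (same transmitter, same power route, same software).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    independent_part = ((1.0 - beta) * q) ** 2  # both fail for separate reasons
    common_part = beta * q                      # one shared cause defeats both
    return independent_part + common_part


q = 0.01     # hypothetical probability of failure on demand for each safeguard
beta = 0.10  # hypothetical common-cause fraction

claimed = pfd_independent(q, q)
realistic = pfd_beta_factor(q, beta)

print(f"Credited as independent layers: {claimed:.6f}")
print(f"With common-cause dependence:   {realistic:.6f}")
print(f"Risk understated by a factor of {realistic / claimed:.1f}")
```

At the extreme, `beta = 1.0` represents an alarm and a shutdown that share one transmitter: the pair is no more reliable than a single safeguard.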
Human Factors
Human factors does not mean excusing unsafe behavior. It means designing work so reliable performance is realistic. Slips, lapses, mistakes, and violations happen for different reasons. A slip (the right intention, carried out wrongly) may come from poor control layout. A lapse (a forgotten step) may come from interruption. A mistake (a wrong plan, carried out as intended) may come from a misleading procedure. A violation (a deliberate departure from a rule) may come from incentives, impossible schedules, unavailable tools, or a rule that conflicts with actual work.
Look for error-likely conditions:
- Poor visibility, glare, noise, heat, vibration, or awkward reach.
- Similar controls, unlabeled valves, hidden status, or alarm overload.
- Excessive workload, fatigue, time pressure, or prolonged monotony.
- Procedures that do not match the task or are not available at the point of use.
- PPE that interferes with vision, grip, hearing, or communication.
The corrective action should fit the human factor. If operators misread two similar valves, use labeling, shape coding, separation, lockable positions, or engineered sequencing. If maintenance bypasses a guard to troubleshoot, provide a safe diagnostic mode and controlled energy state. If drivers are fatigued, use scheduling, route planning, monitoring, and stop-work authority, not only a reminder to be careful.
Safety Case Thinking
A Safety Case is a structured argument, supported by evidence, that a system is acceptably safe for a defined use. The exam does not require writing a full case, but the logic is useful: state the claim, list the hazards, identify controls, show evidence that controls work, define operating limits, and monitor assumptions. Evidence can include test data, inspection records, proof-test results, training and competency records, audit results, and incident learning.
A strong Safety Case also states its boundaries. A robot cell may be safe with fixed fencing, interlocked access, validated safety-rated controls, taught maintenance procedures, and trained authorized employees. That claim may not hold after speed changes, tooling changes, new product geometry, altered software, or defeated presence sensing. Each of those changes can invalidate the evidence behind the claim, so system safety links directly to Management of Change.
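One way to picture the boundary idea is as a set of validated operating limits that any proposed change is checked against before it goes ahead. The sketch below is purely illustrative: the parameter names, the values, and the `changes_needing_review` helper are hypothetical and do not represent any standard or vendor tool.

```python
# Conditions under which the robot-cell safety claim was validated.
# All names and values are hypothetical.
validated_envelope = {
    "max_speed_mm_s": 500,
    "tooling": "gripper-A",
    "software_version": "2.3.1",
    "presence_sensing_active": True,
}


def changes_needing_review(envelope, proposed):
    """Return the parameters where the proposed state leaves the validated envelope."""
    flagged = []
    for name, validated in envelope.items():
        if name not in proposed:
            flagged.append(name)  # an unknown state cannot support the claim
            continue
        value = proposed[name]
        if name.startswith("max_"):
            # Numeric upper limit: only an increase leaves the envelope.
            if value > validated:
                flagged.append(name)
        elif value != validated:
            flagged.append(name)
    return flagged


proposed_state = {
    "max_speed_mm_s": 800,         # speed increase
    "tooling": "gripper-A",
    "software_version": "2.4.0",   # altered software
    "presence_sensing_active": True,
}

print(changes_needing_review(validated_envelope, proposed_state))
```

Anything the check flags is a prompt to revisit the evidence through Management of Change, not an automatic verdict that the change is unsafe.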
Incident Learning
When a scenario includes a worker action, ask why the action made sense at the time. Was the display confusing? Was the task impossible without bypassing a safeguard? Was a permit treated as paperwork? Were alarms ignored because nuisance alarms were common? Were contractors outside the normal communication path?
Good CSP answers strengthen the system: redesign, remove dependencies, improve the interface, clarify authority, test safeguards, and monitor leading signals. Weak answers stop at retraining or discipline when the facts show a predictable system weakness.
Use human factors as a design check before closing corrective actions. If a safeguard depends on perfect memory, perfect visibility, perfect posture, or perfect timing, the system is fragile. The stronger fix reduces the demand on attention and makes the safe state obvious.
A robotic palletizer has repeated interlock bypasses during jam clearing because the only reset point is inside a cramped fenced area and production pressure is high. What response best applies system safety and human factors?