5.1 Deployment decisions & human oversight
Key Takeaways
- Deployment gating clears a consequential AI system for production only after readiness criteria are met and a named, accountable owner signs off, with review depth tiered by risk.
- Human-in-the-loop requires per-output human approval, human-on-the-loop lets a person monitor and override an autonomous system, and human-in-command governs whether the system is used at all.
- Under EU AI Act Article 14, human oversight is meaningful only when the person has the authority, competence, time, and information to interpret, disregard, override, or stop the system.
- Automation bias, the tendency to over-trust machine output, is a named risk that Article 14 oversight design must actively counter.
- Under Article 25, a deployer that makes a substantial modification to a high-risk system, or changes a system's intended purpose so that it becomes high-risk, is treated as a provider and inherits provider obligations, so context shifts must trigger re-gating.
From development to deployment: the go/no-go gate
Governing AI deployment starts with a formal decision gate that separates a validated system from a live one that affects real people. In the AIGP framework, deployment gating means no consequential AI system enters production until defined readiness criteria are met and an accountable owner signs off. Typical gate criteria include a completed risk and impact assessment (and, for EU AI Act high-risk systems, a conformity assessment); a documented intended purpose; validation evidence for accuracy, robustness, fairness, and security; a data protection impact assessment (DPIA) and, where required, a fundamental rights impact assessment (FRIA) under Article 27; an approved human-oversight design; user documentation and training; and a tested rollback plan. A cross-functional deployment review board, spanning legal, privacy, security, the business owner, and often a model-risk or ethics function, renders the final go/no-go call. Gating is tiered by risk: a low-impact internal writing assistant may need only a lightweight self-attestation, while a system that screens job applicants, prices insurance, or scores credit demands full board approval and executive sign-off. The gate decision itself is recorded, capturing who approved it, on what evidence, under which conditions, and when the approval expires, so that accountability remains traceable after go-live. The point is to make deployment a deliberate, documented, reversible decision rather than a silent hand-off from the data-science team to production.
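The gate described above amounts to a tiered checklist plus a recorded sign-off. The sketch below illustrates that logic under loudly stated assumptions: the criterion names, the two-tier `RiskTier` split, the 365-day default validity period, and the helpers `evaluate_gate` and `GateDecision` are all hypothetical. A real gate would attach documents and board minutes rather than string labels. The rule it encodes is that a "go" requires both complete tier-appropriate evidence and a named approver.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum
from typing import Iterable, List, Optional


class RiskTier(Enum):
    LOW = "low"    # e.g. an internal writing assistant
    HIGH = "high"  # e.g. applicant screening, insurance pricing, credit scoring


# Hypothetical criterion names, loosely mirroring the gate checklist in the text.
FULL_CRITERIA = frozenset({
    "risk_impact_assessment",
    "intended_purpose_documented",
    "validation_evidence",
    "dpia",
    "oversight_design_approved",
    "user_docs_and_training",
    "rollback_plan_tested",
})
LIGHT_CRITERIA = frozenset({"self_attestation"})


@dataclass
class GateDecision:
    """Recorded outcome: who approved, on what evidence, under which conditions, until when."""
    system: str
    approved: bool
    approver: Optional[str]
    evidence: List[str]
    conditions: List[str]
    decided_on: date
    expires_on: Optional[date]
    missing: List[str] = field(default_factory=list)


def evaluate_gate(
    system: str,
    tier: RiskTier,
    evidence: Iterable[str],
    approver: Optional[str],
    conditions: Iterable[str] = (),
    needs_fria: bool = False,
    today: Optional[date] = None,
    validity_days: int = 365,  # hypothetical default
) -> GateDecision:
    """'Go' only if every tier-appropriate criterion is met AND a named owner signs off."""
    today = today or date.today()
    required = set(FULL_CRITERIA if tier is RiskTier.HIGH else LIGHT_CRITERIA)
    if needs_fria:
        required.add("fria")  # fundamental rights impact assessment, where required
    provided = set(evidence)
    missing = sorted(required - provided)
    approved = not missing and approver is not None
    return GateDecision(
        system=system,
        approved=approved,
        approver=approver,
        evidence=sorted(provided),
        conditions=list(conditions),
        decided_on=today,
        # Approvals are time-boxed so they get revisited rather than assumed forever.
        expires_on=today + timedelta(days=validity_days) if approved else None,
        missing=missing,
    )


decision = evaluate_gate(
    "credit-scoring-v2", RiskTier.HIGH,
    evidence={"risk_impact_assessment", "dpia"},
    approver="Head of Credit Risk",
)
print(decision.approved, decision.missing)
```

Returning a record even for a "no-go" keeps the missing evidence traceable, and setting `expires_on` only on approval forces the decision to be revisited.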
Appropriate use versus prohibited use
A deployment decision must fix the envelope of acceptable use and the boundaries outside it. Appropriate use is captured as the intended purpose, target population, operating context, and explicit assumptions about input quality, environment, and user competence. Prohibited use is stated just as plainly: applications outside the validated purpose, uses that were never tested, and anything unlawful. Documentation should also flag reasonably foreseeable misuse, the predictable ways users might push a system beyond its purpose, so mitigations and warnings exist before launch rather than after an incident. Under the EU AI Act, some practices are prohibited outright by Article 5, including social scoring (by public or private actors) that leads to detrimental or unjustified treatment, manipulative or exploitative techniques that cause significant harm, untargeted scraping of facial images to build recognition databases, emotion recognition in workplaces and educational institutions (except for medical or safety reasons), and most real-time remote biometric identification in publicly accessible spaces for law enforcement. An acceptable-use policy translates these limits into rules staff can actually follow, reinforced by technical guardrails and monitoring. A deployer who pushes a system beyond its intended purpose typically voids the provider's assurances and, as discussed below, may assume provider-level obligations and liability.
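An acceptable-use envelope can be enforced as a default-deny check. In this minimal sketch, the `UsePolicy` structure, the purpose labels, and the hiring example are hypothetical. It also assumes some upstream step has already mapped each request to a purpose label, which in practice is the hard part.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple


@dataclass
class UsePolicy:
    """Hypothetical acceptable-use envelope for one deployed system."""
    intended_purposes: FrozenSet[str]
    prohibited: FrozenSet[str]
    # Reasonably foreseeable misuse -> the warning shown when it is attempted.
    foreseeable_misuse: Dict[str, str] = field(default_factory=dict)


def check_use(policy: UsePolicy, purpose: str) -> Tuple[str, str]:
    """Default-deny: only purposes inside the validated envelope are allowed."""
    if purpose in policy.prohibited:
        return ("block", "prohibited use")
    if purpose in policy.foreseeable_misuse:
        return ("block", policy.foreseeable_misuse[purpose])
    if purpose not in policy.intended_purposes:
        # Never validated for this purpose: stop and send it back through the gate.
        return ("block", "outside validated purpose; re-gate before use")
    return ("allow", "within intended purpose")


# Hypothetical policy for an applicant-screening assistant.
screening_policy = UsePolicy(
    intended_purposes=frozenset({"shortlist_applications_for_recruiter_review"}),
    prohibited=frozenset({"emotion_recognition_in_interviews"}),
    foreseeable_misuse={
        "auto_reject_without_review": "warning: outputs are advisory; a recruiter must review",
    },
)

for purpose in ("shortlist_applications_for_recruiter_review",
                "auto_reject_without_review",
                "predict_employee_attrition"):
    print(purpose, check_use(screening_policy, purpose))
```

Default-deny mirrors the rule that untested uses are out of bounds: a purpose nobody validated is blocked until it passes through the deployment gate.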
Three models of human oversight
Meaningful human oversight keeps an automated system accountable, and the exam expects you to distinguish three postures by how much control the human retains:
| Model | Human role | Best fit |
|---|---|---|
| Human-in-the-loop (HITL) | A person reviews and approves each output before it takes effect; the system cannot act alone. | High-stakes, lower-volume decisions such as clinical diagnosis or loan denial. |
| Human-on-the-loop (HOTL) | The system acts autonomously while a person monitors and can intervene or override in real time. | High-volume decisions where per-case review is impractical, such as fraud flagging or content moderation. |
| Human-in-command (HIC) | A person governs the overall activity, deciding whether and when the system is used at all and able to deactivate it. | Strategic oversight sitting above any single decision. |
These postures are layered, not mutually exclusive: a mature program keeps human-in-command governance above day-to-day HOTL or HITL operation.
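The three postures differ mainly in when a human decision takes effect. The sketch below is a hypothetical `OversightController` (not a standard API) that models human-in-the-loop as a hold-until-approved queue, human-on-the-loop as act-then-surface with an override, and human-in-command as a switch that disables the system entirely.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Posture(Enum):
    HITL = "human-in-the-loop"
    HOTL = "human-on-the-loop"


class OversightController:
    """Hypothetical wrapper: a human-in-command switch above per-decision HITL/HOTL routing."""

    def __init__(self, posture: Posture) -> None:
        self.posture = posture
        self.enabled = True                            # human-in-command: does the system run at all?
        self.pending: Dict[str, str] = {}              # HITL: model outputs awaiting approval
        self.applied: Dict[str, str] = {}              # case_id -> decision that actually took effect
        self.monitor_feed: List[Tuple[str, str]] = []  # HOTL: autonomous actions shown to the monitor

    def submit(self, case_id: str, model_output: str) -> str:
        if not self.enabled:
            return "system disabled by human-in-command"
        if self.posture is Posture.HITL:
            # Nothing takes effect until a person approves it.
            self.pending[case_id] = model_output
            return "awaiting human approval"
        # HOTL: act immediately, but surface the action so a person can intervene.
        self.applied[case_id] = model_output
        self.monitor_feed.append((case_id, model_output))
        return "applied; monitor may override"

    def approve(self, case_id: str, final_decision: str) -> None:
        """HITL reviewer sets the final decision, which may differ from the model's output."""
        self.pending.pop(case_id)  # raises KeyError if the case was never queued
        self.applied[case_id] = final_decision

    def override(self, case_id: str, final_decision: str) -> None:
        """HOTL monitor replaces an automated decision that already took effect."""
        if case_id not in self.applied:
            raise KeyError(case_id)
        self.applied[case_id] = final_decision

    def deactivate(self) -> None:
        """Human-in-command decision to stop using the system altogether."""
        self.enabled = False


loans = OversightController(Posture.HITL)
print(loans.submit("app-17", "deny"))
loans.approve("app-17", "approve")  # the reviewer disagrees with the model
print(loans.applied)
```

Under HITL the model's output never reaches `applied` on its own, while under HOTL it does immediately; `deactivate` sits above both, matching the layering of postures.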
Meaningful oversight and the ability to override
EU AI Act Article 14 requires high-risk systems to be designed so assigned overseers can genuinely exercise control. In practice, oversight must let a person understand the system's capabilities and limits, stay alert to automation bias (the tendency to over-trust machine output), correctly interpret the result, decide not to use the system or to disregard or override an output, and intervene or halt operation through a reliable stop mechanism. A common failure mode is a human override that exists only on paper: the operator sees a score but has no realistic basis, time, or authority to challenge it. Oversight is meaningful only when the human holds the authority, competence, time, and information to act; Article 14 targets exactly this gap by demanding effective, not nominal, control. Programs make oversight real through training, clear escalation paths, confidence indicators, second-reviewer requirements for edge cases, and workload limits that keep reviewers effective rather than swamped.
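One way to operationalize authority, competence, time, and information is to refuse to route a case to a reviewer who lacks any of them. This is a hypothetical sketch: `Reviewer`, `assign_review`, the 0.6 edge-case threshold, and the two-reviewer rule are illustrative choices, and the model's confidence score stands in for the information a reviewer would see alongside the output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reviewer:
    name: str
    can_override: bool  # authority to disregard or reverse an output
    trained: bool       # competence to interpret the output
    daily_cap: int      # workload limit, so the reviewer has time to think
    assigned: int = 0


def assign_review(confidence: float, reviewers: List[Reviewer],
                  edge_threshold: float = 0.6) -> List[str]:
    """Route a case only to reviewers who can genuinely act on it.

    Low-confidence (edge) cases go to two reviewers. If too few eligible
    reviewers exist, escalate instead of assigning someone who could only
    rubber-stamp the output. The threshold is a hypothetical default.
    """
    eligible = [r for r in reviewers
                if r.can_override and r.trained and r.assigned < r.daily_cap]
    needed = 2 if confidence < edge_threshold else 1
    if len(eligible) < needed:
        raise RuntimeError("escalate: too few reviewers with authority, training and capacity")
    chosen = sorted(eligible, key=lambda r: r.assigned)[:needed]  # least-loaded first
    for r in chosen:
        r.assigned += 1
    return [r.name for r in chosen]


team = [Reviewer("ana", True, True, daily_cap=40), Reviewer("ben", True, True, daily_cap=40)]
print(assign_review(0.45, team))  # edge case: two reviewers
```

Raising an escalation instead of silently assigning an overloaded or unauthorized reviewer turns an override that exists only on paper into a visible staffing problem.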
Change management for context shifts
An approval is valid only for the context in which it was granted, so change management governs what happens when that context shifts: new user populations, a new jurisdiction, a drifted data source, a retrained model, or an expanded use case. Any of these can invalidate prior validation and require re-assessment before continued use. Under Article 25, a substantial modification to a high-risk system, or a change to a system's intended purpose that makes it high-risk, causes the deployer to be treated as a provider, triggering the full provider obligations. Because performance can also degrade silently as the world changes, change management connects directly to the post-market monitoring covered later in this chapter: monitoring detects the drift, and change management decides whether the system may keep running. A disciplined process versions models and configurations, defines the triggers that force re-gating, and keeps a change log tying each production version to its approval. The governing principle is that deployment is not a one-time event but a continuously governed state, and every material context change reopens the go/no-go question.
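Re-gating triggers and the change log can be sketched as a comparison between the context an approval covered and the context the system now runs in. The field names, `regate_reasons`, `needs_article_25_review`, the in-memory `change_log`, and the approval ID are hypothetical. In particular, the Article 25 helper only flags changes for legal review; it does not decide whether a change is a substantial modification.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DeploymentContext:
    """The context an approval was granted for (hypothetical fields)."""
    user_population: str
    jurisdiction: str
    data_source: str
    model_version: str
    use_case: str


def regate_reasons(approved: DeploymentContext, current: DeploymentContext,
                   drift_alert: bool = False) -> List[str]:
    """List every material change since approval; an empty list means the approval still holds."""
    reasons = [f.name for f in fields(DeploymentContext)
               if getattr(approved, f.name) != getattr(current, f.name)]
    if drift_alert:  # raised by performance / post-market monitoring
        reasons.append("monitoring_drift_alert")
    return reasons


def needs_article_25_review(reasons: List[str]) -> bool:
    """Flag changes legal should assess as a possible substantial modification
    or change of intended purpose (this does not decide the question)."""
    return "use_case" in reasons or "model_version" in reasons


# Hypothetical change log: each production model version -> the approval covering it.
change_log: Dict[str, str] = {}


def record_approval(model_version: str, approval_id: str) -> None:
    change_log[model_version] = approval_id


def approval_for(model_version: str) -> Optional[str]:
    """None means this version would be running without a recorded approval."""
    return change_log.get(model_version)


approved_ctx = DeploymentContext("adult retail customers", "UK", "bureau-a", "v1.2", "credit limit setting")
record_approval("v1.2", "GATE-0001")
current_ctx = DeploymentContext("adult retail customers", "UK", "bureau-a", "v1.3", "credit limit setting")
reasons = regate_reasons(approved_ctx, current_ctx)
print(reasons, needs_article_25_review(reasons), approval_for("v1.3"))
```

Here the retrained `v1.3` has no logged approval, so the sketch would stop it at the gate until re-assessment is recorded.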
Which human-oversight model has a person review and approve each individual output before it takes effect, so the system cannot act on its own?
Under EU AI Act Article 25, which action by a deployer most clearly causes it to be treated as a provider and inherit provider obligations?
Article 14 requires human oversight of a high-risk system to be 'meaningful.' Which situation best shows oversight that is NOT meaningful?