5.3 Post-market monitoring & incident response
Key Takeaways
- EU AI Act Article 72 requires providers of high-risk systems to run a post-market monitoring system, guided by a documented monitoring plan, across the system's lifetime.
- Data drift (shifting inputs) and concept drift (a shifting input-to-target relationship) cause performance degradation that continuous monitoring of accuracy, fairness, and data-quality metrics is designed to catch.
- Article 12 mandates automatic logging for traceability, and Article 26(6) requires deployers to retain logs for at least six months unless other law requires otherwise.
- A serious incident under Article 3(49) includes death or serious health harm, serious and irreversible critical-infrastructure disruption, fundamental-rights infringement, or serious property or environmental harm.
- Under Article 73, providers report serious incidents no later than 15 days after becoming aware of them in the general case, 10 days for a death, and 2 days for widespread infringement or critical-infrastructure disruption; rollback and Article 20 corrective actions close the loop.
From launch to lifecycle: post-market monitoring
Deployment is the start of an AI system's most consequential phase, not the end of governance. The EU AI Act codifies this in Article 72, which requires providers of high-risk systems to establish a post-market monitoring system, proportionate to risk, that actively collects and analyzes performance data across the system's lifetime. That system is guided by a documented post-market monitoring plan, for which the Commission provides a template. The goal is to catch problems that pre-deployment testing could not: real-world inputs, adversarial use, and a changing world all degrade systems in ways a launch-day snapshot cannot predict. Deployers contribute as well, because they observe the system operating on live data: Article 26(5) obliges them to monitor its operation and to escalate risks to the provider and, where relevant, to authorities. Monitoring is therefore a shared, continuous responsibility rather than a provider-only afterthought.
Drift, degradation, and feedback loops
The central technical risk in production is drift. Data drift occurs when the input distribution moves away from the training distribution; concept drift occurs when the relationship between inputs and the target changes, so yesterday's fraud patterns stop predicting today's. Either produces performance degradation: falling accuracy, rising error rates, or worsening disparities across groups. Monitoring counters this by tracking metrics continuously:
- Performance metrics — accuracy, precision and recall, and error rates measured against a baseline.
- Fairness metrics — outcome disparities across protected groups tracked over time.
- Data-quality and drift metrics — input distribution shifts, missing values, and outliers.
- Operational metrics — latency, uptime, and usage volume.
A healthy program closes the feedback loop: it alerts on threshold breaches, routes flagged cases to human review, and triggers periodic revalidation or retraining, while guarding against harmful feedback loops in which a model's own outputs bias the data used to train its successor.
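One common way to quantify the input distribution shift listed above is the Population Stability Index (PSI), which compares the share of values falling into each bin for a baseline sample and a live sample. The sketch below is a minimal, dependency-free illustration and not a method prescribed by the Act. The function name `psi`, the bin edges, and the 0.2 alert threshold (a widely quoted rule of thumb) are assumptions chosen for illustration.

```python
import bisect
import math

# Assumption: 0.2 is a rule-of-thumb alert level, not a regulatory value.
PSI_ALERT_THRESHOLD = 0.2


def psi(baseline, live, bin_edges, eps=1e-6):
    """Population Stability Index between two samples over shared bins.

    bin_edges must be sorted; they split the number line into
    len(bin_edges) + 1 bins. eps avoids log(0) for empty bins.
    """
    def bin_fractions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # Number of edges <= v is the index of the bin v falls into.
            counts[bisect.bisect_right(bin_edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    base = bin_fractions(baseline)
    cur = bin_fractions(live)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, cur))


def drift_alert(baseline, live, bin_edges):
    """Return True when the PSI breaches the (assumed) alert threshold."""
    return psi(baseline, live, bin_edges) > PSI_ALERT_THRESHOLD
```

In practice a check like this would run per feature on a schedule, with breaches routed to human review or revalidation rather than acted on automatically.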
Several techniques operationalize this monitoring. A champion-challenger setup runs a candidate model in shadow mode against the live model to compare quality before promotion; sampling-based human review audits a fraction of production decisions with expert judgment; and canary or staged rollouts expose a new version to a small slice of traffic first so regressions surface on a limited blast radius. Together these turn monitoring from a passive dashboard into an early-warning system that catches problems before they reach every user.
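Two of the techniques above can be sketched in a few lines. The example is illustrative only: `in_canary` and `shadow_compare` are hypothetical helper names, models are assumed to be plain callables, and labelled cases are assumed to be available for the comparison.

```python
import hashlib


def in_canary(request_id: str, percent: float) -> bool:
    """Deterministically assign a request to the canary slice.

    Hashing the request ID keeps the assignment stable across retries,
    so the same request always reaches the same model version.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # percent=5.0 -> buckets 0..499


def shadow_compare(cases, champion, challenger):
    """Score both models on the same labelled cases.

    In shadow mode only the champion's output would be served to users;
    the challenger's predictions are recorded for comparison.
    Returns (champion_accuracy, challenger_accuracy).
    """
    champ_hits = 0
    chall_hits = 0
    for features, label in cases:
        champ_hits += champion(features) == label
        chall_hits += challenger(features) == label
    return champ_hits / len(cases), chall_hits / len(cases)
```

A promotion rule would then compare the two accuracies (and ideally fairness metrics as well) before the challenger receives any live traffic.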
Logging and traceability
Monitoring depends on records. Article 12 requires high-risk systems to automatically log events over their lifetime to enable traceability, and Article 26(6) requires deployers to keep those logs for an appropriate period, at least six months unless other law says otherwise. Good logging captures the input, the output, the model version, a timestamp, and any oversight actions, so an incident can be reconstructed and a specific decision explained or audited long after it was made.
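A log record carrying the fields named above might look like the following sketch. The class and field names are hypothetical, and the 183-day constant is an assumption that approximates the six-month minimum; the Act does not prescribe a schema, and other law may require longer retention.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timedelta

# Assumption: 183 days approximates the six-month minimum retention period.
MIN_RETENTION = timedelta(days=183)


@dataclass(frozen=True)
class DecisionLogRecord:
    request_id: str
    model_version: str
    input_payload: dict
    output_payload: dict
    timestamp: str  # ISO 8601 with UTC offset, e.g. "2025-01-01T00:00:00+00:00"
    oversight_actions: list = field(default_factory=list)


def to_log_line(record: DecisionLogRecord) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


def past_minimum_retention(record: DecisionLogRecord, now: datetime) -> bool:
    """True once the record is older than the assumed minimum retention.

    This is only a lower bound: a record may still need to be kept
    longer if other applicable law requires it.
    """
    logged_at = datetime.fromisoformat(record.timestamp)
    return now - logged_at > MIN_RETENTION
```

Recording the model version with every decision is what lets an investigator tie a disputed output back to the exact model that produced it.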
Defining and detecting AI incidents
You cannot respond to what you cannot define. The EU AI Act, in Article 3(49), defines a serious incident as an incident or malfunctioning of an AI system that directly or indirectly leads to any of the following: the death of a person or serious harm to health; a serious and irreversible disruption of critical infrastructure; an infringement of fundamental-rights obligations under Union law; or serious harm to property or the environment. Detection blends the monitoring signals above with user complaints, red-team findings, and contestability channels. Programs classify incidents by severity so that response effort is proportionate to potential harm. Because either the provider or the deployer may be the first to notice a malfunction, the responsibilities and hand-offs between them should be agreed in advance rather than negotiated during a crisis.
Incident response and reporting duties
When an incident occurs, a defined incident response process activates: triage, containment, investigation, remediation, and notification. The plan should predefine roles, communication paths, and the thresholds for invoking rollback, so the team executes rather than improvises during a live harm event. Reporting is not optional. Under Article 73, providers must report serious incidents to the relevant market surveillance authority on strict timelines:
| Situation | Deadline after becoming aware |
|---|---|
| General serious incident | Without undue delay, no later than 15 days |
| Death of a person | No later than 10 days |
| Widespread infringement or serious/irreversible critical-infrastructure disruption | No later than 2 days |
Deployers who identify a serious incident must, under Article 26(5), immediately inform the provider first, and then the importer or distributor and the relevant market surveillance authority. These duties layer on top of other regimes, such as GDPR breach notification and sectoral rules, so a single event may trigger several parallel reports.
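The outer limits in the table can be turned into a simple deadline calculator. This is a sketch under stated assumptions: the names `IncidentKind`, `report_deadline`, and `strictest_deadline` are hypothetical, days are counted as calendar days from the moment of awareness, and the result is a latest permissible time rather than a target, since reports are due earlier wherever that is feasible.

```python
from datetime import datetime, timedelta
from enum import Enum


class IncidentKind(Enum):
    """Maximum reporting window in days, per the table above."""
    GENERAL = 15
    DEATH = 10
    WIDESPREAD_OR_CRITICAL_INFRA = 2


def report_deadline(aware_at: datetime, kind: IncidentKind) -> datetime:
    """Latest permissible report time, counted from awareness.

    Assumption: calendar days; confirm the counting rule with counsel.
    """
    return aware_at + timedelta(days=kind.value)


def strictest_deadline(aware_at: datetime, kinds) -> datetime:
    """When an incident fits several categories, use the earliest deadline."""
    return min(report_deadline(aware_at, k) for k in kinds)
```

An incident tracker could store the awareness timestamp at triage and surface the computed deadline alongside any parallel deadlines from other regimes.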
Remediation and rollback
Detection and reporting are meaningless without correction. Article 20 requires providers to take immediate corrective action (bringing a non-conforming system into conformity, withdrawing it, disabling it, or recalling it, as appropriate) and to inform distributors and, where applicable, deployers, the authorised representative, and importers. Operationally, teams need a tested rollback capability to revert to a previous safe model version, plus root-cause analysis so the same failure does not recur. Rollback, kill-switches, and versioned deployments are precisely the controls that make the human-in-command posture from Section 5.1 actionable: when monitoring shows a system is causing harm, the organization must be able to stop it quickly and cleanly, then correct and revalidate before resuming. Each incident should also feed lessons learned back into the risk assessment and monitoring plan, so the same class of failure is anticipated rather than merely repaired, closing the loop from detection through prevention.
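The rollback and kill-switch controls described above can be illustrated with a toy versioned registry. This is a minimal in-memory sketch: `ModelRegistry`, `Deployment`, and the `known_good` flag are hypothetical names, and a real system would persist this state and gate it behind access controls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Deployment:
    version: str
    known_good: bool  # set once the version has been validated as safe


@dataclass
class ModelRegistry:
    history: List[Deployment] = field(default_factory=list)
    active: Optional[str] = None  # None means no model is serving

    def deploy(self, version: str, known_good: bool = False) -> None:
        """Record a new version and make it the active one."""
        self.history.append(Deployment(version, known_good))
        self.active = version

    def kill(self) -> None:
        """Kill switch: stop serving any model."""
        self.active = None

    def rollback(self) -> Optional[str]:
        """Revert to the most recent known-good version other than the active one.

        If no such version exists, fall back to the kill switch.
        """
        for dep in reversed(self.history):
            if dep.known_good and dep.version != self.active:
                self.active = dep.version
                return self.active
        self.kill()
        return None
```

Falling back to the kill switch when no known-good version exists reflects the priority stated above: stopping harm comes first, and service resumes only after correction and revalidation.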
An image-classification model's accuracy falls in production because the kinds of photos users submit have shifted away from the training set. This is best described as:
Under EU AI Act Article 73, within what maximum period must a provider report a serious incident involving the death of a person to the market surveillance authority?
Which EU AI Act requirement most directly supports reconstructing and explaining a specific past decision when investigating an incident?