6.2 Documentation, traceability & continuous improvement

Key Takeaways

  • Model cards document a model's intended use, group-disaggregated performance, and limitations, while datasheets for datasets document data provenance, composition, collection, and consent.
  • Impact-assessment records—including GDPR DPIAs and the EU AI Act's fundamental rights impact assessment (FRIA)—capture who is affected, plausible harms, chosen mitigations, and the residual risk leadership accepted.
  • EU AI Act Article 12 requires high-risk systems to technically allow for the automatic recording of events (logs) over their lifetime, supporting post-deployment traceability alongside governance decision logs.
  • Versioning of models, data, and code (lineage), often automated through an MLOps model registry, enables reproducibility so an organization can explain which version produced a specific decision.
  • Continuous improvement follows a Plan–Do–Check–Act loop—monitor, learn, update—reinforced by EU AI Act post-market monitoring (Article 72) and serious-incident reporting (Article 73).
Last updated: July 2026

The documentation and accountability backbone

Governance is only credible if it is evidenced. Documentation and traceability create the paper trail that lets an organization demonstrate—to auditors, regulators, and affected people—that an AI system was built, reviewed, and operated responsibly. The AIGP exam treats this as the accountability backbone that connects design decisions to real-world outcomes, and it expects you to recognize the standard artifacts and the legal hooks behind them.

Model cards and datasheets

Two artifacts appear repeatedly. Model cards, introduced by Mitchell and colleagues in 2019, summarize a model's intended use, out-of-scope uses, performance disaggregated across demographic groups, evaluation data, limitations, and ethical considerations. They make a model's boundaries legible to non-developers. Datasheets for datasets (Gebru and colleagues) document a dataset's motivation, composition, collection process, preprocessing, recommended uses, and maintenance, surfacing provenance and consent questions before data becomes a liability. Under the EU AI Act, these largely voluntary artifacts are complemented by mandatory technical documentation (Article 11 and Annex IV), which high-risk providers must prepare, keep current, and make available to authorities, and by instructions for use (Article 13) that providers supply to deployers.

Artifact | Primary focus | Key questions it answers
Model card | The model | Intended use, performance by group, limitations
Datasheet for datasets | The data | Provenance, composition, consent, maintenance
Impact assessment | The deployment context | Who is affected, what harms, what mitigations
Technical documentation | Conformity evidence | Design, testing, and compliance of a high-risk system
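
As an illustration of how the model card fields described above might be kept in machine-readable form, here is a minimal Python sketch. The class name, field names, and sample values are all hypothetical and chosen for this example; they are not a standard schema, and the performance figures are made up.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ModelCard:
    """Hypothetical, minimal model-card record (field names are illustrative)."""
    model_name: str
    version: str
    intended_use: str
    out_of_scope_uses: list[str]
    evaluation_data: str
    performance_by_group: dict[str, float]  # one metric value per demographic group
    limitations: list[str] = field(default_factory=list)
    ethical_considerations: list[str] = field(default_factory=list)

    def performance_gap(self) -> float:
        """Largest difference in the metric between any two groups."""
        values = list(self.performance_by_group.values())
        return max(values) - min(values) if values else 0.0

    def to_json(self) -> str:
        """Serialize the card so it can be versioned alongside the model."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up example values, for illustration only.
card = ModelCard(
    model_name="loan-screening-demo",
    version="1.2.0",
    intended_use="Rank applications for human review",
    out_of_scope_uses=["Fully automated rejection"],
    evaluation_data="Held-out sample (illustrative)",
    performance_by_group={"group_a": 0.91, "group_b": 0.86},
    limitations=["Not validated on applicants under 21"],
)
```

Keeping disaggregated performance as structured data, rather than prose, means a reviewer can compute something like `card.performance_gap()` and compare it against whatever tolerance the organization has set.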

Impact assessments, decision logs, and audit trails

Impact-assessment records—algorithmic impact assessments, DPIAs under the GDPR, and the EU AI Act's fundamental rights impact assessment (FRIA) required of certain high-risk deployers—document who could be affected, what harms are plausible, and which mitigations were chosen. These are decision records, not marketing: a good assessment captures the options that were rejected and the residual risk leadership consciously accepted. Because they are dated and signed, they also fix accountability in time—showing what a reasonable organization knew, and decided, at the moment of deployment—which is exactly what regulators and courts ask for after an incident.

Traceability also depends on decision logs and audit trails. EU AI Act Article 12 requires high-risk systems to technically allow for the automatic recording of events (logs) throughout their lifetime, supporting post-deployment traceability. Governance decision logs—who approved a use case, on what evidence, and under what conditions—complement those technical logs and become essential when a system's behavior is later challenged or investigated. Together they let an organization reconstruct not just what a model did, but why humans allowed it to do so.
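
One way to make a governance decision log tamper-evident is to chain entries by hash, so that altering an earlier record invalidates everything after it. The sketch below is an assumption about how such a log could be built, not a technique prescribed by the EU AI Act or by this guide; the `DecisionLog` class, its fields, and the sample entries are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


class DecisionLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body: dict) -> str:
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, actor: str, decision: str, evidence: str) -> dict:
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "decision": decision,
            "evidence": evidence,
            "prev_hash": self.entries[-1]["hash"] if self.entries else self.GENESIS,
        }
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


# Illustrative entries only.
log = DecisionLog()
log.append("risk-committee", "Approve pilot with human review", "Bias test report v3")
log.append("dpo", "Approve DPIA", "DPIA record, signed")
intact_before = log.verify()

log.entries[0]["decision"] = "Approve full rollout"  # simulate tampering
intact_after = log.verify()
```

Here `intact_before` is true and `intact_after` is false, because the edited first entry no longer matches its stored hash. A production log would also need access controls and durable storage, which this sketch leaves out.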

Versioning, reproducibility, and management review

Because models, data, and code all evolve, versioning and reproducibility are core controls. Recording model versions, training-data snapshots, hyperparameters, and code commits—collectively, data and model lineage—lets an organization reproduce a past result and explain which version produced a specific decision. Modern MLOps pipelines automate much of this lineage capture through a model registry so it does not depend on memory, and reproducibility is what makes an external audit or a legal challenge survivable. It also underpins the right to contest an automated decision: an organization that cannot say which model version, data, and inputs produced an outcome cannot meaningfully explain or defend that outcome to the affected individual.
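
The lineage idea can be sketched as a small in-memory registry that ties each decision to the model version, data snapshot, code commit, and inputs behind it. This is a simplified illustration under stated assumptions, not the API of any real MLOps product; `ModelRegistry`, `ModelVersion`, `fingerprint`, the commit id, and the sample inputs are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(data: bytes) -> str:
    """SHA-256 of a training-data snapshot, used as its stable identifier."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after registration
class ModelVersion:
    version: str
    code_commit: str
    data_snapshot: str
    hyperparameters: dict


class ModelRegistry:
    def __init__(self):
        self._versions = {}
        self._decisions = {}

    def register(self, mv: ModelVersion) -> None:
        if mv.version in self._versions:
            raise ValueError(f"version {mv.version} already registered")
        self._versions[mv.version] = mv

    def record_decision(self, decision_id: str, version: str, inputs: dict) -> None:
        if version not in self._versions:
            raise KeyError(f"unknown model version {version}")
        self._decisions[decision_id] = (version, dict(inputs))

    def explain(self, decision_id: str) -> dict:
        """Return the full lineage behind one recorded decision."""
        version, inputs = self._decisions[decision_id]
        mv = self._versions[version]
        return {
            "decision_id": decision_id,
            "model_version": mv.version,
            "code_commit": mv.code_commit,
            "data_snapshot": mv.data_snapshot,
            "hyperparameters": mv.hyperparameters,
            "inputs": inputs,
        }


registry = ModelRegistry()
registry.register(ModelVersion(
    version="1.2.0",
    code_commit="abc1234",  # placeholder commit id
    data_snapshot=fingerprint(b"illustrative training data"),
    hyperparameters={"learning_rate": 0.01, "max_depth": 6},
))
registry.record_decision("decision-001", "1.2.0", {"income": 42000, "tenure_months": 18})
trace = registry.explain("decision-001")
```

Refusing to record a decision against an unregistered version is the design choice that matters here: it keeps every outcome traceable to a known combination of model, data, and code.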

Governance must also be reviewed by leadership. Management-system standards such as ISO/IEC 42001 require periodic management review, in which leadership examines metrics and KPIs—incident counts, bias-test results, model-performance drift, audit findings, human-override rates, and training-completion rates—and decides what to change. Well-chosen KPIs turn governance from a static checklist into a measured discipline that leaders can steer; lagging indicators such as incidents show what already went wrong, while leading indicators such as review coverage and training rates predict where it might. Management review also produces its own record—decisions, action items, and owners—closing the loop back to the decision logs and giving auditors evidence that oversight is active rather than ceremonial.
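
To show how KPIs could feed a management review, the following sketch flags any indicator that has crossed its threshold and labels it as leading or lagging. The `Kpi` class, the KPI names, and every value and threshold are invented for illustration; ISO/IEC 42001 does not prescribe these figures.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    kind: str               # "leading" or "lagging"
    value: float
    threshold: float
    higher_is_better: bool

    def breached(self) -> bool:
        """True when the KPI is on the wrong side of its threshold."""
        if self.higher_is_better:
            return self.value < self.threshold
        return self.value > self.threshold


def review_agenda(kpis: list[Kpi]) -> list[str]:
    """Names of KPIs that leadership should discuss, in input order."""
    return [f"{k.name} ({k.kind})" for k in kpis if k.breached()]


# Made-up quarterly figures.
kpis = [
    Kpi("serious_incidents", "lagging", value=3, threshold=2, higher_is_better=False),
    Kpi("training_completion_rate", "leading", value=0.97, threshold=0.95, higher_is_better=True),
    Kpi("human_override_rate", "lagging", value=0.12, threshold=0.10, higher_is_better=False),
]
agenda = review_agenda(kpis)
```

With these sample figures the agenda contains the incident count and the override rate, while training completion stays off it because it is above its target.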

The continuous-improvement loop

AI systems degrade and regulations change, so governance must be a loop rather than a one-time gate. The Plan–Do–Check–Act (PDCA) cycle embedded in ISO management systems captures the idea: monitor performance and risk in production, learn from incidents and drift, and update policies, controls, and models accordingly.

Regulation reinforces the loop. The EU AI Act obliges providers of high-risk systems to operate post-market monitoring (Article 72) and to report serious incidents to authorities (Article 73). Operationally, continuous improvement looks like this:

  • Monitor — track drift, bias, performance, and misuse against defined thresholds.
  • Detect and triage — capture incidents and near-misses through a defined channel.
  • Learn — perform root-cause analysis and update the risk assessment.
  • Update — revise models, controls, documentation, and policy, then re-train staff.
  • Report — inform the governance committee, and regulators where the law requires.
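
The Monitor and Detect steps above can be sketched with a simple drift check. The example uses the Population Stability Index (PSI), one common drift statistic chosen here for illustration; the source does not name a specific metric. The 0.1 and 0.25 cut-offs are a widely used rule of thumb, treated here as an assumption that each organization would set for itself, and the bin proportions are made up.

```python
import math


def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    Both arguments are per-bin proportions over the same bins.
    """
    if len(expected) != len(actual):
        raise ValueError("distributions must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) and division by zero
        total += (a - e) * math.log(a / e)
    return total


def triage(score: float, warn: float = 0.1, alert: float = 0.25) -> str:
    """Map a drift score to an action level (thresholds are assumptions)."""
    if score >= alert:
        return "incident"
    if score >= warn:
        return "watch"
    return "ok"


baseline = [0.25, 0.25, 0.25, 0.25]   # feature distribution at validation time
production = [0.10, 0.20, 0.30, 0.40]  # illustrative distribution in production

stable_status = triage(psi(baseline, baseline))
drift_status = triage(psi(baseline, production))
```

An "incident" result would enter the defined triage channel, while "watch" would typically just increase monitoring frequency; both outcomes belong in the record that the Learn and Report steps draw on.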

This closed loop keeps governance effective as models, data, threats, and law evolve, ensuring documentation remains a living record rather than a snapshot frozen at launch. The exam rewards seeing documentation and monitoring as one system: artifacts capture the state, KPIs measure it, and the improvement cycle acts on what they reveal—so that when a model or a regulation changes, the evidence trail and the controls move with it.

Test Your Knowledge

What is the primary purpose of a "datasheet for datasets," as distinct from a model card?


Under the EU AI Act, Article 12 requires high-risk AI systems to do what in order to support traceability?


An AI governance program runs a Plan–Do–Check–Act loop. Which activity best represents the checking and feedback that keeps governance current?
