4.3 Governance frameworks (NIST AI RMF, EU AI Act, model cards)

Key Takeaways

  • NIST AI RMF is a voluntary framework with four functions: Govern, Map, Measure, and Manage; testing mainly feeds Measure.
  • The EU AI Act is binding law with four risk tiers: unacceptable (banned), high, limited (transparency), and minimal.
  • The EU AI Act entered into force on 1 August 2024; prohibitions apply from Feb 2025, GPAI from Aug 2025, most high-risk rules from Aug 2026.
  • Model cards and system cards document intended use, limitations, and evaluation results to provide transparency.
  • The tester's governance role is evidence and traceability: linking test results to requirements, risks, and regulatory obligations.
Last updated: July 2026

Why governance matters

Governance frameworks give organisations a structured, auditable way to manage AI risk, and they increasingly carry legal force. For a tester the practical consequence is the same across every framework: you must produce evidence that risks were identified, tested, and controlled, and keep that evidence traceable to requirements and to specific test results.

NIST AI Risk Management Framework

The NIST AI RMF 1.0 (released January 2023) is a voluntary framework organised around four core functions:

  • Govern — a cross-cutting function that establishes the culture, policies, roles, and accountability for managing AI risk throughout the organisation.
  • Map — establish the context and identify the risks for the specific AI system and its intended use.
  • Measure — analyse, assess, benchmark, and track the identified risks using quantitative and qualitative methods.
  • Manage — prioritise and act on the risks, allocating resources to treat, monitor, and respond to them.

Testing feeds mainly the Measure function: test results are the measurements that show whether risks such as bias, hallucination, and security exposure are within tolerance, while Govern sits across all of the other functions.
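
As a rough illustration of how test results can act as measurements against a tolerance, the sketch below compares observed metrics with limits. The metric names, the `Tolerance` class, and every threshold are hypothetical; the NIST AI RMF does not prescribe any of them, and real limits would come from the organisation's own risk appetite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Hypothetical risk tolerance: the metric must not exceed max_value."""
    risk: str
    metric: str
    max_value: float


def measure(results: dict, tolerances: list) -> list:
    """Compare measured metrics with tolerances; return one finding per risk."""
    findings = []
    for tol in tolerances:
        observed = results.get(tol.metric)
        if observed is None:
            status = "not measured"
        elif observed <= tol.max_value:
            status = "within tolerance"
        else:
            status = "exceeds tolerance"
        findings.append({
            "risk": tol.risk,
            "metric": tol.metric,
            "observed": observed,
            "limit": tol.max_value,
            "status": status,
        })
    return findings


# Illustrative values only, not recommended thresholds.
TOLERANCES = [
    Tolerance("bias", "demographic_parity_gap", 0.05),
    Tolerance("hallucination", "hallucination_rate", 0.02),
    Tolerance("security exposure", "prompt_injection_success_rate", 0.01),
]

# Metrics produced by a (hypothetical) test run; one metric is missing.
RESULTS = {"demographic_parity_gap": 0.03, "hallucination_rate": 0.04}

for finding in measure(RESULTS, TOLERANCES):
    print(f"{finding['risk']}: {finding['status']}")
```

Reporting "not measured" separately from a pass or a fail is a deliberate choice here: a risk with no measurement is a gap in the evidence, not a result.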

The framework is deliberately non-prescriptive: it does not mandate specific tests, so each organisation operationalises the functions with its own controls, supported by a companion Playbook and the Generative AI Profile that map concrete actions to each function. A central idea is that AI risk must be managed across the whole lifecycle — from design and data collection through deployment and decommissioning — not treated as a one-off release gate. For testers this means the four functions are continuous activities they contribute to repeatedly, rather than boxes ticked once before go-live.

Test Your Knowledge

In the NIST AI RMF, which function analyses, assesses, and tracks identified risks, and is the one that testing most directly feeds?

EU AI Act

The EU AI Act is binding law that regulates AI by risk tier:

Risk tier | Treatment
Unacceptable | Banned outright (e.g. social scoring, most manipulative or exploitative uses)
High | Permitted but heavily regulated: risk management, data governance, logging, human oversight, conformity assessment
Limited | Transparency obligations: users must be told they are dealing with AI, and deepfakes/AI content must be labelled
Minimal | No specific obligations (the vast majority of AI uses)

The Act is fundamentally risk-based: obligations scale with the potential for harm, so most everyday applications fall into the minimal tier with no new duties, while a small set of high-risk uses (AI in recruitment, credit scoring, medical devices, or critical infrastructure) carry the heaviest requirements. High-risk providers must, among other things, maintain a risk-management system, ensure data quality, keep technical documentation and automatic logs, enable human oversight, and pass a conformity assessment before the product reaches the market.

The Act entered into force on 1 August 2024 and applies in phases: prohibited practices from 2 February 2025, general-purpose AI (GPAI) model obligations from 2 August 2025, and most high-risk requirements from 2 August 2026. Requirements for high-risk AI embedded in products already covered by EU product-safety legislation follow on 2 August 2027. GPAI providers face documentation, copyright, and transparency duties, with stricter rules for models judged to pose systemic risk.
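
The phased timeline can be expressed as a simple date lookup, which is handy when a test plan needs to state which obligation groups are already in effect. This is a minimal sketch: the `MILESTONES` list and the `obligations_in_effect` function are illustrative names, and the dates are only those given in the text above.

```python
from datetime import date

# Application dates as stated in the surrounding text.
MILESTONES = [
    (date(2025, 2, 2), "prohibited practices"),
    (date(2025, 8, 2), "GPAI model obligations"),
    (date(2026, 8, 2), "most high-risk requirements"),
]


def obligations_in_effect(on: date) -> list:
    """Return the obligation groups that already apply on the given date."""
    return [name for start, name in MILESTONES if on >= start]


print(obligations_in_effect(date(2025, 12, 1)))
# -> ['prohibited practices', 'GPAI model obligations']
```

A lookup like this only says which groups of rules are in force; whether a particular system falls under them still depends on its risk tier.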

Model cards & system cards

A model card is standardised documentation for a model: its intended use and out-of-scope uses, training-data provenance, evaluation results (including performance broken down across groups), known limitations, and ethical considerations. Model cards were introduced by Mitchell et al. in 2019 to make these disclosures systematic. A system card applies the same idea to a whole deployed system, documenting its components, safety evaluations, and residual risks. Well-maintained cards are not marketing: they state honestly where a model should not be used and what its measured error rates are. That honesty is what lets downstream teams judge fitness for their own context, and it directly supports the EU AI Act's transparency obligations.
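
One way to keep a model card honest is to hold it as structured data and check it for empty sections before release. The schema below is a hypothetical minimal one whose field names simply mirror the sections listed above; it is not a standard format, and all values are placeholders.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class ModelCard:
    """Hypothetical minimal schema mirroring the sections named in the text."""
    model_name: str
    intended_use: str
    out_of_scope_uses: list
    training_data_provenance: str
    evaluation_results: dict  # metric -> {group: value}
    known_limitations: list
    ethical_considerations: str = ""

    def missing_sections(self) -> list:
        """Names of sections that are still empty and need content."""
        return [name for name, value in asdict(self).items() if not value]


card = ModelCard(
    model_name="example-classifier",  # placeholder name
    intended_use="Triage of inbound support tickets",
    out_of_scope_uses=["Decisions about employment or credit"],
    training_data_provenance="Internal tickets (illustrative)",
    evaluation_results={
        "accuracy": {"overall": 0.91, "group_a": 0.93, "group_b": 0.88},
    },
    known_limitations=[],  # deliberately left empty to show the check
)

print(card.missing_sections())
# -> ['known_limitations', 'ethical_considerations']
print(json.dumps(asdict(card), indent=2))
```

Storing per-group results next to the overall figure keeps the breakdown across groups visible, so a gap between groups cannot be averaged away.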

Test Your Knowledge

Under the EU AI Act, an application that must simply tell users they are interacting with AI and label AI-generated content falls into which risk tier?

The tester's role in governance

Across all three frameworks the tester supplies the evidence layer:

  • Traceability — link every test case and result back to a requirement, a risk, and, where relevant, a regulatory obligation, so an auditor can follow the chain end to end.
  • Evidence generation — produce and retain test reports, logs, evaluation metrics, and coverage data as objective proof that the controls were actually exercised.
  • Contributing to model/system cards — feed real evaluation results (accuracy, bias, robustness, hallucination rate) into the documentation instead of aspirational claims.
  • Verifying transparency controls — confirm that AI disclosure, content labelling, and human-oversight mechanisms actually work as the assigned risk tier requires.
  • Continuous monitoring — governance is ongoing, so testers re-verify after model updates, prompt changes, and data drift.
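
The traceability and evidence points above can be sketched as a small record type plus a gap check. Everything here is illustrative: `TraceRecord`, `audit_gaps`, the identifiers, and the file paths are assumptions, not part of any framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceRecord:
    """One test result with its links back to requirement, risk, and obligation."""
    test_id: str
    requirement: str
    risk: str
    obligation: Optional[str]  # None where no regulatory obligation is relevant
    passed: bool
    evidence_path: str         # where the report or log is retained


def audit_gaps(records: list, requirements: list) -> dict:
    """Find requirements with no linked test and records with a broken chain."""
    covered = {r.requirement for r in records}
    untested = sorted(set(requirements) - covered)
    incomplete = [
        r.test_id
        for r in records
        if not (r.requirement and r.risk and r.evidence_path)
    ]
    return {"untested_requirements": untested, "incomplete_records": incomplete}


# Hypothetical identifiers and paths.
RECORDS = [
    TraceRecord("T-001", "REQ-01", "bias", "human oversight", True, "reports/t001.json"),
    TraceRecord("T-002", "REQ-02", "hallucination", None, False, ""),
]
REQUIREMENTS = ["REQ-01", "REQ-02", "REQ-03"]

print(audit_gaps(RECORDS, REQUIREMENTS))
# -> {'untested_requirements': ['REQ-03'], 'incomplete_records': ['T-002']}
```

The check flags two different failures: REQ-03 has no test at all, while T-002 ran but retained no evidence, so an auditor could not follow its chain end to end.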

These frameworks are complementary rather than competing. The EU AI Act sets what is legally required for a given risk tier; the NIST AI RMF supplies the how, a repeatable process for identifying and treating risk; and model or system cards are the documentation format that captures the resulting evidence. A tester who understands all three can position their test artefacts as the connective tissue: NIST's Measure activities generate metrics, those metrics populate the model card, and both together help demonstrate that the EU AI Act's obligations for the system's tier have been met. That is why traceability is the tester's most valuable contribution: without it, evidence exists but cannot be shown to satisfy any specific requirement.

In short, governance turns quality and risk work into documented, defensible evidence. The tester does not own the policy, but their outputs are what make the organisation's AI claims auditable and trustworthy.