4.2 Design & build controls

Key Takeaways

  • Governance-by-design translates impact-assessment findings into non-functional requirements for fairness, robustness, security, explainability, privacy, and human oversight that are specified, budgeted, and tested from the start.
  • AI threat modeling must cover AI-specific attacks: data poisoning, adversarial examples, model theft/extraction, model inversion and membership inference, and prompt injection.
  • Prompt injection is a distinctive risk for LLM/agentic systems because models cannot reliably separate trusted instructions from untrusted content; defenses are layered (filtering, least-privilege tools, human confirmation).
  • EU AI Act Article 14 makes human oversight a design property; teams choose human-in-the-loop, human-on-the-loop, or human-in-command based on stakes and reversibility.
  • Secure development practices — versioning of code/data/models, reproducible pipelines, signed artifacts, supply-chain scanning, and documentation — support ISO/IEC 42001 and the NIST AI RMF and produce audit-ready evidence.
Last updated: July 2026

Governance-by-Design During Development

The design and build phase is where abstract risk findings become concrete engineering requirements. Governance-by-design means translating the impact assessment's conclusions into non-functional requirements — for fairness, robustness, security, explainability, privacy, and human oversight — that are specified, budgeted, and tested like any other requirement. AIGP candidates should recognize this as the operational counterpart to "data protection by design and by default" (GDPR Article 25) and to the EU AI Act's requirement that high-risk systems be built with appropriate accuracy, robustness, cybersecurity, and human oversight designed in from the start rather than bolted on afterward.

Requirements for Trustworthy Properties by Design

Each trustworthy-AI property becomes an explicit design commitment:

  • Fairness by design — defining protected attributes, selecting fairness metrics appropriate to the context, and constraining the objective so the model does not systematically disadvantage protected groups.
  • Robustness by design — ensuring reliable performance under noisy, shifted, or edge-case inputs, including graceful degradation and fallback behavior when confidence is low.
  • Security by design — hardening the model and its pipeline against adversarial manipulation and unauthorized access from the outset.
  • Explainability by design — choosing model classes and adding interpretability tooling so decisions can be explained to the depth the use case and law require (for example, the "meaningful information about the logic involved" that GDPR Articles 13 to 15 require for automated decisions covered by Article 22).
  • Privacy by design — minimizing personal data, applying privacy-enhancing technologies, and separating training data from direct identifiers.

These properties frequently trade off against one another and against raw accuracy, so the design phase is where the organization records the deliberate balance it has struck and the justification for it. A more explainable model class may sacrifice a few points of accuracy; a more robust model may be more expensive to train; stronger privacy protection may reduce the signal available to learn from. Governance does not demand maximizing every property — it demands making the trade-off consciously, at the right level of authority, and recording why the chosen point is appropriate for the use case's risk level.
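To make one of these commitments concrete, the sketch below shows what "selecting a fairness metric" can look like for the fairness-by-design property. It computes per-group selection rates and the demographic parity difference, which is only one of several possible metrics. The function names, the sample data, and the 0.1 tolerance are illustrative assumptions, not values taken from any law or standard.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Return the share of positive predictions (1) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(groups, predictions):
    """Largest gap in selection rate between any two groups."""
    rates = selection_rates(groups, predictions)
    return max(rates.values()) - min(rates.values())


# Hypothetical data: a group label and a binary model decision per applicant.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

gap = demographic_parity_difference(groups, predictions)
TOLERANCE = 0.1  # assumed threshold; the real value is a governance decision
print(f"selection-rate gap: {gap:.2f}, within tolerance: {gap <= TOLERANCE}")
```

Writing the metric and its tolerance down as a testable check is one way to turn the recorded trade-off into a requirement that can fail a build.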

Threat Modeling for AI Systems

Secure development requires threat modeling that accounts for AI-specific attack surfaces, not just conventional software vulnerabilities. The team enumerates assets (the model, training data, inference API, and outputs), identifies adversaries and their goals, and maps attacks to mitigations. The dominant AI threat categories every AIGP candidate should know are:

Threat | Mechanism | Illustrative mitigation
Data poisoning | Corrupting training data to implant bias or backdoors | Data provenance controls, anomaly detection, curated pipelines
Adversarial examples | Perturbing inputs to force misclassification | Adversarial training, input validation, robustness testing
Model theft / extraction | Querying an API to reconstruct the model | Rate limiting, query monitoring, output throttling
Model inversion / membership inference | Recovering training data or membership from outputs | Differential privacy, output minimization
Prompt injection | Malicious instructions embedded in inputs to an LLM | Input/output filtering, privilege separation, instruction hierarchies
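As an illustration of the rate-limiting mitigation listed for model theft / extraction, here is a minimal sliding-window limiter for an inference API. The class name, the per-client quota, and the explicit `now` argument (used instead of a real clock so the behavior is easy to follow) are assumptions made for this sketch.

```python
from collections import defaultdict, deque


class QueryRateLimiter:
    """Allow at most `max_queries` per client within `window_seconds`."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> recent timestamps

    def allow(self, client_id, now):
        """Record and allow the query, or refuse it if the client is over quota."""
        timestamps = self._history[client_id]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False
        timestamps.append(now)
        return True


limiter = QueryRateLimiter(max_queries=3, window_seconds=60)
decisions = [limiter.allow("client-1", now=t) for t in (0, 10, 20, 30, 61)]
print(decisions)  # [True, True, True, False, True]
```

A limiter like this only slows extraction. In practice it would sit alongside the query monitoring and output throttling that the table also lists.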

Prompt injection deserves special attention for generative and agentic systems: because large language models cannot reliably separate trusted instructions from untrusted content in their context window, an attacker who controls any text the model reads (a web page, a document, an email) may hijack the system's behavior or exfiltrate data. Defenses are layered — content filtering, least-privilege tool access, and human confirmation for consequential actions — because no single control is sufficient.
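Two of the layers named above, least-privilege tool access and human confirmation for consequential actions, can be sketched as a small authorization gate that runs outside the model. The tool names, the `ToolCall` type, and the three return values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical tool catalogue: which tools exist, and which have side
# effects serious enough to need a person's approval.
ALLOWED_TOOLS = {"search_docs", "summarize", "send_email"}
NEEDS_CONFIRMATION = {"send_email"}


@dataclass
class ToolCall:
    name: str
    argument: str


def authorize(call, confirmed_by_human=False):
    """Decide whether a tool call requested by the model may run."""
    if call.name not in ALLOWED_TOOLS:
        return "deny"  # least privilege: unlisted tools are never executed
    if call.name in NEEDS_CONFIRMATION and not confirmed_by_human:
        return "needs_confirmation"
    return "allow"


print(authorize(ToolCall("summarize", "report.pdf")))             # allow
print(authorize(ToolCall("send_email", "all customer records")))  # needs_confirmation
print(authorize(ToolCall("delete_files", "/")))                   # deny
```

The decision depends only on the requested tool and on an approval flag set by the application, never on text the model has read. Injected instructions can therefore request an action but cannot widen the allow-list or skip the confirmation step.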

Human-Oversight and Secure-Development Design

Human oversight is a design property, not a policy afterthought. The EU AI Act (Article 14) requires that high-risk systems be designed so natural persons can effectively oversee them — understanding the system's capacities and limits, staying alert to automation bias, correctly interpreting output, deciding not to use the output, and intervening or halting the system. Designers choose the oversight model: human-in-the-loop (a person approves each decision), human-on-the-loop (a person monitors and can intervene), or human-in-command (overall control and the ability to disable). The right model follows from the impact assessment — the higher the stakes and the lower the reversibility, the more direct the human control should be. A "stop button," override paths, and confidence-based escalation are concrete artifacts of oversight by design.
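The oversight artifacts mentioned above (a stop button, an override path, and confidence-based escalation) might be wired together roughly as follows. This is a simplified sketch. The two-way split between in-the-loop and on-the-loop, the 0.9 threshold, and the routing labels are assumptions, and the `halted` flag stands in for the disable capability associated with human-in-command.

```python
from enum import Enum


class Oversight(Enum):
    IN_THE_LOOP = "a person approves each decision"
    ON_THE_LOOP = "a person monitors and can intervene"


def choose_oversight(high_stakes, reversible):
    """Pick a per-decision oversight mode from stakes and reversibility."""
    if high_stakes and not reversible:
        return Oversight.IN_THE_LOOP
    return Oversight.ON_THE_LOOP


def route(confidence, mode, threshold=0.9, halted=False):
    """Return where a single model output should go."""
    if halted:
        return "blocked"  # the stop button overrides everything else
    if mode is Oversight.IN_THE_LOOP or confidence < threshold:
        return "human_review"  # approval required, or escalated on low confidence
    return "auto_with_monitoring"


mode = choose_oversight(high_stakes=False, reversible=True)
print(route(0.95, mode))               # auto_with_monitoring
print(route(0.60, mode))               # human_review
print(route(0.95, mode, halted=True))  # blocked
```

A real system would derive the inputs to `choose_oversight` from the impact assessment rather than from two booleans, and would log every routing decision as oversight evidence.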

Privacy-enhancing technologies (PETs) are a core part of design-stage control selection, and candidates should be able to match a technique to the risk it addresses. Differential privacy adds calibrated noise so that including or excluding any single individual's data changes the output distribution by at most a bounded amount (set by the privacy parameter ε), which limits membership-inference and inversion attacks. Federated learning trains across decentralized data so raw records never leave their source, although the shared model updates can still leak information. Anonymization irreversibly removes the link to an individual, whereas pseudonymization replaces direct identifiers with tokens that can be re-linked using separately held information, so pseudonymized data remains personal data under the GDPR. Synthetic data substitutes statistically similar artificial records for real ones. Homomorphic encryption allows computation directly on encrypted data, and secure multi-party computation lets several parties compute a joint result without revealing their inputs to one another. Each PET trades some accuracy, cost, or complexity for privacy, so the design phase records which techniques were chosen and why, tying the choice back to the impact assessment's findings.
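The "calibrated noise" idea behind differential privacy can be illustrated with the Laplace mechanism applied to a counting query. This is a teaching sketch under stated assumptions: the function names, the sample ages, and the epsilon value are made up, and the standard-library random generator is not suitable for a real privacy guarantee.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(scale=1.0 / epsilon, rng=rng)


rng = random.Random(0)  # seeded only to make the sketch repeatable
ages = [34, 51, 29, 62, 47, 55, 38]  # hypothetical records
noisy = private_count(ages, lambda age: age >= 50, epsilon=0.5, rng=rng)
print(f"noisy count of people aged 50 or over: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger privacy, which is the accuracy trade-off the design record should capture. Production systems would normally rely on a vetted differential-privacy library rather than hand-written sampling.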

Finally, secure development practices wrap the whole effort: version control for code, data, and models; reproducible training pipelines; signed and tracked model artifacts; access controls and secrets management; dependency and supply-chain scanning (including third-party foundation models and open-source components); and thorough documentation of design decisions. Standards such as ISO/IEC 42001 (AI management systems) and the NIST AI RMF's Map and Manage functions give organizations a recognized structure for embedding these controls. The output of a well-governed design phase is not just working code but an auditable record showing which risks were identified, which controls were chosen, and why — the evidence base a conformity assessment or regulator will later demand.

Test Your Knowledge

A security team is threat modeling an LLM application that summarizes user-uploaded documents. An attacker embeds hidden text in a document instructing the model to ignore its guidelines and reveal system data. Which threat is this?

A team must choose a human-oversight model for a high-stakes, low-reversibility medical triage recommendation system. Which design is most appropriate under an EU AI Act Article 14 analysis?
