Model Risk Management and Tuning

Key Takeaways

  • Model risk management governs the lifecycle of monitoring/screening models: development, validation, ongoing monitoring, and independent review.
  • The U.S. benchmark is the Fed/OCC SR 11-7 framework: effective challenge, independent validation, and documentation of model purpose and limitations.
  • Above-the-line/below-the-line (ATL/BTL) testing samples activity just above and just below thresholds to detect excess false positives and missed suspicious activity.
  • AI/ML monitoring models add explainability, bias, and data-quality risks; output must be governed, validated, and explainable to regulators.

Last updated: June 2026

Every monitoring scenario, screening filter, and risk-rating engine is a model: a quantitative method that turns inputs into outputs used for decisions. Model risk is the risk of adverse outcomes from errors in a model's design, data, or use. Model risk management (MRM) is the governance discipline that keeps these models reliable, documented, and defensible to regulators. On the CAMS exam, this is where technology meets accountability: a screening filter that misses true hits, or a monitoring scenario that floods analysts with alerts, is a model-risk failure, not just a technical glitch.

The SR 11-7 framework

The foundational U.S. standard is the Federal Reserve and OCC's Supervisory Guidance on Model Risk Management (SR 11-7 / OCC 2011-12). Its core principles apply directly to AML models:

Principle               What it requires
Effective challenge     Critical, independent review by parties with authority and competence
Independent validation  Validation by a function separate from model development
Documentation           Clear record of model purpose, design, assumptions, and limitations
Ongoing monitoring      Continuous checks that the model still performs as intended

A model owned, built, tuned, and validated all by the same analyst fails the independence and effective-challenge tests. Exam questions frequently present this setup as the wrong answer.

Tuning: above-the-line and below-the-line testing

Tuning sets and adjusts scenario thresholds so the model is neither too noisy nor too blind. The standard technique is above-the-line/below-the-line (ATL/BTL) testing:

  • Above-the-line (ATL): sample alerts just above the threshold to check whether they are mostly false positives; if so, the threshold may be too low.
  • Below-the-line (BTL): sample activity just below the threshold that did not alert, to check whether genuinely suspicious activity is being missed; if so, the threshold may be too high.

BTL testing is the safeguard against silent false negatives. Raising a threshold to clear a backlog without BTL testing is a classic governance failure: the backlog disappears, but real suspicious activity may now go undetected.
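The two samples can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the `Activity` record, the `suspicious` review label, and the 10% band width are all assumptions made for the example, and a real program would set band widths and sample sizes through its own documented methodology.

```python
import random
from dataclasses import dataclass


@dataclass
class Activity:
    activity_id: str
    score: float       # value compared against the scenario threshold
    suspicious: bool   # hypothetical label: reviewer's conclusion after manual review


def atl_btl_bands(items, threshold, band_pct=0.10):
    """Return (atl, btl): activity just above and just below the threshold."""
    lower = threshold * (1 - band_pct)
    upper = threshold * (1 + band_pct)
    atl = [a for a in items if threshold <= a.score <= upper]  # alerted
    btl = [a for a in items if lower <= a.score < threshold]   # did not alert
    return atl, btl


def sampled_suspicious_rate(band, sample_size, rng):
    """Draw a random review sample from a band; return the share judged suspicious."""
    if not band:
        return None  # nothing in the band to sample
    sample = rng.sample(band, min(sample_size, len(band)))
    return sum(a.suspicious for a in sample) / len(sample)
```

A meaningful suspicious rate in the below-the-line sample suggests the threshold may be too high, while a near-zero rate in the above-the-line sample suggests it may be too low. Either conclusion still needs documentation and governance approval before any change.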

Validation and ongoing monitoring

Validation covers conceptual soundness, outcome analysis (back-testing against known cases), and benchmarking. Models must be revalidated periodically and after material changes — new products, new typologies, new data feeds, or regulatory changes. Data quality is foundational: a model fed incomplete or mis-mapped data produces unreliable output regardless of how sound its logic is ("garbage in, garbage out").
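Outcome analysis can be illustrated with a small back-test. The sketch below assumes the institution can list identifiers for known suspicious cases (for example, past confirmed investigations) and for the cases the model alerted on; the function name and inputs are hypothetical.

```python
def backtest_detection_rate(known_case_ids, alerted_ids):
    """Return (detection_rate, missed_ids) for known suspicious cases.

    known_case_ids: identifiers of cases already confirmed as suspicious.
    alerted_ids:    identifiers the model alerted on over the same period.
    """
    known = set(known_case_ids)
    if not known:
        raise ValueError("back-testing needs at least one known case")
    detected = known & set(alerted_ids)
    missed = sorted(known - detected)  # cases the model failed to flag
    return len(detected) / len(known), missed
```

The list of missed cases is usually more useful than the rate itself, because each miss points to a scenario gap, a threshold problem, or a data-quality issue that validation should explain.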

AI and machine-learning models

Machine-learning monitoring promises better detection and fewer false positives, but adds risks: explainability (regulators and investigators must understand why an alert fired), bias (training data can embed discriminatory or skewed patterns), and drift (performance degrades as behavior changes). The governance answer is not to ban AI but to subject it to the same MRM discipline — documented purpose, independent validation, explainable outputs, and ongoing monitoring — with extra attention to data lineage and human oversight of automated decisions.
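Of these risks, drift is the easiest to monitor numerically. One widely used measure, which this guide does not prescribe and which is shown here only as an example, is the population stability index (PSI). It compares the share of observations falling into each bin at validation time with the share seen in production. The function name, bin counts, and the small `floor` used to avoid dividing by zero are assumptions for the sketch.

```python
import math


def population_stability_index(expected_counts, actual_counts, floor=1e-6):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected),
    where actual and expected are each bin's share of its own total."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both inputs must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    if e_total == 0 or a_total == 0:
        raise ValueError("each distribution needs at least one observation")
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = max(e / e_total, floor)  # floor keeps empty bins finite
        a_share = max(a / a_total, floor)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi
```

Identical distributions give a PSI of zero, and larger values indicate a bigger shift. The tolerance that triggers revalidation is a governance decision the institution would need to set and document.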

Worked scenario

An institution faces a large alert backlog and proposes raising several monitoring thresholds to cut volume. The correct CAMS-aligned response: do not simply raise thresholds. Run below-the-line testing on the activity that would no longer alert to confirm no suspicious patterns are being suppressed, document the analysis and rationale, obtain independent validation and governance approval, and only then implement — retaining records for the regulator.
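As a first step before the below-the-line review, the institution could measure what the proposed change would suppress. The sketch below is a hypothetical pre-change check: it assumes historical alerts are available as `(score, was_productive)` pairs, where `was_productive` marks alerts that led to escalation or a filing.

```python
def suppression_impact(alerts, old_threshold, new_threshold):
    """Summarize historical alerts that would no longer fire after a threshold increase.

    alerts: iterable of (score, was_productive) pairs from past alert history.
    """
    if new_threshold <= old_threshold:
        raise ValueError("this check applies to threshold increases only")
    alerts = list(alerts)
    # Alerts in the gap fired under the old threshold but would not under the new one.
    suppressed = [(s, p) for s, p in alerts if old_threshold <= s < new_threshold]
    productive_lost = sum(1 for _, p in suppressed if p)
    return {
        "total_alerts": len(alerts),
        "suppressed": len(suppressed),
        "productive_lost": productive_lost,
    }
```

Any non-zero `productive_lost` figure is evidence that the change would suppress real suspicious activity, and it belongs in the documented rationale that goes to independent validation and governance approval.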

The three lines of defense

MRM accountability is usually framed through the three lines of defense, which CAMS tests directly. The first line owns and operates the models (the business and the AML operations team that builds and runs scenarios). The second line is independent risk and compliance, which sets standards, challenges the first line, and oversees validation. The third line is internal audit, which independently assures that the whole framework is working. Effective challenge requires real separation: validation cannot be performed by the same people who built the model, and audit must be independent of both.

A scenario where the model developer also signs off the validation collapses the lines and is a governance failure.

Inventory, change control, and the model lifecycle

A sound program maintains a model inventory — a complete register of every model, its owner, purpose, data sources, validation status, and last review date. Examiners frequently ask for it. Models move through a lifecycle of development, independent validation before deployment, ongoing performance monitoring, periodic revalidation, and decommissioning. Change control governs every adjustment: a threshold change, a new scenario, a data-feed swap, or a vendor model upgrade must be documented, tested (including below-the-line analysis), validated, and approved before going live.

Undocumented or untested changes are among the most common regulatory findings in monitoring and screening systems.
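A model inventory entry and a basic governance check might look like the following sketch. The field names, the finding messages, and the 365-day revalidation interval are illustrative assumptions; the actual interval is whatever the institution's policy defines as "periodic".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ModelRecord:
    model_id: str
    owner: str
    purpose: str
    data_sources: List[str]
    developer: str
    validator: Optional[str]        # None if never validated
    last_validated: Optional[date]  # None if never validated


def inventory_findings(record, today, max_age_days=365):
    """Return a list of governance findings for one inventory entry."""
    findings = []
    if record.validator is None or record.last_validated is None:
        findings.append("never independently validated")
        return findings
    if record.validator == record.developer:
        # Same person built and validated the model: no effective challenge.
        findings.append("validator is the developer")
    if today - record.last_validated > timedelta(days=max_age_days):
        findings.append("revalidation overdue")
    return findings
```

Running a check like this across the whole register gives a quick view of which models would fail the independence or revalidation tests before an examiner asks for the inventory.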

Vendor and tuning documentation

Many institutions buy monitoring and screening engines from vendors. Outsourcing the software does not outsource the responsibility: the institution must still understand the model logic, validate its performance against its own risk profile, calibrate thresholds to its own data, and document why settings were chosen. "The vendor set the default threshold" is not an acceptable answer to an examiner. Every tuning decision and validation result is retained as evidence that the technology is fit for purpose.

Common traps

  • One person building, tuning, and validating a model — no effective challenge.
  • Raising thresholds to reduce backlog without below-the-line testing.
  • Accepting vendor default thresholds without calibration to the institution's own risk.
  • Treating AI/ML output as inherently correct without validation or explainability.
  • Ignoring data-quality issues that undermine otherwise sound model logic.

Test Your Knowledge

Facing a large alert backlog, an institution proposes raising several transaction-monitoring thresholds to reduce volume. Under sound model risk management, what must happen before the change is implemented?

An AML team deploys a machine-learning monitoring model that reduces false positives but cannot easily explain why individual alerts fire. From a model risk management perspective, what is the main concern?