A customer appears twice in the system under slightly different names, splitting their activity below monitoring thresholds so no alert fires. Which control most directly fixes this?

Entity resolution and deduplication into a single golden record. The root cause is a data uniqueness failure that fragments one customer into two records. Resolving the duplicates into a single golden record aggregates the activity so existing controls see the true volume. New rules or lower thresholds do not address the underlying data defect.

Tuning fuzzy-matching too tightly in a sanctions filter most directly increases the risk of which outcome?

False negatives: a sanctioned party slips through unmatched. A matching threshold set too tight rejects near-matches and lets a sanctioned name pass without an alert, which is a false negative. Set too loose, it produces excessive false positives. Tuning is a governed, documented decision that balances the two error types.

Why is least-privilege access especially important for SAR and investigation data?

Over-broad access risks tipping off subjects and breaches confidentiality obligations. SAR and investigation files are highly confidential. Uncontrolled access can lead to tipping off a subject under investigation and violates legal confidentiality duties, so access is restricted on a need-to-know, least-privilege basis.


Data Quality, Integrity, Access, and Taxonomy

Every sanctions filter, transaction-monitoring scenario, and risk model consumes data. If that data is wrong, incomplete, or fragmented, the control fails silently — the system reports 'clear' while the risk is real. CAMS frames this as the principle that data quality is a control, not a back-office hygiene task.

The six data-quality dimensions

Dimension	Definition	AML failure if missing
Completeness	All required fields populated	Beneficial owner blank → screening gap
Accuracy	Values reflect reality	Wrong date of birth → sanctions match missed
Consistency	Same value across systems	Two spellings of one name → fragmented monitoring
Timeliness	Data current and refreshed	Stale address → outdated geographic risk
Uniqueness	One record per real entity	Duplicate customers → activity split below thresholds
Validity	Conforms to format/rules	Free-text country field → screening can't parse
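Four of these dimensions can be tested mechanically within a single dataset. The sketch below assumes hypothetical field names (customer_id, beneficial_owner, national_id, last_refreshed), a small stand-in country-code list, and an assumed 365-day refresh window. Accuracy and consistency are not checked here, because they need an external source of truth or a second system to compare against.

```python
from datetime import date

# Hypothetical field names and reference values, for illustration only.
REQUIRED_FIELDS = ("customer_id", "name", "dob", "country", "beneficial_owner")
VALID_COUNTRIES = {"US", "GB", "DE", "MX"}  # stand-in for a full ISO 3166 code list
MAX_AGE_DAYS = 365  # assumed refresh window for the timeliness check


def check_records(records, as_of):
    """Return (customer_id, dimension, detail) exceptions for a batch of records."""
    exceptions = []
    seen_ids = {}
    for rec in records:
        cid = rec.get("customer_id")
        # Completeness: every required field must be populated.
        for field in REQUIRED_FIELDS:
            if not rec.get(field):
                exceptions.append((cid, "completeness", f"{field} missing"))
        # Validity: country must be a code from the controlled list, not free text.
        country = rec.get("country")
        if country and country not in VALID_COUNTRIES:
            exceptions.append((cid, "validity", f"country {country!r} not a valid code"))
        # Timeliness: the record must have been refreshed within the window.
        refreshed = rec.get("last_refreshed")
        if refreshed is None or (as_of - refreshed).days > MAX_AGE_DAYS:
            exceptions.append((cid, "timeliness", "record stale or never refreshed"))
        # Uniqueness: one national ID should not appear on two customer records.
        nid = rec.get("national_id")
        if nid:
            if nid in seen_ids:
                exceptions.append((cid, "uniqueness", f"duplicates {seen_ids[nid]}"))
            else:
                seen_ids[nid] = cid
    return exceptions
```

Each exception carries its dimension, so exception counts can be reported per dimension rather than as one undifferentiated "bad data" number.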

Worked example. A customer is onboarded twice — once as 'Maria Garcia' and once as 'M. Garcia-Lopez' — because of a uniqueness failure. Each record shows $4,000 monthly wires, below a $5,000 monitoring threshold, so no alert fires. Aggregated, the real customer moves $8,000 monthly: classic structuring hidden by poor data. The control did not fail; the data did. The exam answer is entity resolution / deduplication into a golden record, not a new monitoring rule.
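The arithmetic of the worked example can be reproduced in a few lines. This is a minimal sketch that assumes the two records share a hypothetical identifier (date of birth plus national ID) that can serve as the golden-record key. Real entity-resolution engines combine many attributes and fuzzy scores, because names like 'Maria Garcia' and 'M. Garcia-Lopez' will not match on the name alone.

```python
from collections import defaultdict

THRESHOLD = 5_000  # monthly wire threshold from the worked example

# Hypothetical records: two customer IDs, one real person.
customers = [
    {"id": "C1", "name": "Maria Garcia", "dob": "1980-04-02", "national_id": "X123"},
    {"id": "C2", "name": "M. Garcia-Lopez", "dob": "1980-04-02", "national_id": "X123"},
]
monthly_wires = {"C1": 4_000, "C2": 4_000}


def resolve(records):
    """Group record IDs into entities keyed by an assumed shared identifier."""
    entities = defaultdict(list)
    for rec in records:
        entities[(rec["dob"], rec["national_id"])].append(rec["id"])
    return dict(entities)


def alerting(groups, volumes, threshold):
    """Return the groups whose combined monthly volume reaches the threshold."""
    return [ids for ids in groups if sum(volumes[i] for i in ids) >= threshold]


per_record = [[rec["id"]] for rec in customers]   # monitoring on fragmented data
per_entity = list(resolve(customers).values())    # monitoring on the golden record

print(alerting(per_record, monthly_wires, THRESHOLD))  # []
print(alerting(per_entity, monthly_wires, THRESHOLD))  # [['C1', 'C2']]
```

The monitoring rule is identical in both calls; only the grouping of the data changes, which is why the fix belongs upstream.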

Taxonomy and the golden record

A data taxonomy is the agreed classification of customers, products, jurisdictions, and risk categories. Without it, one team's 'PEP' is another's 'high-profile client', and reporting cannot roll up. The golden record (master Customer Information File) is the single, reconciled source of truth that screening and monitoring read from. Reference data — sanctions lists, country-risk ratings, high-risk-business codes — must be versioned so an alert can be reproduced against the list as it stood that day.
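A taxonomy in practice is a controlled mapping from each team's local labels to one canonical code. The sketch below uses a hypothetical mapping table. Unknown labels are flagged as UNMAPPED rather than guessed, so that gaps in the taxonomy show up in reporting instead of silently distorting the roll-up.

```python
from collections import Counter

# Hypothetical mapping of team-specific labels to canonical taxonomy codes.
TAXONOMY = {
    "pep": "PEP",
    "politically exposed person": "PEP",
    "high-profile client": "PEP",
    "msb": "MSB",
    "money services business": "MSB",
}


def normalize(label):
    """Map a local label to its canonical code; unknown labels are flagged, not guessed."""
    return TAXONOMY.get(label.strip().lower(), "UNMAPPED")


def roll_up(labels):
    """Count customers per canonical category so reports from different teams add up."""
    return dict(Counter(normalize(label) for label in labels))


print(roll_up(["PEP", "High-profile client", "Money Services Business", "VIP"]))
# {'PEP': 2, 'MSB': 1, 'UNMAPPED': 1}
```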

Integrity, access, and lineage

Integrity: data must not be altered improperly; changes are logged with who/when/why.
Access control: apply least-privilege and need-to-know. SAR data and investigation files are highly restricted — over-broad access risks tipping off and breaches of confidentiality.
Data lineage: the traceable path from source system → transformation → alert. Examiners and auditors expect you to reproduce why an alert fired and to evidence the data behind it.
Retention: records (CDD, transactions, SARs) are generally retained for at least five years under the BSA and FATF standards; the exam expects you to know that retention is what allows individual transactions to be reconstructed.
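The integrity and access controls above can be sketched together. In this sketch every change must carry a reason and is written to an append-only log with who, when, and old/new values, and SAR data is released only to need-to-know roles. The role names (bsa_officer, fiu_investigator) and the CustomerStore and read_sar helpers are hypothetical. Real entitlements would come from the institution's identity and access management system.

```python
from datetime import datetime, timezone

# Hypothetical need-to-know roles; real entitlements come from the IAM system.
SAR_READERS = {"bsa_officer", "fiu_investigator"}


class CustomerStore:
    """Toy record store where every change is logged with who, when, and why."""

    def __init__(self):
        self._data = {}
        self.change_log = []  # append-only audit trail

    def update(self, record_id, field, value, user, reason):
        if not reason:
            raise ValueError("a change reason is required")
        old = self._data.setdefault(record_id, {}).get(field)
        self._data[record_id][field] = value
        self.change_log.append({
            "record": record_id, "field": field, "old": old, "new": value,
            "who": user, "when": datetime.now(timezone.utc), "why": reason,
        })


def read_sar(case_id, user_role, sar_files):
    """Release SAR data only to need-to-know roles; everyone else is refused."""
    if user_role not in SAR_READERS:
        raise PermissionError(f"role {user_role!r} is not entitled to SAR data")
    return sar_files[case_id]
```

Refusing by default (an allow-list of roles) is the least-privilege choice: a new role gets no SAR access until someone deliberately grants it.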

Common traps

First, do not confuse false negatives (a real risk the data hid) with false positives (noise from over-matching). Poor data drives both. Second, fuzzy matching tuned too loose floods investigators; tuned too tight misses sanctioned parties — tuning is a documented, governed decision. Third, adding more rules to a system fed by bad data multiplies the noise; the disciplined fix is upstream data remediation. Fourth, broad data access is not 'efficiency' — uncontrolled access to SAR and investigation data is a reportable confidentiality failure.
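The tight-versus-loose trade-off can be made concrete with a similarity score and a threshold. This sketch uses Python's difflib.SequenceMatcher ratio as a stand-in similarity measure; commercial sanctions filters use their own matching algorithms. The list entry and the thresholds are hypothetical.

```python
from difflib import SequenceMatcher

SANCTIONS_LIST = ["ivan petrov"]  # hypothetical list entry


def similarity(a, b):
    """Similarity in [0, 1]; SequenceMatcher stands in for a real filter's scorer."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def screen(name, threshold):
    """Return the list entries whose similarity to `name` meets the threshold."""
    return [entry for entry in SANCTIONS_LIST if similarity(name, entry) >= threshold]


print(screen("Ivan Petrof", 0.95))  # [] : too tight, the misspelled name slips through
print(screen("Ivan Petrof", 0.85))  # ['ivan petrov'] : near-match caught
print(screen("Ivan Smith", 0.85))   # [] : unrelated name cleared
print(screen("Ivan Smith", 0.50))   # ['ivan petrov'] : too loose, a false positive
```

The same name pair moves between a false negative and a true match, and an unrelated name between a clear and a false positive, purely by changing one number. That is why the threshold choice needs documented rationale and testing.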

The CAMS-correct answer treats data as a governed asset with owners, quality metrics, lineage, and least-privilege access.

Data governance roles and metrics

Good data does not happen by accident; it is governed. A data owner (a business executive) is accountable for the meaning and quality of a data domain, while a data steward operationally maintains it. Data quality metrics — completeness rates, match rates, exception counts — are tracked and reported like any other control metric, with thresholds and escalation. When a regulator asks 'how do you know your customer data is accurate?' the credible answer is a measured, governed program, not an assertion. CAMS scenario items reward identifying the owner and the metric, not just naming the defect.
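A governed metric pairs a measurement with an accountable owner, a steward, a threshold, and an escalation path. The sketch below is a hypothetical structure for a completeness metric, not a prescribed format; the 98% threshold and the role names are illustrative assumptions.

```python
from dataclasses import dataclass


def completeness_rate(records, field):
    """Share of records with `field` populated (0.0 for an empty batch)."""
    populated = sum(1 for rec in records if rec.get(field))
    return populated / len(records) if records else 0.0


@dataclass
class QualityMetric:
    name: str
    field: str
    owner: str        # accountable data owner (business executive)
    steward: str      # operational data steward
    threshold: float  # minimum acceptable completeness rate

    def evaluate(self, records):
        rate = completeness_rate(records, self.field)
        if rate >= self.threshold:
            return {"metric": self.name, "rate": rate, "status": "OK"}
        # Breach: escalate to the owner, route remediation to the steward.
        return {"metric": self.name, "rate": rate, "status": "BREACH",
                "escalate_to": self.owner, "remediate_by": self.steward}


bo_metric = QualityMetric("BO completeness", "beneficial_owner",
                          owner="Head of Retail Onboarding",
                          steward="KYC data steward", threshold=0.98)
```

Reporting the owner alongside the breach answers both halves of a typical scenario item: what the defect is, and who is accountable for fixing it.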

Reference data and reproducibility

Screening and monitoring depend on reference data: sanctions lists, PEP databases, high-risk country ratings, and high-risk-business codes. This data changes constantly, so it must be versioned and dated. If an alert fired on 1 March against the OFAC list as it stood that day, an investigator must be able to reproduce the match against that exact list version months later. Loading a new list silently over the old one destroys reproducibility and undermines an examiner's ability to test the control — a frequently overlooked defect.
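Reproducibility comes from keeping every list version with its effective date and looking versions up by date, instead of overwriting. The VersionedList class and the entries below are hypothetical; they do not reflect any real OFAC publication format.

```python
import bisect
from datetime import date


class VersionedList:
    """Keeps every published version of a reference list, keyed by effective date."""

    def __init__(self):
        self._dates = []
        self._versions = []

    def load(self, effective, entries):
        # Insert rather than overwrite, so earlier versions stay reproducible.
        i = bisect.bisect_right(self._dates, effective)
        self._dates.insert(i, effective)
        self._versions.insert(i, frozenset(entries))

    def as_of(self, day):
        """Return the version in force on `day` (latest effective date <= day)."""
        i = bisect.bisect_right(self._dates, day)
        if i == 0:
            raise LookupError(f"no list version in force on {day}")
        return self._versions[i - 1]


sanctions = VersionedList()
sanctions.load(date(2024, 2, 15), {"ACME TRADING"})   # hypothetical entry
sanctions.load(date(2024, 3, 20), {"OTHER ENTITY"})   # later version; ACME delisted
```

An alert raised on 1 March 2024 can still be re-run against sanctions.as_of(date(2024, 3, 1)) months later, even though the current list no longer contains the matched name.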

Structured versus unstructured data

Monitoring systems handle structured data (amounts, dates, account numbers in fixed fields) well, but much AML-relevant information arrives as unstructured data — free-text payment messages (the SWIFT MT103 remittance field), adverse-media articles, and scanned documents. A payment whose structured fields look benign may carry a sanctioned port name or a shell-company reference buried in free text. Controls increasingly parse unstructured fields, but free-text entry at onboarding (a country typed instead of selected from a list) is a validity defect that defeats automated screening.
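A minimal way to screen free text is to normalize it (lowercase, punctuation stripped) and look for whole-term matches. The watch terms below are hypothetical placeholders; a production filter would draw on the versioned reference lists and use fuzzy matching rather than exact phrases.

```python
import re

# Hypothetical watch terms, for illustration only.
WATCH_TERMS = {"northport harbour", "acme shell holdings"}


def scan_free_text(text):
    """Return watch terms found in a free-text field, ignoring case and punctuation."""
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    # Pad with spaces so only whole words match, not fragments inside longer words.
    return sorted(t for t in WATCH_TERMS if f" {t} " in f" {normalized} ")


print(scan_free_text("INV 4471 / goods via Northport Harbour, ref ACME-Shell Holdings"))
# ['acme shell holdings', 'northport harbour']
```

Normalizing first matters: without it, the hyphen in "ACME-Shell" would hide the term from a naive substring search.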

A worked data-lineage scenario

An examiner asks why a $50,000 cash deposit never alerted. Investigation shows the branch system recorded it as a check, not cash, so the cash-structuring scenario never evaluated it. The defect is accuracy at the source, and the fix is upstream input controls plus reconciliation — not a new scenario. Data lineage is what lets the institution trace the alert (or its absence) back to the originating field and prove the root cause.
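The scenario can be traced with a transaction that carries its own lineage trail: the source code from the branch system, the mapping applied, and the scenario's decision. The code mapping (CSH, CHK), the function names, and the 10,000 alert threshold are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    txn_id: str
    amount: float
    txn_type: str
    lineage: list = field(default_factory=list)  # (step, detail) trail, source to scenario


def ingest(txn_id, amount, source_code, source_system):
    # Hypothetical mapping from branch-system codes to monitoring transaction types.
    type_map = {"CSH": "CASH", "CHK": "CHECK"}
    txn_type = type_map.get(source_code, "UNKNOWN")
    txn = Txn(txn_id, amount, txn_type)
    txn.lineage.append(("source", f"{source_system} code={source_code}"))
    txn.lineage.append(("transform", f"mapped {source_code} -> {txn_type}"))
    return txn


def cash_scenario(txn):
    """Hypothetical cash scenario: only transactions typed CASH are evaluated at all."""
    if txn.txn_type != "CASH":
        txn.lineage.append(("scenario", f"skipped: type {txn.txn_type} is out of scope"))
        return False
    txn.lineage.append(("scenario", "evaluated by cash scenario"))
    return txn.amount >= 10_000  # assumed alert threshold for illustration


deposit = ingest("T1", 50_000, "CHK", "branch_teller")  # cash miskeyed as a check
print(cash_scenario(deposit))  # False
for step, detail in deposit.lineage:
    print(step, "|", detail)
```

Reading the trail backwards from the skipped scenario leads to the source code CHK, which is the evidence that the root cause is input accuracy rather than scenario design.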

The exam-correct conclusion: when controls miss, interrogate the data feeding them before adding rules, because new rules built on bad data simply multiply false positives while leaving the true risk invisible.
