2.4 Master Patient Index and Data Quality
Key Takeaways
- The MPI links every encounter to one enterprise patient identifier; an EMPI links identities across multiple facilities/systems
- Duplicates (one patient, two records), overlays (one record, two patients), and overlaps (linked-system mismatches) are the three core MPI integrity errors
- AHIMA's Data Quality Management Model defines 10 characteristics: accuracy, accessibility, comprehensiveness, consistency, currency, definition, granularity, precision, relevancy, timeliness
- An overlay is the most dangerous MPI error because it mixes two patients' clinical data in one record, creating a patient-safety hazard
The MPI and the EMPI
The master patient index (MPI) is the permanent directory that links every patient to a single medical record number (MRN) and cross-references all of that patient's encounters. It is the backbone of record retrieval; if the MPI is wrong, the right chart cannot be found. Core MPI data elements include name, date of birth, sex, MRN, and an internal enterprise identifier.
An enterprise master patient index (EMPI) links a patient's identities across multiple facilities or systems, which is essential when hospitals merge or share a health information exchange. Because the U.S. has no national patient identifier, an EMPI typically relies on probabilistic matching algorithms that score demographic similarity to decide whether two records belong to the same person.
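To make the MPI/EMPI distinction concrete, here is a minimal sketch, assuming a simplified data model: each facility's MPI entry carries the core elements listed above, and the EMPI is just a cross-reference from (facility, MRN) pairs to one enterprise ID. The class names, field names, and the `E000001` ID format are hypothetical illustrations, not a vendor schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MpiEntry:
    """One facility-level MPI record carrying the core elements named above."""
    facility: str  # system that issued the MRN (hypothetical field name)
    mrn: str       # medical record number, unique only within its facility
    name: str
    dob: date
    sex: str


class Empi:
    """Hypothetical EMPI sketch: maps (facility, MRN) pairs to one enterprise ID."""

    def __init__(self) -> None:
        self._xref: dict[tuple[str, str], str] = {}
        self._next = 1

    def link(self, entry: MpiEntry, enterprise_id: str | None = None) -> str:
        """Attach a record to an existing enterprise ID, or mint a new one."""
        if enterprise_id is None:
            enterprise_id = f"E{self._next:06d}"
            self._next += 1
        self._xref[(entry.facility, entry.mrn)] = enterprise_id
        return enterprise_id

    def records_for(self, enterprise_id: str) -> list[tuple[str, str]]:
        """All (facility, MRN) pairs that belong to one enterprise patient."""
        return sorted(key for key, eid in self._xref.items() if eid == enterprise_id)


empi = Empi()
a = MpiEntry("HospA", "000123", "SMITH, ROBERT", date(1960, 4, 2), "M")
b = MpiEntry("HospB", "884512", "SMITH, ROBERT", date(1960, 4, 2), "M")
eid = empi.link(a)   # first sighting: a new enterprise ID is issued
empi.link(b, eid)    # matching has decided b is the same person
print(empi.records_for(eid))  # [('HospA', '000123'), ('HospB', '884512')]
```

Each facility keeps its own MRN; only the EMPI layer knows that both MRNs are one person, which is why a matching mistake at this layer affects every connected system.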
MPI Integrity Errors
Three error types are heavily tested — keep them straight:
| Error | Definition | Risk |
|---|---|---|
| Duplicate | One patient has two or more MRNs/records | Fragmented care; data scattered |
| Overlay | One record holds data for two different patients | Patient-safety hazard — clinical info commingled |
| Overlap | A patient has records in two linked systems that don't reconcile | Mismatched identity across the EMPI |
The overlay is the most dangerous because one chart now contains another person's allergies, medications, and results, a direct safety risk. Records are matched with deterministic (exact-match) or probabilistic (weighted-score) algorithms. The duplicate rate is a key MPI quality metric, and best practice is to prevent duplicates at registration rather than clean them up later.
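The three definitions in the table can be stated as simple rules over audited data. The sketch below assumes a hypothetical audit where a reviewer has already confirmed which real person each chart entry belongs to (`true_person`); in live data that fact is unknown, which is exactly why these errors are hard to find. The rules are a simplified illustration of the table, not a production detection method.

```python
from collections import defaultdict

# Hypothetical reviewed audit rows: (facility, mrn, enterprise_id, true_person)
entries = [
    ("HospA", "100", "E1", "robert_smith"),
    ("HospA", "101", "E2", "robert_smith"),  # same person, second MRN, same facility
    ("HospA", "200", "E3", "patient_a"),
    ("HospA", "200", "E3", "patient_b"),     # patient_b's result filed in patient_a's chart
    ("HospB", "900", "E4", "maria_lopez"),
    ("HospA", "300", "E5", "maria_lopez"),   # same person in two systems, never linked
]


def classify(entries):
    """Label MPI integrity errors in reviewed audit data (illustrative rules only)."""
    people_in_chart = defaultdict(set)   # (facility, mrn) -> people whose data it holds
    charts_of_person = defaultdict(set)  # person -> {(facility, mrn, enterprise_id)}
    for facility, mrn, eid, person in entries:
        people_in_chart[(facility, mrn)].add(person)
        charts_of_person[person].add((facility, mrn, eid))

    findings = []
    # Overlay: one chart holds data for two or more different people.
    for chart, people in people_in_chart.items():
        if len(people) > 1:
            findings.append(("overlay", chart, tuple(sorted(people))))
    for person, charts in charts_of_person.items():
        mrns_by_facility = defaultdict(set)
        for facility, mrn, _ in charts:
            mrns_by_facility[facility].add(mrn)
        # Duplicate: one person has more than one MRN within the same facility.
        for facility, mrns in mrns_by_facility.items():
            if len(mrns) > 1:
                findings.append(("duplicate", person, facility))
        # Overlap: records in several facilities that carry different enterprise IDs.
        if len(mrns_by_facility) > 1 and len({eid for _, _, eid in charts}) > 1:
            findings.append(("overlap", person, tuple(sorted(mrns_by_facility))))
    return findings


for finding in classify(entries):
    print(finding)
```

Note the asymmetry the table describes: a duplicate and an overlap split one person across several records, while an overlay packs several people into one record.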
AHIMA's Data Quality Management Model
AHIMA's Data Quality Management (DQM) Model defines 10 characteristics of high-quality data. RHIT items ask you to match a scenario to the right characteristic:
- Accuracy — data reflects the true value/event (correct, error-free).
- Accessibility — data is available when and where needed.
- Comprehensiveness — all required data is present (completeness).
- Consistency — the same value/meaning across systems and time (reliability).
- Currency — data is up to date, not obsolete.
- Definition — clear, documented definitions exist for each element.
- Granularity — recorded at the appropriate level of detail.
- Precision — data values are stated with the exactness the intended use requires (e.g., body weight recorded to the decimal place a dosing calculation needs).
- Relevancy — data is meaningful for its intended use.
- Timeliness — data is recorded and available within the required time.
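Several of these characteristics can be checked automatically at registration. The sketch below maps a few hypothetical edit checks to the characteristic each one protects. The required-field list, the `F`/`M`/`U` code set, and the 24-hour entry window are assumed local policy values chosen for illustration, not AHIMA requirements.

```python
from datetime import date, datetime, timedelta

# Hypothetical local policy values; a real facility sets its own.
REQUIRED = ("mrn", "last_name", "first_name", "dob", "sex")
SEX_CODES = {"F", "M", "U"}
MAX_ENTRY_DELAY = timedelta(hours=24)


def dqm_issues(rec: dict, today: date) -> list:
    """Return the DQM characteristics a registration record appears to violate."""
    issues = []
    missing = [f for f in REQUIRED if not rec.get(f)]
    if missing:  # comprehensiveness: every required element present
        issues.append("comprehensiveness: missing " + ", ".join(missing))
    dob = rec.get("dob")
    if dob is not None and dob > today:  # accuracy: the value cannot be true
        issues.append("accuracy: date of birth is in the future")
    sex = rec.get("sex")
    if sex and sex not in SEX_CODES:  # consistency: one code set across systems
        issues.append(f"consistency: sex code {sex!r} is not in the standard code set")
    seen, entered = rec.get("encounter_time"), rec.get("entered_time")
    if seen and entered and entered - seen > MAX_ENTRY_DELAY:  # timeliness
        issues.append("timeliness: entered more than 24 hours after the encounter")
    return issues


record = {
    "mrn": "000123", "last_name": "SMITH", "first_name": "", "dob": date(2030, 1, 1),
    "sex": "Male", "encounter_time": datetime(2024, 3, 1, 9, 0),
    "entered_time": datetime(2024, 3, 3, 9, 0),
}
for issue in dqm_issues(record, today=date(2024, 3, 4)):
    print(issue)
```

Characteristics such as relevancy, definition, and granularity depend on how data is designed and used, so they are usually addressed through governance and data dictionaries rather than per-record checks like these.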
Data Integrity in Practice
Data integrity means data is complete, accurate, and unaltered from creation through its lifecycle — the umbrella that the 10 DQM characteristics support. Threats include copy-and-paste "note bloat," unverified imported data, and MPI overlays.
Worked example: a clinic discovers two charts for "Robert Smith," each with half his visits — that is a duplicate (a comprehensiveness failure, because no single record is complete). Contrast: a lab result filed under the wrong existing patient is an overlay (an accuracy failure). The DQM model maps the four domains — application, collection, warehousing, and analysis — against these characteristics so quality is enforced at every stage, not just at the end.
Record Matching and MPI Maintenance
Because there is no national patient identifier, matching depends on algorithms:
- Deterministic matching requires an exact match on chosen fields (e.g., name + DOB + SSN). It is simple but misses records when data is entered slightly differently.
- Probabilistic matching assigns weights to each field and computes a similarity score; pairs above a threshold are auto-linked, borderline pairs go to a manual review queue. This catches typos and nicknames a deterministic match would miss.
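The two approaches above can be contrasted in a short sketch. Everything numeric here is an assumption for illustration: the field weights, the tiny nickname table, and the 85/60 auto-link and review thresholds are hypothetical, and name similarity uses Python's standard `difflib.SequenceMatcher` ratio as a stand-in for the string comparators commercial EMPIs use.

```python
from difflib import SequenceMatcher

# Hypothetical values for illustration only; real systems tune these on local data.
NICKNAMES = {"BOB": "ROBERT", "ROB": "ROBERT", "BILL": "WILLIAM", "LIZ": "ELIZABETH"}
WEIGHTS = {"last": 30, "first": 20, "dob": 35, "sex": 5, "zip": 10}  # max score 100
AUTO_LINK, REVIEW = 85, 60


def deterministic_match(a: dict, b: dict, fields=("last", "first", "dob")) -> bool:
    """Exact match on the chosen fields; any difference at all means no match."""
    return all(a[f] == b[f] for f in fields)


def normalize(rec: dict) -> dict:
    """Standardize formats so trivial entry differences do not count as disagreement."""
    first = rec["first"].strip().upper()
    return {
        "last": "".join(ch for ch in rec["last"].upper() if ch.isalpha()),
        "first": NICKNAMES.get(first, first),
        "dob": rec["dob"],
        "sex": rec["sex"].strip().upper(),
        "zip": rec["zip"][:5],
    }


def probabilistic_score(a: dict, b: dict) -> float:
    """Weighted similarity: partial credit for names, all-or-nothing for other fields."""
    a, b = normalize(a), normalize(b)
    score = 0.0
    for field, weight in WEIGHTS.items():
        if field in ("last", "first"):
            score += weight * SequenceMatcher(None, a[field], b[field]).ratio()
        elif a[field] == b[field]:
            score += weight
    return score


def decide(score: float) -> str:
    """Route a scored pair: auto-link, manual review queue, or no match."""
    if score >= AUTO_LINK:
        return "auto-link"
    if score >= REVIEW:
        return "manual review"
    return "no match"


r1 = {"last": "Smith", "first": "Robert", "dob": "1960-04-02", "sex": "M", "zip": "60601"}
r2 = {"last": "Smyth", "first": "Bob", "dob": "1960-04-02", "sex": "M", "zip": "60601-1234"}
r3 = {"last": "Smith", "first": "Rob", "dob": "1960-02-04", "sex": "M", "zip": "60601"}

print(deterministic_match(r1, r2))          # False: "Smyth" != "Smith"
print(decide(probabilistic_score(r1, r2)))  # auto-link: typo and nickname tolerated
print(decide(probabilistic_score(r1, r3)))  # manual review: DOB disagrees
```

Fixed weights keep the sketch readable; formal probabilistic record linkage (the Fellegi-Sunter model) instead derives each field's weight from how often that field agrees among true matches versus non-matches.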
Good registration practices prevent errors at the source: searching the MPI before creating a new record, standardizing name/address formats, and capturing reliable identifiers. When errors occur, HIM resolves duplicates by merging records and corrects overlays by un-merging and re-attaching each entry to the correct patient — a delicate, audited process.
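The difference between fixing a duplicate and fixing an overlay can be sketched as two operations over a simplified chart model. In this minimal sketch, `Chart`, `audit_log`, and `retired_mrns` are hypothetical structures: merging moves every entry to a surviving MRN, while unmerging moves only the specific misfiled entries, and both write an audit row for each entry they touch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chart:
    mrn: str
    entries: list[tuple[str, str]] = field(default_factory=list)  # (entry_id, description)


audit_log: list[tuple[str, str, str, str, str]] = []  # (action, user, entry_id, from_mrn, to_mrn)
retired_mrns: dict[str, str] = {}                      # retired duplicate MRN -> surviving MRN


def merge(survivor: Chart, duplicate: Chart, user: str) -> None:
    """Duplicate fix: move every entry to the surviving MRN and retire the duplicate."""
    for entry in duplicate.entries:
        survivor.entries.append(entry)
        audit_log.append(("merge", user, entry[0], duplicate.mrn, survivor.mrn))
    duplicate.entries.clear()
    retired_mrns[duplicate.mrn] = survivor.mrn  # the old number still resolves to the survivor


def unmerge(chart: Chart, owner: Chart, entry_ids: set[str], user: str) -> None:
    """Overlay fix: re-attach each misfiled entry to the patient it belongs to."""
    kept = []
    for entry in chart.entries:
        if entry[0] in entry_ids:
            owner.entries.append(entry)
            audit_log.append(("unmerge", user, entry[0], chart.mrn, owner.mrn))
        else:
            kept.append(entry)
    chart.entries[:] = kept


old = Chart("100", [("v1", "2023 office visit")])
dup = Chart("101", [("v2", "2024 office visit")])
merge(old, dup, user="him_analyst")

chart_a = Chart("200", [("l1", "CBC, patient A"), ("l2", "lipid panel, patient B")])
chart_b = Chart("350")
unmerge(chart_a, chart_b, {"l2"}, user="him_analyst")
```

The unmerge is the delicate step: someone must decide entry by entry which patient each item belongs to, which is why it cannot be a single bulk operation.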
Key metrics: the duplicate rate (AHIMA recommends keeping it low, ideally under ~2%) and the error/overlay rate. The four DQM domains show why this matters end-to-end: errors introduced at collection (registration) poison warehousing and corrupt every downstream analysis. Strong data integrity therefore begins at the front desk, not in the data warehouse, and is sustained by ongoing MPI cleanup and governance.
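The duplicate rate is simple arithmetic: duplicate records divided by total MPI records, times 100. The sketch below uses that one common formulation (facilities differ on whether they count duplicate records or duplicate pairs), and the counts are hypothetical; the 2% benchmark is the figure cited in the text.

```python
TARGET_PCT = 2.0  # the "ideally under ~2%" benchmark cited in the text


def duplicate_rate(duplicate_records: int, total_records: int) -> float:
    """Duplicate records as a percentage of all records in the MPI."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return 100.0 * duplicate_records / total_records


rate = duplicate_rate(duplicate_records=1_800, total_records=120_000)  # hypothetical counts
status = "within" if rate < TARGET_PCT else "above"
print(f"duplicate rate: {rate:.1f}% ({status} target)")  # duplicate rate: 1.5% (within target)
```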
Naming, Numbering, and Filing Systems
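Classic HIM identification methods still surface on the exam. Numbering systems assign the MRN: under unit numbering, a patient keeps one number for life across all encounters (the modern EHR standard); under serial numbering, a new number is issued at each visit, fragmenting the record; under serial-unit numbering, a new number is issued at each visit but prior records are brought forward and filed under the newest number. Two numeric filing methods are used for paper records. Straight (consecutive) numeric filing arranges folders in plain ascending order of the whole number. Terminal-digit filing reads the number in pairs from the right: the last two digits (primary) select the file section, and the middle two (secondary) and first two (tertiary) set the order within it. Because consecutively issued numbers land in different sections, records are distributed evenly across the file room and misfiles are reduced, a frequently tested advantage.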
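Terminal-digit order is easiest to see as a sort key. This minimal sketch assumes six-digit MRNs written in the common 12-34-56 style; the function name and sample numbers are hypothetical.

```python
from __future__ import annotations


def terminal_digit_key(mrn: str) -> tuple[str, str, str]:
    """Sort key for terminal-digit filing of a 6-digit MRN such as 12-34-56.

    Primary = last two digits (file section), secondary = middle two,
    tertiary = first two (position within the section).
    """
    digits = mrn.replace("-", "").zfill(6)
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"expected a 6-digit MRN, got {mrn!r}")
    return (digits[4:6], digits[2:4], digits[0:2])


mrns = ["12-34-56", "12-34-57", "98-76-56", "45-12-56", "01-02-03"]
straight = sorted(mrns)                          # straight numeric: plain ascending order
terminal = sorted(mrns, key=terminal_digit_key)  # terminal-digit order
print(straight)  # ['01-02-03', '12-34-56', '12-34-57', '45-12-56', '98-76-56']
print(terminal)  # ['01-02-03', '45-12-56', '12-34-56', '98-76-56', '12-34-57']
```

In straight order, the consecutively issued numbers 12-34-56 and 12-34-57 sit side by side, so new files pile up at one end of the room; in terminal-digit order they fall into different primary sections (56 and 57), which is the even-distribution benefit described above.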
Alphabetic filing risks soundalike-name mix-ups, which is why numeric identifiers anchor the MPI. Understanding these systems clarifies how duplicates and overlays arise: a serial system or sloppy registration multiplies identifiers, while a disciplined unit-numbering MPI keeps each patient's data unified and accurate. The exam may ask you to pick terminal-digit filing's chief benefit (even file-room distribution and fewer misfiles) or to identify unit numbering as the method that consolidates a patient's lifetime record under a single identifier — the configuration that minimizes duplicates.
During an audit, HIM finds that lab results for John Adams were accidentally filed in Jane Allen's existing chart, so her record now contains another patient's data. What MPI integrity error is this?
A registrar creates a brand-new record for a returning patient who already has an MRN, so the patient now has two record numbers. Which data-quality characteristic is most directly compromised?
Two hospitals in a merged system share an EMPI, and the matching algorithm cannot reconcile a patient who exists in both systems with slightly different demographics. This unreconciled cross-system identity is best described as which error?