2.4 Master Patient Index and Data Quality

Key Takeaways

  • The MPI links every encounter to one enterprise patient identifier; an EMPI links identities across multiple facilities/systems
  • Duplicates (one patient, two records), overlays (one record, two patients), and overlaps (linked-system mismatches) are the three core MPI integrity errors
  • AHIMA's Data Quality Management Model defines 10 characteristics: accuracy, accessibility, comprehensiveness, consistency, currency, definition, granularity, precision, relevancy, timeliness
  • An overlay is the most dangerous MPI error because it mixes two patients' clinical data in one record, creating a patient-safety hazard
Last updated: June 2026

The MPI and the EMPI

The master patient index (MPI) is the permanent directory that links every patient to a single medical record number (MRN) and cross-references all of that patient's encounters. It is the backbone of record retrieval; if the MPI is wrong, the right chart cannot be found. Core MPI data elements include name, date of birth, sex, MRN, and an internal enterprise identifier.

An enterprise master patient index (EMPI) links a patient's identities across multiple facilities or systems, which is essential when hospitals merge or participate in a health information exchange. Because the U.S. has no national patient identifier, the EMPI must match on demographics: it typically relies on probabilistic matching algorithms that score demographic similarity to decide whether two records belong to the same person.
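
To make the cross-referencing idea concrete, here is a minimal sketch of the kind of crosswalk an EMPI maintains: one enterprise identifier pointing to the local MRNs a patient holds at each facility. The class names, facility codes, and identifier formats are hypothetical and chosen only for illustration; they are not taken from any real EMPI product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class EnterpriseIdentity:
    """One person as the EMPI sees them (hypothetical structure)."""
    enterprise_id: str
    # facility code -> local MRN at that facility
    local_mrns: Dict[str, str] = field(default_factory=dict)


class Empi:
    """Toy cross-reference index: (facility, MRN) -> enterprise identity."""

    def __init__(self) -> None:
        self._by_local: Dict[Tuple[str, str], EnterpriseIdentity] = {}

    def link(self, identity: EnterpriseIdentity, facility: str, mrn: str) -> None:
        # Record the local MRN on the identity and index it for lookup.
        identity.local_mrns[facility] = mrn
        self._by_local[(facility, mrn)] = identity

    def lookup(self, facility: str, mrn: str) -> Optional[EnterpriseIdentity]:
        return self._by_local.get((facility, mrn))


empi = Empi()
patient = EnterpriseIdentity(enterprise_id="E-0001")  # made-up identifier
empi.link(patient, "HOSP_A", "104233")
empi.link(patient, "HOSP_B", "778120")

# Either local MRN resolves to the same enterprise identity.
found = empi.lookup("HOSP_B", "778120")
print(found.enterprise_id, found.local_mrns)
```

In this sketch the linking decision is made by hand; in practice it would come from the matching algorithm, which is where errors enter.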

MPI Integrity Errors

Three error types are heavily tested — keep them straight:

Error | Definition | Risk
Duplicate | One patient has two or more MRNs/records | Fragmented care; data scattered
Overlay | One record holds data for two different patients | Patient-safety hazard: clinical info commingled
Overlap | A patient has records in two linked systems that don't reconcile | Mismatched identity across the EMPI

The overlay is the most dangerous because one chart now contains another person's allergies, medications, and results — a direct safety risk. Record matching uses deterministic (exact-match) or probabilistic (weighted-score) algorithms; the duplicate rate is a key MPI quality metric, and best practice is to prevent duplicates at registration rather than clean them up later.

AHIMA's Data Quality Management Model

AHIMA's Data Quality Management (DQM) Model defines 10 characteristics of high-quality data. RHIT items ask you to match a scenario to the right characteristic:

  1. Accuracy — data reflects the true value/event (correct, error-free).
  2. Accessibility — data is available when and where needed.
  3. Comprehensiveness — all required data is present (completeness).
  4. Consistency — the same value/meaning across systems and time (reliability).
  5. Currency — data is up to date, not obsolete.
  6. Definition — clear, documented definitions exist for each element.
  7. Granularity — recorded at the appropriate level of detail.
  8. Precision — values fall within an acceptable, exact range.
  9. Relevancy — data is meaningful for its intended use.
  10. Timeliness — data is recorded and available within the required time.

Data Integrity in Practice

Data integrity means data is complete, accurate, and unaltered from creation through its lifecycle — the umbrella that the 10 DQM characteristics support. Threats include copy-and-paste "note bloat," unverified imported data, and MPI overlays.

Worked example: a clinic discovers two charts for "Robert Smith," each with half his visits. That is a duplicate (a comprehensiveness failure, because no single record is complete). Contrast: a lab result filed under the wrong existing patient is an overlay (an accuracy failure, because the chart now states something untrue about that patient).

The DQM model maps its four domains against these characteristics so quality is enforced at every stage, not just at the end: application (the purpose for which data is collected), collection (how data elements are gathered), warehousing (how data is stored and archived), and analysis (how data is turned into information).

Record Matching and MPI Maintenance

Because there is no national patient identifier, matching depends on algorithms:

  • Deterministic matching requires an exact match on chosen fields (e.g., name + DOB + SSN). It is simple but misses records when data is entered slightly differently.
  • Probabilistic matching assigns weights to each field and computes a similarity score; pairs above a threshold are auto-linked, borderline pairs go to a manual review queue. This catches typos and nicknames a deterministic match would miss.
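
The difference between the two approaches can be sketched in a few lines of Python. This is a simplified illustration, not how any particular EMPI works: the field weights and the two thresholds are made-up values, and the score is a plain weighted string similarity (using the standard library's `difflib`) rather than the statistical weighting that production matching engines tune on their own data.

```python
from difflib import SequenceMatcher

# Illustrative weights and thresholds (assumptions, not published values).
WEIGHTS = {"last_name": 0.35, "first_name": 0.25, "dob": 0.30, "zip": 0.10}
AUTO_LINK = 0.90   # at or above: link automatically
REVIEW = 0.75      # at or above (but below AUTO_LINK): send to manual review


def deterministic_match(a, b, fields=("last_name", "first_name", "dob")):
    """Exact (case-insensitive) match on every chosen field."""
    return all(a[f].strip().lower() == b[f].strip().lower() for f in fields)


def field_similarity(x, y):
    """String similarity between 0 and 1."""
    return SequenceMatcher(None, x.strip().lower(), y.strip().lower()).ratio()


def probabilistic_score(a, b):
    """Weighted sum of per-field similarities."""
    return sum(w * field_similarity(a[f], b[f]) for f, w in WEIGHTS.items())


def decide(a, b):
    score = probabilistic_score(a, b)
    if score >= AUTO_LINK:
        return "auto-link"
    if score >= REVIEW:
        return "manual review"
    return "no match"


rec1 = {"last_name": "Smith", "first_name": "Robert", "dob": "1980-04-12", "zip": "60614"}
rec2 = {"last_name": "Smyth", "first_name": "Robert", "dob": "1980-04-12", "zip": "60614"}

print(deterministic_match(rec1, rec2))   # False: "Smith" != "Smyth"
print(round(probabilistic_score(rec1, rec2), 2), decide(rec1, rec2))
```

The one-letter spelling difference defeats the exact match, while the weighted score stays high enough to link the pair. Lowering the thresholds catches more true matches but raises the risk of linking two different people, which is how overlays are created.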

Good registration practices prevent errors at the source: searching the MPI before creating a new record, standardizing name/address formats, and capturing reliable identifiers. When errors occur, HIM resolves duplicates by merging records and corrects overlays by un-merging and re-attaching each entry to the correct patient — a delicate, audited process.
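
The "search before you create" rule can be sketched as follows. This is a toy model under stated assumptions: the `Registry` class, its normalization rule (trim, collapse spaces, uppercase), and the starting MRN value are all hypothetical, and a real registration system would show the registrar a list of candidate matches rather than rely on one exact key.

```python
import itertools


class Registry:
    """Toy MPI that searches for an existing patient before issuing an MRN."""

    def __init__(self):
        self._index = {}                        # normalized key -> MRN
        self._next = itertools.count(100001)    # arbitrary starting number

    @staticmethod
    def _key(last, first, dob):
        # Standardize formatting so trivial differences don't hide a match.
        def norm(s):
            return " ".join(s.split()).upper()
        return (norm(last), norm(first), dob)

    def register(self, last, first, dob):
        """Return (mrn, created). Reuses the existing MRN when one is found."""
        key = self._key(last, first, dob)
        existing = self._index.get(key)
        if existing is not None:
            return existing, False
        mrn = str(next(self._next))
        self._index[key] = mrn
        return mrn, True


mpi = Registry()
first_visit = mpi.register("Smith", "Robert", "1980-04-12")
return_visit = mpi.register("  smith ", "ROBERT", "1980-04-12")

print(first_visit)    # ('100001', True)
print(return_visit)   # ('100001', False): same MRN reused, no duplicate
```

Without the normalization step, the second registration would have produced a new MRN for the same person, which is exactly the duplicate scenario described above.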

Key metrics: the duplicate rate (AHIMA recommends keeping it low, ideally under ~2%) and the error/overlay rate. The four DQM domains show why this matters end-to-end: errors introduced at collection (registration) poison warehousing and corrupt every downstream analysis. Strong data integrity therefore begins at the front desk, not in the data warehouse, and is sustained by ongoing MPI cleanup and governance.
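
As a quick arithmetic check, one common way to express the duplicate rate is duplicate records divided by total MPI records, times 100. The sketch below assumes that definition (counting each extra record beyond a patient's first as one duplicate); the record counts are invented for the example.

```python
def duplicate_rate(total_records, duplicate_records):
    """Duplicate rate as a percentage of all records in the MPI."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return 100.0 * duplicate_records / total_records


# Hypothetical MPI: 250,000 records, 4,000 of them duplicates.
rate = duplicate_rate(total_records=250_000, duplicate_records=4_000)
print(f"{rate:.1f}%")   # 1.6%
```

Facilities may define the numerator differently (for example, counting duplicate pairs), so confirm the definition before comparing rates across organizations.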

Naming, Numbering, and Filing Systems

Classic HIM identification methods still surface on the exam. Numbering systems determine how the MRN is assigned:

  • Unit numbering gives the patient one number for life across all encounters (the modern EHR standard).
  • Serial numbering issues a new number at each visit, which fragments the record.
  • Serial-unit numbering issues a new number at each visit but brings prior records forward under the newest number.

Filing methods for paper records:

  • Straight (consecutive) numeric filing arranges records in plain numeric order.
  • Terminal-digit filing groups records by the last two digits of the number, which distributes records evenly across the file room and reduces misfiles (a frequently tested advantage).
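
The filing orders are easier to see with a short example. The sketch below assumes a six-digit MRN written as three pairs (for example, 12-34-56), read in terminal-digit order as last pair first, then middle pair, then first pair; the MRN values are made up.

```python
def terminal_digit_key(mrn):
    """Sort key: last two digits, then middle two, then first two."""
    digits = mrn.replace("-", "").zfill(6)
    tertiary, secondary, primary = digits[0:2], digits[2:4], digits[4:6]
    return (primary, secondary, tertiary)


mrns = ["12-34-56", "12-35-56", "99-00-01", "00-00-57", "45-67-01"]

# Straight numeric: read the number left to right.
straight = sorted(mrns, key=lambda m: m.replace("-", ""))
# Terminal digit: read the number right to left, pair by pair.
terminal = sorted(mrns, key=terminal_digit_key)

print(straight)  # ['00-00-57', '12-34-56', '12-35-56', '45-67-01', '99-00-01']
print(terminal)  # ['99-00-01', '45-67-01', '12-34-56', '12-35-56', '00-00-57']
```

Under straight numeric filing, newly issued (highest) numbers all land at the end of the file. Under terminal-digit filing, consecutive numbers such as 12-34-56 and 12-34-57 fall into different sections (56 and 57), which is why the workload spreads evenly across the file room.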

Alphabetic filing risks soundalike-name mix-ups, which is why numeric identifiers anchor the MPI. Understanding these systems clarifies how duplicates and overlays arise: a serial system or sloppy registration multiplies identifiers, while a disciplined unit-numbering MPI keeps each patient's data unified and accurate. The exam may ask you to pick terminal-digit filing's chief benefit (even file-room distribution and fewer misfiles) or to identify unit numbering as the method that consolidates a patient's lifetime record under a single identifier — the configuration that minimizes duplicates.

Test Your Knowledge

During an audit, HIM finds that lab results for John Adams were accidentally filed in Jane Allen's existing chart, so her record now contains another patient's data. What MPI integrity error is this?

Test Your Knowledge

A registrar creates a brand-new record for a returning patient who already has an MRN, so the patient now has two record numbers. Which data-quality characteristic is most directly compromised?

Test Your Knowledge

Two hospitals in a merged system share an EMPI, and the matching algorithm cannot reconcile a patient who exists in both systems with slightly different demographics. This unreconciled cross-system identity is best described as which error?
