Which statement best distinguishes reliability from validity?

Reliability concerns consistency, while validity concerns whether score interpretations and uses are supported. Reliability is about consistency of measurement. Validity is about whether evidence supports the intended interpretation and use of the scores.

A measure has high internal consistency but its items do not cover the construct being assessed. What is the main concern?

The measure may be reliable but lack adequate validity evidence for the intended use. High consistency does not guarantee that the items represent the intended construct or support the proposed interpretation.

What does sensitivity describe in a classification measure?

The ability to detect true cases. Sensitivity refers to how well a measure identifies people who truly have the condition or characteristic being screened.

Measurement, Reliability, and Validity — Free Study Guide 2026

Scores Are Evidence, Not Magic

Psychological research depends on measurement. If the measure is weak, even an elegant design can produce a shaky conclusion. On the EPPP, measurement questions often ask whether a test, rating scale, observation system, interview code, or outcome measure is consistent enough and meaningful enough for the proposed use. The safest reasoning is to link the score to the decision being made.

Reliability refers to consistency. Test-retest reliability concerns stability over time. Interrater reliability concerns agreement among observers or coders. Internal consistency concerns whether the items on a scale relate to one another as expected if they measure the same construct. Alternate-forms reliability concerns whether different versions produce comparable scores. A reliability coefficient is not a moral rating of a test; it is evidence about consistency under specified conditions.
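To make these ideas concrete, the sketch below computes one common coefficient for three of the reliability types: a Pearson correlation between two administrations for test-retest stability, Cronbach's alpha for internal consistency, and Cohen's kappa for categorical agreement between two coders. It is a minimal Python illustration under stated assumptions, not a scoring procedure: the function names are hypothetical, the data are invented toy values, and other coefficients (for example, intraclass correlations for interrater reliability) are often used in practice.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation, e.g. time-1 vs. time-2 scores for test-retest reliability."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(item_scores):
    """Cronbach's alpha for internal consistency.

    item_scores: one list per item, each holding every respondent's score on that item.
    """
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # each respondent's scale total
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two coders, corrected for chance agreement.

    Undefined (division by zero) if chance agreement is 1, e.g. both coders use one category.
    """
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_chance = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical toy data, for illustration only.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 14, 19, 21]
items = [[2, 3, 4, 4, 5], [1, 3, 3, 4, 5], [2, 2, 4, 5, 5]]
coder_a = ["on", "on", "off", "off", "on", "off"]
coder_b = ["on", "off", "off", "off", "on", "off"]

print("test-retest r:", round(pearson_r(time1, time2), 2))
print("Cronbach's alpha:", round(cronbach_alpha(items), 2))
print("Cohen's kappa:", round(cohen_kappa(coder_a, coder_b), 2))
```

Each coefficient describes consistency only for this toy sample and these conditions, which is the sense in which a reliability estimate is evidence rather than a fixed property of the test.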

Measurement concept	What it asks	Common EPPP cue
Test-retest reliability	Are scores stable across time when the construct should be stable?	Re-administering a measure after a short interval.
Interrater reliability	Do observers score the same behavior similarly?	Multiple clinicians code recorded sessions.
Internal consistency	Do items on a scale hang together?	Items are intended to assess one construct.
Content validity evidence	Does the measure cover the domain adequately?	Subject matter experts review item coverage.
Criterion-related evidence	Does the score relate to an outcome or criterion?	Scores predict later functioning or correlate with an established measure.

Validity concerns whether evidence and theory support the interpretation and use of scores. A test does not have one permanent validity status for every setting. A depression screener may be valid for initial symptom screening in one population but insufficient for diagnosis, disability determination, or high-stakes forensic conclusions. The EPPP often rewards the answer that asks whether the test was validated for the population and purpose in the vignette.

Construct validity evidence asks whether a measure behaves as expected if it truly reflects the construct. Convergent evidence means the measure relates to similar constructs or established instruments. Discriminant evidence means it does not relate too strongly to different constructs. Criterion-related evidence can be predictive, when the score forecasts a later outcome, or concurrent, when it relates to a present criterion.

Reliability is necessary but not sufficient for validity. A bathroom scale that is always five pounds off is consistent but inaccurate for actual weight. In psychology, a highly consistent measure can still fail if the items do not represent the construct, the language is inappropriate for the client group, or the score is used for a decision beyond the validation evidence.
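The bathroom-scale example can be written out in a few lines. This hypothetical Python sketch assumes a scale that always reads five pounds heavy: two weighings agree perfectly (high consistency), yet every reading misses the true weight by the same amount (poor accuracy). The weights and the helper name are invented for illustration.

```python
from statistics import mean

# Hypothetical true weights (pounds) for five people.
true_weights = [150, 162, 171, 185, 198]


def biased_scale(true_weight, offset=5):
    """A hypothetical scale that always reads `offset` pounds too heavy."""
    return true_weight + offset


monday = [biased_scale(w) for w in true_weights]
tuesday = [biased_scale(w) for w in true_weights]

# Consistency: the two weighings agree exactly for every person.
max_retest_gap = max(abs(a - b) for a, b in zip(monday, tuesday))
# Accuracy: every reading is off from the true weight by the same amount.
mean_error = mean(r - t for r, t in zip(monday, true_weights))

print("largest day-to-day difference:", max_retest_gap)  # 0
print("average error vs. true weight:", mean_error)      # 5
```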

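Measurement also includes sensitivity and specificity for classification. Sensitivity is the ability to detect true cases: the proportion of people who have the condition whom the measure correctly flags. Specificity is the ability to identify non-cases: the proportion of people without the condition whom the measure correctly clears. A screening tool usually prioritizes sensitivity because missing a serious condition can be costly. A confirmatory decision may require higher specificity to avoid false positives. Base rates also matter: when a condition is rare, even a fairly specific test can produce more false positives than true positives, so a positive result is less likely to indicate a true case.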
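The base-rate point is easiest to see with arithmetic. The hypothetical Python sketch below assumes a screener with 90% sensitivity and 90% specificity given to 10,000 people, and compares a 20% base rate with a 1% base rate. Positive predictive value (PPV), the share of positive results that are true cases, is the quantity that shifts; the helper names and numbers are illustrative assumptions.

```python
def expected_counts(n, base_rate, sensitivity, specificity):
    """Expected 2x2 cell counts when a screener is given to n people (hypothetical helper)."""
    cases = n * base_rate
    non_cases = n - cases
    true_pos = cases * sensitivity       # true cases the screener flags
    false_neg = cases - true_pos         # true cases it misses
    true_neg = non_cases * specificity   # non-cases it correctly clears
    false_pos = non_cases - true_neg     # non-cases it wrongly flags
    return true_pos, false_neg, false_pos, true_neg


def sensitivity_from(true_pos, false_neg):
    """Proportion of true cases that test positive."""
    return true_pos / (true_pos + false_neg)


def specificity_from(true_neg, false_pos):
    """Proportion of non-cases that test negative."""
    return true_neg / (true_neg + false_pos)


def ppv_from(true_pos, false_pos):
    """Positive predictive value: proportion of positive results that are true cases."""
    return true_pos / (true_pos + false_pos)


# Hypothetical screener: 90% sensitivity, 90% specificity, 10,000 people screened.
for base_rate in (0.20, 0.01):
    tp, fn, fp, tn = expected_counts(10_000, base_rate, sensitivity=0.90, specificity=0.90)
    print(
        f"base rate {base_rate:.0%}: "
        f"sensitivity={sensitivity_from(tp, fn):.2f}, "
        f"specificity={specificity_from(tn, fp):.2f}, "
        f"true positives={tp:.0f}, false positives={fp:.0f}, "
        f"PPV={ppv_from(tp, fp):.2f}"
    )
```

Under these assumptions, PPV is about 0.69 at a 20% base rate but about 0.08 at a 1% base rate, where 990 false positives far outnumber 90 true positives. Sensitivity and specificity stay at 0.90 in both runs; only the base rate changed.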

For exam questions, slow down when an answer calls a measure "valid" without saying valid for what. The stronger option identifies the intended construct, population, decision, and evidence. That habit aligns research methods with assessment ethics and clinical judgment.

EPPP Study Guide

7.3 Measurement, Reliability, and Validity

Key Takeaways

Scores Are Evidence, Not Magic
