A wellness program enrolls only the most distressed employees and reports symptom improvement at posttest with no control group. Which threat most undermines the causal claim?

Regression to the mean. Selecting participants for extreme initial scores invites regression to the mean: extreme scores naturally drift toward average on retest, mimicking improvement absent a treatment effect, and a control group is needed to rule it out.

A treatment study used highly selected university-clinic clients, and the question asks whether results apply to rural community clinics. Which validity is most central?

External validity. Whether a finding generalizes to a different population and setting is the definition of external validity.

Investigators ran 20 uncorrected pairwise comparisons and reported the two that were 'significant.' Which validity is threatened?

Statistical conclusion validity, through inflated Type I error from multiple comparisons. Running many unplanned comparisons without correction inflates the family-wise Type I error rate, a statistical conclusion validity threat; the 'significant' findings may be false positives.

Validity Threats and Causal Inference — Free Study Guide 2026

Identify Which Inference Is Under Attack

Validity in research is not one issue but four. A study can have strong internal validity and weak external validity, or strong analysis and poor construct measurement. EPPP items typically describe a flaw and ask for the name of the threat. The fastest route is to ask which inference the flaw damages: causation (internal), generalization (external), construct meaning (construct), or statistical accuracy (statistical conclusion).

Internal validity asks whether the independent variable caused the effect. The classic threats, most of them from Campbell and Stanley's list:

Threat	What it weakens	Example cue
History	Causal inference	An outside event occurs between pre- and posttest
Maturation	Causal inference	Participants change naturally over time
Testing	Causal inference	Taking the pretest alters posttest performance
Instrumentation	Measurement comparability	The measure, rater, or scoring procedure shifts mid-study
Regression to the mean	Causal inference	Extreme-scoring groups drift toward average on retest
Selection	Group comparability	Intact groups differ before the intervention
Attrition (mortality)	Group comparability	Differential dropout across conditions
Diffusion of treatment	Causal inference	Control group is exposed to the intervention

Regression to the mean is heavily tested. When participants are selected for extreme initial scores (e.g., the most distressed), some apparent improvement on retest is statistical artifact, not treatment effect, because extreme scores tend to move toward the average. A single-group pre-post design with an extreme-scoring sample is especially vulnerable, which is why a control group is essential.
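
The artifact described above can be seen in a small simulation. This is a minimal sketch under stated assumptions, not a model of any real study: scores are hypothetical, each observed score is a stable true score plus independent measurement noise, and there is no treatment at all. The function name and parameter values are illustrative choices.

```python
import random
import statistics

def simulate_regression_to_mean(n=10_000, top_fraction=0.10, seed=0):
    """Simulate pre/post distress scores with NO treatment effect.

    Assumption: observed score = stable true score + independent noise.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n)]
    pre = [t + rng.gauss(0, 10) for t in true_scores]
    post = [t + rng.gauss(0, 10) for t in true_scores]

    # Enroll only the most distressed: the top fraction on the pretest.
    cutoff = sorted(pre)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if pre[i] >= cutoff]

    pre_mean = statistics.mean(pre[i] for i in selected)
    post_mean = statistics.mean(post[i] for i in selected)
    return pre_mean, post_mean

pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group pretest mean:  {pre_mean:.1f}")
print(f"selected group posttest mean: {post_mean:.1f}")
```

The selected group's posttest mean falls back toward the population average even though nothing was done to anyone. An untreated comparison group selected the same way would show the same drop, which is how a control group exposes the artifact.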

Selection threats arise when intact groups differ before intervention, common in quasi-experiments. If one clinic delivers a new treatment and another delivers usual care, clinic differences (staff, clientele, resources) may explain outcomes. Matching and statistical control help but never fully substitute for random assignment.

External, Construct, and Statistical Conclusion Validity

External validity asks whether findings travel across populations, settings, times, and measures. A therapy trial run on highly selected university-clinic adults may not generalize to adolescents, older adults, rural clients, court-referred clients, or those with complex comorbidity. The correct EPPP answer often preserves the finding while limiting the population to which it applies. Threats include interaction of selection with treatment, reactive arrangements (the Hawthorne effect), and testing-by-treatment interactions.

Construct validity (of the cause/effect) asks whether the study measured or manipulated what it claimed. If "social support" is operationalized only as number of social-media contacts, the definition misses quality, availability, reciprocity, and perceived support; the problem is conceptual, not statistical. Mono-operation bias (one measure of the construct) and experimenter expectancy are construct threats.

Statistical conclusion validity asks whether the analysis supports the inference. Threats include:

Low power — too small a sample misses a real effect (raises Type II error).
Inflated Type I error — many unplanned comparisons without correction produce false positives.
Violated assumptions — non-normality, heteroscedasticity, or non-independence distort tests.
Unreliable measurement — attenuates observed relationships.
Outliers and restricted range — shrink or exaggerate estimated effects.
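
The inflated Type I error threat in the list above can be made concrete with a short calculation. This sketch assumes the tests are independent and that every null hypothesis is true; pairwise comparisons on the same groups are not fully independent, so the figure is an approximation. The function names are illustrative.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-test alpha that keeps the family-wise rate at or below alpha."""
    return alpha / m

alpha, m = 0.05, 20
print(round(familywise_error_rate(alpha, m), 3))  # about 0.64
print(bonferroni_alpha(alpha, m))                 # per-test threshold after correction
```

With 20 uncorrected comparisons at .05, the chance of at least one false positive is roughly 64 percent under these assumptions, so two "significant" results out of 20 are weak evidence on their own.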

When validity options look alike, name the inference in your head. If the issue is whether the treatment caused change, think internal validity. If it is whether the result applies elsewhere, think external. If it is whether the variable represents the construct, think construct. If it is whether the statistical test supports the conclusion, think statistical conclusion validity. The best answer fixes the threat directly rather than adding unrelated study features.

Designing Out the Threats

Knowing a threat is only half the item; the EPPP often asks how to control it. Each threat has a standard remedy, and matching threat to remedy is high-yield:

Threat	Standard control
Selection	Random assignment; if impossible, matching or statistical covariate adjustment
History / maturation	A no-treatment or waitlist control group experiencing the same time period
Testing	A control group also pretested, or a Solomon four-group design
Instrumentation	Standardized, unchanging measures and calibrated, retrained raters
Regression to the mean	A control group; avoid selecting on extreme single scores
Attrition	Track and report dropouts; intention-to-treat analysis
Low statistical power	A priori power analysis to set sample size; reliable measures
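
The a priori power analysis in the last row can be sketched as follows. This is an approximation under stated assumptions: a two-tailed comparison of two independent group means, a standardized effect size (Cohen's d), and the normal approximation in place of the t distribution, which slightly understates the required sample. The function name and defaults are illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-group,
    two-tailed comparison of means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # value corresponding to desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Smaller expected effects demand much larger samples, which is why an underpowered study of a modest effect is likely to commit a Type II error.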

The Solomon four-group design deserves recognition because it directly isolates a testing effect: two groups are pretested and two are not, and one of each receives treatment, so the researcher can see whether the pretest itself altered the outcome. When a stem describes worry about a pretest sensitizing participants, this design is the targeted fix.
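
The logic of the four groups can be laid out as a 2 x 2 comparison of posttest means. The numbers below are hypothetical and chosen only for illustration; a real analysis would use a factorial ANOVA or similar test rather than raw mean differences.

```python
# Hypothetical posttest means, keyed by (pretested, treated).
posttest_means = {
    (True, True): 18.0,    # pretest + treatment
    (True, False): 12.0,   # pretest, no treatment
    (False, True): 15.0,   # no pretest + treatment
    (False, False): 11.0,  # no pretest, no treatment
}

def mean(values):
    return sum(values) / len(values)

m = posttest_means

# Treatment effect, averaged over pretested and unpretested groups.
treatment_effect = mean([m[True, True], m[False, True]]) - mean([m[True, False], m[False, False]])

# Testing effect: does merely taking the pretest shift posttest scores?
testing_effect = mean([m[True, True], m[True, False]]) - mean([m[False, True], m[False, False]])

# Pretest sensitization: is the treatment effect larger when a pretest was given?
sensitization = (m[True, True] - m[True, False]) - (m[False, True] - m[False, False])

print(treatment_effect, testing_effect, sensitization)
```

A nonzero sensitization term in real data would suggest the pretest changed how participants responded to treatment, which an ordinary pretest-posttest control group design cannot detect.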

A second exam favorite is the trade-off between internal and external validity. Tight laboratory control maximizes internal validity but can reduce external validity because the artificial setting differs from real practice; loosely controlled field studies do the reverse. There is no universal winner: the credited answer depends on the study's purpose. An efficacy trial (does it work under ideal conditions?) prioritizes internal validity, whereas an effectiveness trial (does it work in routine care?) prioritizes external validity. The EPPP wants candidates to recognize this tension rather than treat one validity as always supreme.

Finally, distinguish a confound from a simple nuisance variable. A confound varies systematically with the independent variable and offers a rival explanation (e.g., the treatment group also got more therapist contact time). A nuisance variable adds random error but does not bias the comparison. Controlling, holding constant, or randomizing a confound is essential; randomization is powerful precisely because it distributes unknown confounds roughly evenly across conditions.
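
A short simulation illustrates why randomization handles confounds that no one measured. It is a sketch under stated assumptions: "motivation" is a hypothetical unmeasured confound, the true treatment effect is set to zero, and all names and values are illustrative.

```python
import random
import statistics

def group_difference(assign, n=5_000, seed=1):
    """Return treated-minus-control mean outcome when treatment does nothing."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)            # unmeasured confound
        outcome = motivation + rng.gauss(0, 1)  # true treatment effect is zero
        (treated if assign(motivation, rng) else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)

def self_selected(motivation, rng):
    # Motivated people opt in: the confound varies with condition.
    return motivation > 0

def randomized(motivation, rng):
    # A coin flip decides condition, independent of motivation.
    return rng.random() < 0.5

biased = group_difference(self_selected)
balanced = group_difference(randomized)
print(f"self-selected groups: difference = {biased:.2f}")
print(f"randomized groups:    difference = {balanced:.2f}")
```

Under self-selection the groups differ substantially despite a null treatment, because motivation travels with condition. Under random assignment the same variable only adds noise, and the difference stays near zero.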

When two options both name plausible threats, the stronger answer identifies the one that varies systematically with the conditions, because that is the threat capable of masquerading as a treatment effect.
