7.4 Validity Threats and Causal Inference

Key Takeaways

  • Internal validity concerns whether the intervention, not an alternative, caused the observed effect.
  • External validity concerns whether findings generalize across people, settings, times, and procedures.
  • Construct validity concerns whether the operationalization captures the intended psychological construct.
  • Statistical conclusion validity concerns whether the statistical inference is accurate given power, assumptions, and error rate.
Last updated: June 2026

Identify Which Inference Is Under Attack

Validity in research is not one issue but four. A study can have strong internal validity and weak external validity, or strong analysis and poor construct measurement. EPPP items typically describe a flaw and ask for the name of the threat. The fastest route is to ask which inference the flaw damages: causation (internal), generalization (external), construct meaning (construct), or statistical accuracy (statistical conclusion).

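Internal validity asks whether the independent variable caused the effect. The classic threats below come mostly from Campbell and Stanley; diffusion of treatment was added in Cook and Campbell's later expansion of the list: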

Threat | What it weakens | Example cue
History | Causal inference | An outside event occurs between pre- and posttest
Maturation | Causal inference | Participants change naturally over time
Testing | Causal inference | Taking the pretest alters posttest performance
Instrumentation | Measurement comparability | The measure, rater, or scoring procedure shifts mid-study
Regression to the mean | Causal inference | Extreme-scoring groups drift toward average on retest
Selection | Group comparability | Intact groups differ before the intervention
Attrition (mortality) | Group comparability | Differential dropout across conditions
Diffusion of treatment | Causal inference | Control group is exposed to the intervention

Regression to the mean is heavily tested. When participants are selected for extreme initial scores (e.g., the most distressed), some of the apparent improvement on retest is a statistical artifact rather than a treatment effect: an extreme score partly reflects chance measurement error that is unlikely to recur, so retest scores tend to move toward the average. A single-group pre-post design with an extreme-scoring sample is especially vulnerable, which is why a control group selected the same way is essential.
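
A small simulation makes the artifact concrete. This is a minimal sketch, assuming classical test theory (each observed score is a stable true score plus independent random error) and a hypothetical reliability of 0.6; the function name and cutoff are illustrative only. Nobody receives treatment, yet the group selected for extreme pretest scores still "improves" at posttest.

```python
import random
import statistics


def simulate_regression_to_mean(n: int = 10_000, cutoff: float = 1.5,
                                reliability: float = 0.6, seed: int = 1):
    """Return (pretest mean, posttest mean) for people selected on an extreme pretest.

    Observed score = true score + independent error, scaled so observed scores
    have variance 1. No intervention happens between the two testings.
    """
    rng = random.Random(seed)
    true_sd = reliability ** 0.5          # variance share due to the stable trait
    error_sd = (1 - reliability) ** 0.5   # variance share due to random error
    pre, post = [], []
    for _ in range(n):
        true_score = rng.gauss(0, true_sd)
        pre.append(true_score + rng.gauss(0, error_sd))
        post.append(true_score + rng.gauss(0, error_sd))  # new error, same trait
    # Keep only the "most distressed": pretest above the cutoff
    chosen = [i for i in range(n) if pre[i] > cutoff]
    pre_mean = statistics.mean(pre[i] for i in chosen)
    post_mean = statistics.mean(post[i] for i in chosen)
    return pre_mean, post_mean


if __name__ == "__main__":
    pre_mean, post_mean = simulate_regression_to_mean()
    print(f"pretest {pre_mean:.2f} -> posttest {post_mean:.2f}")
```

Because the only link between the two testings is the shared true score, the selected group's posttest mean is expected to fall to about `reliability` times its pretest mean; the less reliable the measure, the larger the apparent "improvement" from nothing at all.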

Selection threats arise when intact groups differ before the intervention, a common problem in quasi-experiments. If one clinic delivers a new treatment and another delivers usual care, differences between the clinics (staff, clientele, resources) may explain the outcomes. Matching and statistical control help but never fully substitute for random assignment, because they can adjust only for variables that were measured.

External, Construct, and Statistical Conclusion Validity

External validity asks whether findings travel across populations, settings, times, and measures. A therapy trial run on highly selected university-clinic adults may not generalize to adolescents, older adults, rural clients, court-referred clients, or those with complex comorbidity. The correct EPPP answer often preserves the finding while limiting the population to which it applies. Threats include the interaction of selection with treatment, reactive arrangements (such as the Hawthorne effect), and testing-by-treatment interactions.

Construct validity of causes and effects asks whether the study actually manipulated or measured what it claimed. If "social support" is operationalized only as the number of social-media contacts, the definition misses quality, availability, reciprocity, and perceived support; the problem is conceptual, not statistical. Mono-operation bias (relying on a single measure or manipulation of the construct) and experimenter expectancy are construct validity threats.

Statistical conclusion validity asks whether the analysis supports the inference. Threats include:

  • Low power — too small a sample misses a real effect (raises Type II error).
  • Inflated Type I error — many unplanned comparisons without correction produce false positives.
  • Violated assumptions — non-normality, heteroscedasticity, or non-independence distort tests.
  • Unreliable measurement — attenuates observed relationships.
  • Outliers and restricted range — shrink or exaggerate estimated effects.
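
Two of these threats reduce to simple arithmetic. The sketch below, with hypothetical helper names, computes (a) the familywise Type I error rate for k comparisons, which assumes the tests are independent and every null hypothesis is true, together with the Bonferroni-corrected per-test alpha, and (b) the classical-test-theory attenuation of a correlation by unreliable measures.

```python
import math


def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test alpha that keeps the familywise rate at or below alpha."""
    return alpha / k


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Attenuation under classical test theory: observed r = true r * sqrt(rxx * ryy)."""
    return true_r * math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    print(f"{familywise_error_rate(0.05, 20):.3f}")   # 0.642
    print(f"{bonferroni_alpha(0.05, 20):.4f}")        # 0.0025
    print(f"{attenuated_r(0.50, 0.70, 0.80):.3f}")    # 0.374
```

With 20 uncorrected tests at α = .05, the chance of at least one false positive is roughly 64%; Bonferroni lowers each test's threshold to .0025. A true correlation of .50 measured with reliabilities of .70 and .80 is expected to appear as only about .37.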

When validity options look alike, name the inference in your head. If the issue is whether the treatment caused change, think internal validity. If it is whether the result applies elsewhere, think external. If it is whether the variable represents the construct, think construct. If it is whether the statistical test supports the conclusion, think statistical conclusion validity. The best answer fixes the threat directly rather than adding unrelated study features.

Designing Out the Threats

Knowing a threat is only half the item; the EPPP often asks how to control it. Each threat has a standard remedy, and matching threat to remedy is high-yield:

Threat | Standard control
Selection | Random assignment; if impossible, matching or statistical covariate adjustment
History / maturation | A no-treatment or waitlist control group experiencing the same time period
Testing | A control group that is also pretested, or a Solomon four-group design
Instrumentation | Standardized, unchanging measures and calibrated, retrained raters
Regression to the mean | A control group; avoid selecting on extreme single scores
Attrition | Track and report dropouts; intention-to-treat analysis
Low statistical power | A priori power analysis to set sample size; reliable measures
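
An a priori power analysis can be approximated by hand for the common two-group case. This sketch assumes a two-sided test of a standardized mean difference (Cohen's d) and uses the normal approximation rather than the exact t distribution; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    rounded up. It slightly understates the exact t-test answer.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for a two-sided test
    z_power = z.inv_cdf(power)            # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    for effect in (0.2, 0.5, 0.8):
        print(effect, n_per_group(effect))
```

For d = 0.5, α = .05, and 80% power this returns 63 per group. The exact t-test calculation gives a slightly larger figure, so dedicated power software remains the better choice when planning a real study.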

The Solomon four-group design deserves recognition because it directly isolates a testing effect: two groups are pretested and two are not, and one of each receives treatment, so the researcher can see whether the pretest itself altered the outcome. When a stem describes worry about a pretest sensitizing participants, this design is the targeted fix.
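
The logic of the design is a 2 × 2 layout (pretested or not × treated or not) compared on posttest scores only. The sketch below uses hypothetical posttest means and simple differences to show which contrast isolates the testing effect; the class and function names are illustrative, and a real analysis would test these contrasts, typically with a 2 × 2 ANOVA on the posttest scores.

```python
from dataclasses import dataclass


@dataclass
class SolomonPosttestMeans:
    """Posttest means for the four Solomon groups (values supplied are hypothetical)."""
    pretest_treatment: float   # Group 1: pretest, treatment, posttest
    pretest_control: float     # Group 2: pretest, no treatment, posttest
    treatment_only: float      # Group 3: no pretest, treatment, posttest
    control_only: float        # Group 4: no pretest, no treatment, posttest


def solomon_effects(m: SolomonPosttestMeans) -> dict:
    effect_with_pretest = m.pretest_treatment - m.pretest_control
    effect_without_pretest = m.treatment_only - m.control_only
    return {
        "treatment_effect_with_pretest": effect_with_pretest,
        "treatment_effect_without_pretest": effect_without_pretest,
        # Main effect of being pretested, averaged over treated and control groups
        "testing_effect": (m.pretest_treatment + m.pretest_control) / 2
                          - (m.treatment_only + m.control_only) / 2,
        # Does pretesting change how much the treatment appears to help?
        "pretest_by_treatment_interaction": effect_with_pretest - effect_without_pretest,
    }


if __name__ == "__main__":
    print(solomon_effects(SolomonPosttestMeans(30, 24, 27, 23)))
```

With the illustrative means (30, 24, 27, 23), pretesting raises posttest scores by 2 points on average and enlarges the apparent treatment effect from 4 to 6. That pretest-by-treatment interaction is exactly the testing-by-treatment threat that limits generalization to unpretested populations.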

A second exam favorite is the trade-off between internal and external validity. Tight laboratory control maximizes internal validity but can reduce external validity because the artificial setting differs from real practice; loosely controlled field studies do the reverse. There is no universal winner: the credited answer depends on the study's purpose. An efficacy trial (does it work under ideal conditions?) prioritizes internal validity, whereas an effectiveness trial (does it work in routine care?) prioritizes external validity. The EPPP wants candidates to recognize this tension rather than treat one validity as always supreme.

Finally, distinguish a confound from a simple nuisance variable. A confound varies systematically with the independent variable and offers a rival explanation (e.g., the treatment group also got more therapist contact time). A nuisance variable adds random error, lowering power, but does not bias the comparison. A confound must be controlled, whether by holding it constant, measuring and adjusting for it, or randomizing; randomization is powerful precisely because it distributes even unknown confounds evenly across conditions on average, although a single small-sample randomization can still leave chance imbalances.
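
A quick simulation shows why. It is a sketch under stated assumptions: a single unmeasured confound (called `motivation` here, a hypothetical variable on a standardized scale), and a self-selection rule in which more motivated people opt into treatment 80% of the time versus 20% for the less motivated, numbers chosen only for illustration. Random assignment is a fair coin.

```python
import random
import statistics
from typing import Callable


def self_selection(motivation: float, rng: random.Random) -> bool:
    # Hypothetical rule: motivated people opt in far more often
    return rng.random() < (0.8 if motivation > 0 else 0.2)


def random_assignment(motivation: float, rng: random.Random) -> bool:
    # Fair coin: ignores motivation entirely
    return rng.random() < 0.5


def confound_gap(assign: Callable[[float, random.Random], bool],
                 n: int = 5_000, seed: int = 7) -> float:
    """Treated-minus-control difference in mean motivation (SD units)."""
    rng = random.Random(seed)
    motivation = [rng.gauss(0, 1) for _ in range(n)]
    treated, control = [], []
    for m in motivation:
        if assign(m, rng):
            treated.append(m)
        else:
            control.append(m)
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    print(f"self-selection gap: {confound_gap(self_selection):.2f}")
    print(f"random-assignment gap: {confound_gap(random_assignment):.2f}")
```

Self-selection leaves the treated group roughly one standard deviation more motivated, a ready-made rival explanation for any outcome difference, while the coin flip leaves a gap near zero. That balance holds on average, so smaller samples can still end up unbalanced by chance.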

When two options both name plausible threats, the stronger answer identifies the one that systematically differs with the conditions, because that is the threat capable of masquerading as a treatment effect.

Test Your Knowledge

A wellness program enrolls only the most distressed employees and reports symptom improvement at posttest with no control group. Which threat most undermines the causal claim?

A treatment study used highly selected university-clinic clients, and the question asks whether results apply to rural community clinics. Which validity is most central?

Investigators ran 20 uncorrected pairwise comparisons and reported the two that were 'significant.' Which validity is threatened?
