Internal/External Validity and Threats

Key Takeaways

  • Internal validity concerns whether the IV — not a confound — caused the behavior change; it is about defensible causal claims for THIS case.
  • External validity concerns whether findings generalize across people, settings, behaviors, materials, and times; it is built through replication, not asserted.
  • Classic threats include history, maturation, testing, instrumentation, procedural drift, and multiple-treatment interference — memorize a definition and a design remedy for each.
  • Single-case designs counter threats with repeated measurement, staggered replications, reversals, and crisp condition changes.
  • A design can have strong internal validity for one client yet narrow external validity until the effect is replicated.
Last updated: June 2026

Two Different Questions: Cause vs. Reach

Internal validity asks whether the independent variable (IV) is the most plausible reason behavior changed in this study. High internal validity means rival explanations have been ruled out and you can defend a causal claim. External validity asks whether the same effect would hold under other relevant conditions: different participants, settings, behaviors, materials, implementers, or times.

The two kinds of validity often trade off, so evaluate them separately. A tightly controlled reversal in one clinic room maximizes internal validity but, by itself, says nothing about generality. Conversely, a sprawling field demonstration across many sites with weak experimental control may look general, but it cannot establish that the IV caused anything.

The BCBA exam repeatedly tests whether you can separate a strong functional relation (internal validity) from a weak generality claim (external validity). When a stem says a treatment 'reduced behavior for this client,' that is an internal-validity statement; do not over-extend it to all clients.

A useful frame: internal validity is the inward question (did the IV cause change here?), while external validity is the outward question (does this travel elsewhere?). Strong single-case designs prioritize the inward question first, because a result that cannot be trusted internally has nothing worth generalizing. Only after internal validity is secured does replication earn the right to widen the claim outward.

The Classic Threats to Internal Validity

Threats are events other than the IV that could produce the observed change. Learn a one-line definition and a design remedy for each.

Threat                          | What it means                                        | Design response
History                         | An outside event coincides with treatment            | Stagger starts (multiple baseline); replicate the effect
Maturation                      | Behavior changes from time, growth, or fatigue       | Establish baseline trend; replicate across tiers
Testing                         | Repeated exposure to measurement changes performance | Use unobtrusive measures; careful probing
Instrumentation                 | The measurement system itself drifts or changes      | Train observers; monitor IOA and definitions
Procedural drift                | Implementers deliver the IV differently over time    | Collect procedural integrity data; retrain
Multiple-treatment interference | One condition carries over and affects another       | Counterbalance; choose a different design; insert washout

History and maturation are the most frequently confused pair. History is a specific external event (a new med, a holiday, a new aide). Maturation is a gradual internal process (development, learning over time, getting tired within a session). If the stem names a concrete event, think history; if it describes the passage of time or growth, think maturation.
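The "establish baseline trend" response to maturation can be checked numerically as well as visually. A minimal sketch, assuming baseline data are recorded as one count per session: it fits an ordinary least-squares slope across sessions. The function name `baseline_slope` and the sample data are hypothetical, and a numeric slope supplements, rather than replaces, visual analysis of the graph.

```python
from statistics import fmean


def baseline_slope(values: list[float]) -> float:
    """Ordinary least-squares slope of a response measure across sessions.

    A slope near zero suggests a stable baseline. A clear slope already
    moving in the therapeutic direction before treatment is a warning sign
    that maturation (or another ongoing process) could explain later change.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two baseline sessions")
    sessions = list(range(1, n + 1))
    mean_x = fmean(sessions)
    mean_y = fmean(values)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(sessions, values))
    den = sum((x - mean_x) ** 2 for x in sessions)
    return num / den


# Hypothetical tantrum counts per baseline session.
stable = [8, 9, 8, 8, 9, 8]           # flat: no trend to explain away
declining = [12, 11, 10, 9, 8, 7]     # already improving before any IV

print(f"stable slope:    {baseline_slope(stable):.2f}")
print(f"declining slope: {baseline_slope(declining):.2f}")
```

If the declining series were the baseline, a later "treatment effect" would be hard to separate from the change that was already under way, which is why the table pairs this check with replication across tiers.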

Multiple-Treatment Interference and Procedural Drift in Practice

Two threats from the table deserve a closer look because they appear constantly in ABA scenarios. Multiple-treatment interference (carryover) occurs when exposure to one condition alters responding in a subsequent condition, which is common in multielement and rapidly alternating arrangements. If a participant just experienced a rich reinforcement condition, behavior in the next condition may be temporarily 'contaminated.' Remedies include counterbalancing the order, separating conditions with distinct discriminative signals, or choosing a design that does not alternate so quickly.
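Counterbalancing in a multielement arrangement can be as simple as rotating the condition order from block to block. The sketch below is one illustrative approach (a Latin-square style rotation), not a prescribed procedure; the function name `rotated_blocks` and the condition labels are hypothetical. It balances which position each condition occupies within a block, but not which condition immediately precedes which, so it limits rather than eliminates carryover.

```python
def rotated_blocks(conditions: list[str], n_blocks: int) -> list[list[str]]:
    """Build session blocks by rotating the condition order (Latin-square style).

    Across every len(conditions) consecutive blocks, each condition appears
    once in each within-block position, so no condition always runs first.
    This balances ordinal position only; it does not balance which condition
    immediately precedes which.
    """
    k = len(conditions)
    if k == 0:
        raise ValueError("need at least one condition")
    # Block i starts at condition i (mod k) and wraps around.
    return [conditions[i % k:] + conditions[:i % k] for i in range(n_blocks)]


# Hypothetical condition labels for a multielement comparison.
for block in rotated_blocks(["attention", "demand", "play"], 3):
    print(block)
```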

Procedural drift (treatment-integrity failure) occurs when implementers gradually deliver the IV differently over time — a paraprofessional who slowly stops running the full protocol. The danger is that a 'no effect' result may reflect a treatment that was never truly in place, not a treatment that failed. The remedy is to measure procedural integrity directly and retrain. Note the symmetry with measurement: IOA protects the integrity of the DV, while procedural integrity protects the integrity of the IV. Confusing the two is a classic distractor.
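The IOA/procedural-integrity symmetry also shows up in the arithmetic: both are typically reported as a percentage. A minimal sketch, assuming interval-by-interval IOA (one of several IOA methods) and a yes/no step checklist for integrity; the function names and sample records are hypothetical.

```python
def interval_ioa(observer_a: list[bool], observer_b: list[bool]) -> float:
    """Interval-by-interval IOA: percentage of intervals scored identically.

    Protects the DV: low agreement means the measurement itself is suspect.
    """
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("records must be non-empty and the same length")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100 * agreements / len(observer_a)


def procedural_integrity(steps_done: list[bool]) -> float:
    """Percentage of protocol steps the implementer delivered as written.

    Protects the IV: low integrity means the treatment may never have been in place.
    """
    if not steps_done:
        raise ValueError("checklist is empty")
    return 100 * sum(steps_done) / len(steps_done)


# Hypothetical 10 intervals: did each observer score aggression?
obs_a = [True, False, False, True, True, False, False, False, True, False]
obs_b = [True, False, True, True, True, False, False, False, True, False]
# Hypothetical 8-step protocol checklist for one session.
steps = [True, True, True, False, True, True, False, True]

print(interval_ioa(obs_a, obs_b))    # 90.0 (9 of 10 intervals agree)
print(procedural_integrity(steps))   # 75.0 (6 of 8 steps delivered)
```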

How Single-Case Designs Defeat Threats

Single-case designs support internal validity by producing a pattern that matches the planned manipulation. A sudden improvement right after treatment is promising, but it is weak evidence if a school vacation, a medication change, or a staff turnover occurred at the same moment. Replication at planned, staggered times is what makes coincidence implausible: if behavior changes three separate times, each time the IV is applied, an outside event would have to coincide with the condition change on all three occasions.
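The logic of staggered replication can be put in rough numbers. Purely as an illustration, assume each condition change has the same independent probability p of coinciding with some outside event; then the chance that coincidence alone explains all k changes is p^k. The function name and the value p = 0.2 below are hypothetical, and real events are rarely this independent, so treat the output as intuition rather than a statistical test.

```python
def chance_all_coincide(p_per_change: float, n_changes: int) -> float:
    """Probability that outside events line up with every staggered phase change.

    Illustrative assumption: each change has the same, independent
    probability p of coinciding with an outside event.
    """
    if not 0.0 <= p_per_change <= 1.0:
        raise ValueError("p_per_change must be between 0 and 1")
    if n_changes < 0:
        raise ValueError("n_changes must be non-negative")
    return p_per_change ** n_changes


# Hypothetical p = 0.2 for one, two, and three staggered tiers.
for tiers in (1, 2, 3):
    print(tiers, f"{chance_all_coincide(0.2, tiers):.3f}")
# prints: 1 0.200 / 2 0.040 / 3 0.008
```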

External validity is built, not declared, through two kinds of replication:

  • Direct replication repeats the same procedures with similar participants and conditions to confirm reliability of the effect.
  • Systematic replication deliberately varies something — a new setting, implementer, behavior, or learner — to test how far the effect extends. Successful systematic replications progressively widen the generality claim.

When an item asks for the biggest threat, scan for an event that coincides with the condition change — that is almost always the answer. When it asks for the best design improvement, choose the option that adds prediction, verification, or replication while remaining ethical and feasible. Avoid options that 'prove' generality from a single case; that confuses internal and external validity.

A reversal with one client may convincingly show a treatment reduced disruption for that client, but it does not prove the treatment will work for all clients, all settings, or all functions of behavior. Generality is also one face of social validity: a result is most useful when it holds in the settings and with the people the client actually lives and works among, run by the staff who will actually deliver it.

Test Your Knowledge

A teacher reports that a student's tantrums dropped sharply right after a behavior plan started. On review, the analyst learns the student also moved to a new, much smaller class the same week. What is the most serious threat to internal validity?

An analyst demonstrates a strong functional relation in a reversal design with one client in one clinic. A reviewer asks whether the result will hold for other clients in school settings. This question is about:

Over several weeks, two observers gradually relax how strictly they apply the definition of 'aggression,' counting more borderline events as aggression. What threat does this create?
