7.2 Designs, Sampling, and Variable Control

Key Takeaways

  • Experimental designs support causal inference when manipulation, control, and random assignment are adequate.
  • Quasi-experimental and correlational designs can be useful but require caution about causal language.
  • Random selection affects external validity; random assignment affects internal validity, and a study can have one without the other.
  • Single-case designs (ABAB reversal, multiple-baseline) show intervention effects through stable baselines, phase changes, and replication.
Last updated: June 2026

Match the Claim to the Design

A research design is a plan for answering a question, and EPPP design items test whether the conclusion fits how the data were gathered. If a researcher manipulates an independent variable, controls plausible alternatives, and uses appropriate assignment, a causal inference is better supported. If a researcher only measures variables as they naturally occur, the study describes association or prediction, not causation.

The single most tested distinction is random selection vs. random assignment. Random selection concerns who is drawn from the population and therefore drives generalization (external validity). Random assignment concerns how participants are placed into conditions and therefore drives group comparability (internal validity). A study can have one without the other: a campus experiment may randomly assign volunteers to conditions (good internal validity) yet generalize poorly (weak external validity) because the sample is unrepresentative.
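The two procedures can be sketched as separate steps. This is a minimal illustration, not a sampling protocol: the population of ID numbers, the sample size of 40, and the helper names `randomly_select` and `randomly_assign` are all assumptions made for the example.

```python
import random


def randomly_select(population, n, rng):
    """Random selection: draw n members, each with an equal chance."""
    return rng.sample(population, n)


def randomly_assign(participants, rng):
    """Random assignment: shuffle, then split into two equal-sized conditions."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = list(range(1000))  # hypothetical population of 1,000 ID numbers

# Selection and assignment together: representative sample, comparable groups.
sample = randomly_select(population, 40, rng)
treatment, control = randomly_assign(sample, rng)

# Assignment without selection (the campus example): the first 40 volunteers
# are a convenience sample, yet chance still decides their condition.
volunteers = population[:40]
vol_treatment, vol_control = randomly_assign(volunteers, rng)
```

In the second case the groups are still comparable to each other, which is the internal-validity benefit, but nothing in the procedure makes the volunteers resemble the wider population.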

Design feature | Main purpose | EPPP inference cue
Manipulation of an IV | Tests whether a condition change affects an outcome | Supports causal language when other controls are adequate
Random assignment | Equates groups at baseline | Reduces selection threats to internal validity
Random selection | Improves sample representativeness | Supports population generalization (external validity)
Control/comparison group | Provides an outcome reference point | Separates treatment from history, maturation, expectancy
Repeated measurement | Tracks change over time/phases | Supports trend, stability, single-case interpretation

Experimental designs include between-groups, within-subjects, factorial, and randomized controlled trials (RCTs). Factorial designs examine two or more independent variables and test main effects plus interactions (e.g., a 2x2 design crossing medication vs. placebo with therapy vs. no therapy). Within-subjects designs reduce individual-difference noise because each person serves as their own control, but they introduce order, fatigue, and practice effects; counterbalancing (varying the sequence across participants) manages those order problems.
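The arithmetic behind main effects and an interaction in the 2x2 example can be worked through with cell means. The numbers below are invented for illustration only, and `marginal_mean` is a hypothetical helper; a real analysis would use a factorial ANOVA on the raw scores.

```python
# Hypothetical cell means (symptom improvement) for the 2x2 design:
# medication (placebo vs. medication) crossed with therapy (none vs. therapy).
cell_means = {
    ("placebo", "no therapy"): 2.0,
    ("placebo", "therapy"): 6.0,
    ("medication", "no therapy"): 5.0,
    ("medication", "therapy"): 13.0,
}


def marginal_mean(factor_index, level):
    """Average the cell means that share one level of one factor."""
    values = [m for cell, m in cell_means.items() if cell[factor_index] == level]
    return sum(values) / len(values)


# Main effect: difference between the marginal means of one factor.
medication_main = marginal_mean(0, "medication") - marginal_mean(0, "placebo")
therapy_main = marginal_mean(1, "therapy") - marginal_mean(1, "no therapy")

# Interaction: does the therapy effect differ across medication levels?
therapy_effect_placebo = (
    cell_means[("placebo", "therapy")] - cell_means[("placebo", "no therapy")]
)
therapy_effect_medication = (
    cell_means[("medication", "therapy")] - cell_means[("medication", "no therapy")]
)
interaction = therapy_effect_medication - therapy_effect_placebo

print(medication_main, therapy_main, interaction)  # 5.0 6.0 4.0
```

A nonzero interaction term here means the therapy effect is larger with medication than with placebo, so neither main effect tells the whole story on its own.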

Applied and Non-Experimental Designs

Quasi-experimental designs dominate applied settings because true random assignment is often impossible or unethical. A clinic may compare intact groups, use a waitlist control, or use an interrupted time-series design. These designs are valuable, but pre-existing selection differences must be weighed. A frequent EPPP key states that the intervention is associated with improvement while rejecting an option that claims the intervention caused improvement, because intact-group comparisons do not equate the groups.

Correlational designs measure naturally occurring relationships. A correlation supports prediction but cannot establish direction (the directionality problem) or rule out confounds (the third-variable problem). Regression builds a prediction equation and can statistically adjust for measured covariates, but statistical control is not experimental control; an omitted confound still biases the estimate.
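A small simulation can make the third-variable problem concrete. In this sketch, which rests entirely on invented data, a hypothetical variable called `workload` drives both stress and poor sleep, and the two outcomes have no direct effect on each other. The helpers `pearson_r` and `residuals` are written out by hand so the example needs only the standard library.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after a simple linear regression on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical data: workload causes both outcomes; neither causes the other.
workload = [rng.gauss(0, 1) for _ in range(n)]
stress = [w + rng.gauss(0, 1) for w in workload]
poor_sleep = [w + rng.gauss(0, 1) for w in workload]

r_raw = pearson_r(stress, poor_sleep)
# Statistical control: correlate what remains after removing workload.
r_adjusted = pearson_r(residuals(stress, workload), residuals(poor_sleep, workload))
```

Under these assumptions `r_raw` comes out near 0.5 and `r_adjusted` near zero. The adjustment works only because workload was measured; had it been omitted, the raw correlation would be all the analyst could see.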

Single-case (single-subject) designs are central in clinical and applied behavior analysis. The logic is repeated measurement plus phase comparison:

  • ABAB (reversal) design — baseline (A), intervention (B), withdraw (A), reintroduce (B). If behavior tracks the phases, the intervention is the likely cause. Reversal is inappropriate when the behavior should not or cannot return to baseline (e.g., learned skills, dangerous behavior).
  • Multiple-baseline design — staggers the intervention across behaviors, settings, or participants. It demonstrates effect without ever withdrawing a helpful intervention, so it is chosen when reversal is impractical or unethical.
  • Changing-criterion design — raises the performance target in steps; the behavior is shown to track each new criterion.

Key single-case requirements: a stable baseline before intervention, clear phase changes, and replication across behaviors, settings, or participants to rule out coincidence.
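The ABAB logic can be sketched by comparing phase means. Single-case data are normally judged by visual inspection of level, trend, and variability, so treat this as a simplification: the session counts are invented, and the stability rule in `baseline_is_stable` (a range of at most 2) is an arbitrary assumption for the example.

```python
from statistics import mean

# Hypothetical session-by-session counts of a target behavior.
phases = [
    ("A1 baseline", [9, 10, 9, 10]),
    ("B1 intervention", [5, 4, 3, 3]),
    ("A2 withdrawal", [8, 9, 9, 10]),
    ("B2 reintroduction", [4, 3, 3, 2]),
]

phase_means = {label: mean(values) for label, values in phases}


def baseline_is_stable(values, max_range=2):
    """Illustrative rule only: treat a baseline as stable if its range is small."""
    return max(values) - min(values) <= max_range


a1, b1, a2, b2 = (phase_means[label] for label, _ in phases)

# The behavior "tracks the phases" if it drops with each B and recovers in A2.
tracks_phases = b1 < a1 and a2 > b1 and b2 < a2

print(phase_means)
print(baseline_is_stable(phases[0][1]), tracks_phases)  # True True
```

The second drop (B2) is the replication: one A-to-B change could be coincidence, while a change that follows every phase shift is much harder to explain away.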

When answering design items, choose the strongest justified wording. Do not inflate a design (a quasi-experiment rarely "proves" causation), and do not dismiss a useful design because it is imperfect. The best option states what the design can show, what it cannot, and which validity issue is most relevant.

Longitudinal, Cross-Sectional, and Cohort Logic

Developmental and lifespan questions add a time dimension that the EPPP tests directly. A cross-sectional design measures different age groups at one time point; it is efficient but confounds age with cohort (generational) effects, because a 70-year-old and a 20-year-old differ not only in age but in the era they grew up in. A longitudinal design follows the same people over time, separating age change from cohort but introducing attrition and practice/testing effects, and tying results to one cohort's history.

A cross-sequential (cohort-sequential) design combines both, following several cohorts across overlapping intervals to disentangle age, cohort, and time-of-measurement effects. A classic exam trap is attributing a cross-sectional age difference (e.g., lower scores in older adults) to aging when it may reflect a cohort difference in education.
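The confound follows from simple arithmetic: age equals measurement year minus birth year, so fixing one of the three ties the other two together. The sketch below uses invented years and ages to show which quantities vary in each design.

```python
def birth_year(measurement_year, age):
    """Cohort (birth year) implied by an age at a given measurement year."""
    return measurement_year - age


# Cross-sectional: one measurement year (hypothetically 2020), several ages.
# Each entry is (age, cohort); every age comes from a different cohort.
cross_sectional = [(age, birth_year(2020, age)) for age in (20, 45, 70)]

# Longitudinal: one cohort (born 1950) measured in several years.
# Each entry is (age, cohort); cohort is constant while age changes.
longitudinal = [(year - 1950, 1950) for year in (1970, 1995, 2020)]

# Sequential: two cohorts, each measured in two years.
# Each entry is (age, cohort, measurement_year).
sequential = [
    (year - cohort, cohort, year) for cohort in (1950, 1975) for year in (1995, 2020)
]

# Age 45 is observed in both cohorts, so an age comparison no longer has to
# be a cohort comparison as well.
cohorts_seen_at_45 = {cohort for age, cohort, _ in sequential if age == 45}

print(cross_sectional)     # [(20, 2000), (45, 1975), (70, 1950)]
print(longitudinal)        # [(20, 1950), (45, 1950), (70, 1950)]
print(cohorts_seen_at_45)  # {1950, 1975}
```

In the cross-sectional list, any difference between the 20-year-olds and the 70-year-olds could equally be described as a difference between people born in 2000 and people born in 1950.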

Sampling method also shapes generalization, and the EPPP expects the labels:

Sampling method | Mechanism | Generalization quality
Simple random | Every member has equal selection probability | Strong, representative
Stratified random | Random within defined strata (e.g., age bands) | Strong; ensures subgroup representation
Cluster | Randomly select intact groups, then sample within | Practical for large populations; some loss of precision
Systematic | Every kth case from a list | Adequate unless the list is patterned
Convenience | Whoever is available (volunteers) | Weak; self-selection bias
Snowball | Participants recruit others | Weak; useful for hidden populations

Probability sampling (the first four) supports inferential generalization; non-probability sampling (convenience, snowball, purposive) limits it. Self-selection is the most common applied threat because volunteers differ systematically from non-volunteers in motivation, severity, and resources. When an EPPP stem describes "clients who chose to enroll" or "an online volunteer panel," external validity is in play even if the analysis is flawless.
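The four probability methods can be sketched against a small sampling frame. Everything here is hypothetical: the 120-person frame, the age bands, the six clinics, and the function names are assumptions chosen to keep the example short.

```python
import random

# Hypothetical sampling frame: 120 people, three age bands, six clinics.
BANDS = ("18-39", "40-64", "65+")
frame = [{"id": i, "age_band": BANDS[i % 3], "clinic": i % 6} for i in range(120)]


def group_by(people, key):
    """Collect people into lists keyed by one attribute."""
    groups = {}
    for person in people:
        groups.setdefault(person[key], []).append(person)
    return groups


def simple_random(people, n, rng):
    """Every member has the same chance of selection."""
    return rng.sample(people, n)


def stratified_random(people, key, n_per_stratum, rng):
    """Draw randomly within each stratum so every subgroup is represented."""
    strata = group_by(people, key)
    return [p for members in strata.values() for p in rng.sample(members, n_per_stratum)]


def cluster(people, key, n_clusters, n_per_cluster, rng):
    """Randomly pick intact groups, then sample within the chosen groups."""
    clusters = group_by(people, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [p for c in chosen for p in rng.sample(clusters[c], n_per_cluster)]


def systematic(people, k, rng):
    """Take every kth case after a random start."""
    start = rng.randrange(k)
    return people[start::k]


rng = random.Random(7)  # fixed seed so the sketch is reproducible
srs = simple_random(frame, 12, rng)
strat = stratified_random(frame, "age_band", 4, rng)
clus = cluster(frame, "clinic", 2, 6, rng)
syst = systematic(frame, 12, rng)
```

The frame happens to cycle through clinics every six entries, so taking every 12th case returns people from a single clinic. That is the "patterned list" problem noted for systematic sampling in the table.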

The disciplined move is to keep the statistical finding but bound the population to which it can be applied, then ask what additional design feature (a comparison group, random assignment, a more representative sample) would strengthen the inference. That is the same logic the credited answer almost always reflects.

Test Your Knowledge

What is the key difference between random selection and random assignment?

Test Your Knowledge

A clinician must demonstrate that a reinforcement program works but cannot ethically withdraw it once a child's self-injury decreases. Which single-case design fits best?

Test Your Knowledge

A study finds a correlation between stress and sleep quality measured at a single time point. What conclusion is most defensible?
