7.2 Designs, Sampling, and Variable Control
Key Takeaways
- Experimental designs support causal inference when manipulation, control, and random assignment are adequate.
- Quasi-experimental and correlational designs can be useful but require caution about causal language.
- Random selection affects external validity; random assignment affects internal validity, and a study can have one without the other.
- Single-case designs (ABAB reversal, multiple-baseline) show intervention effects through stable baselines, phase changes, and replication.
Match the Claim to the Design
A research design is a plan for answering a question, and on the EPPP, design items test whether the conclusion fits how the data were gathered. If a researcher manipulates an independent variable, controls plausible alternatives, and uses appropriate assignment, a causal inference is stronger. If a researcher only measures variables as they naturally occur, the study describes association or prediction, not causation.
The single most tested distinction is random selection vs. random assignment. Random selection concerns who is drawn from the population and therefore drives generalization (external validity). Random assignment concerns how participants are placed into conditions and therefore drives group comparability (internal validity). A study can have one without the other: a campus experiment may randomly assign volunteers to conditions (good internal validity) yet generalize poorly (weak external validity) because the sample is unrepresentative.
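The distinction can be made concrete with a minimal Python sketch. The population size, sample size, and variable names below are hypothetical and chosen only for illustration: selection decides who enters the study, and assignment decides which condition each enrolled person receives.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 1,000 population member IDs.
population = list(range(1000))

# Random SELECTION: who gets into the study (bears on external validity).
sample = rng.sample(population, k=40)

# Random ASSIGNMENT: which condition each sampled person receives
# (bears on internal validity). Shuffle a copy, then split in half.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:20], shuffled[20:]

# A convenience sample can still be randomly assigned. Assignment alone
# says nothing about how representative the sample is.
volunteers = population[:40]  # the first 40 available people, not a random draw
rng.shuffle(volunteers)
vol_treatment, vol_control = volunteers[:20], volunteers[20:]
```

The second half mirrors the campus-experiment case: the volunteer groups are comparable to each other, but nothing in the procedure makes them representative of the population.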
| Design feature | Main purpose | EPPP inference cue |
|---|---|---|
| Manipulation of an IV | Tests whether a condition change affects an outcome | Supports causal language when other controls are adequate |
| Random assignment | Makes groups comparable on average at baseline | Reduces selection threats to internal validity |
| Random selection | Improves sample representativeness | Supports population generalization (external validity) |
| Control/comparison group | Provides an outcome reference point | Separates treatment from history, maturation, expectancy |
| Repeated measurement | Tracks change over time/phases | Supports trend, stability, single-case interpretation |
Experimental designs include between-groups, within-subjects, factorial, and randomized controlled trials (RCTs). Factorial designs examine two or more independent variables and test main effects plus interactions (e.g., a 2 × 2 design crossing medication vs. placebo with therapy vs. no therapy). Within-subjects designs reduce individual-difference noise because each person serves as their own control, but they introduce order effects such as practice, fatigue, and carryover; counterbalancing (varying the sequence of conditions across participants) manages those order problems.
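As a worked illustration of main effects and interactions in the 2 × 2 example, the sketch below uses made-up cell means (hypothetical improvement scores, not data from any study) and assumes equal cell sizes so that marginal means are simple averages.

```python
# Hypothetical cell means (symptom improvement) for a 2 x 2 factorial design.
# Keys are (medication condition, therapy condition). Equal cell sizes assumed.
cell_means = {
    ("placebo", "no_therapy"): 2.0,
    ("placebo", "therapy"): 6.0,
    ("medication", "no_therapy"): 5.0,
    ("medication", "therapy"): 13.0,
}

def mean(values):
    return sum(values) / len(values)

def marginal(factor_index, level):
    """Average the cells that share one level of one factor."""
    return mean([m for key, m in cell_means.items() if key[factor_index] == level])

# Main effect: difference between marginal means, averaging over the other factor.
medication_main = marginal(0, "medication") - marginal(0, "placebo")
therapy_main = marginal(1, "therapy") - marginal(1, "no_therapy")

# Interaction: does the therapy effect differ across medication conditions?
therapy_effect_on_medication = (
    cell_means[("medication", "therapy")] - cell_means[("medication", "no_therapy")]
)
therapy_effect_on_placebo = (
    cell_means[("placebo", "therapy")] - cell_means[("placebo", "no_therapy")]
)
interaction = therapy_effect_on_medication - therapy_effect_on_placebo

print(medication_main, therapy_main, interaction)  # 5.0 6.0 4.0
```

A nonzero interaction contrast means the effect of one variable depends on the level of the other. Whether any of these differences is statistically reliable would still require an inferential test, which this sketch omits.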
Applied and Non-Experimental Designs
Quasi-experimental designs dominate applied settings because true random assignment is often impossible or unethical. A clinic may compare intact groups, use a waitlist control, or apply an interrupted time-series design. These designs are valuable, but pre-existing selection differences must be weighed. A frequent EPPP key states that the intervention is associated with improvement while rejecting an option that claims the intervention caused improvement, because intact-group comparisons do not equate the groups.
Correlational designs measure naturally occurring relationships. A correlation supports prediction but cannot establish which variable influences the other (the directionality problem) or rule out an unmeasured confound that drives both (the third-variable problem). Regression adds prediction and can statistically adjust for measured covariates, but statistical control is not experimental control; an omitted confound still biases the estimate.
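The third-variable problem can be illustrated with a small simulation. This is a sketch under stated assumptions: the variables `x`, `y`, and `z` are hypothetical, `z` is built to cause both `x` and `y`, and `x` has no effect on `y` at all. The adjustment uses the standard first-order partial correlation formula.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations around the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after statistically removing z from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)
n = 5000

# Hypothetical data: z is a confound that drives both x and y.
# x never influences y in this simulation.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)  # raw association; 0.5 in expectation for this setup
r_adjusted = partial_r(r_xy, pearson_r(x, z), pearson_r(y, z))  # near 0

print(f"raw r = {r_xy:.2f}, r adjusting for z = {r_adjusted:.2f}")
```

The adjustment works here only because `z` was measured. If `z` were omitted from the data set, the analyst would see only the raw correlation and could not remove the bias statistically.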
Single-case (single-subject) designs are central in clinical and applied behavior analysis. The logic is repeated measurement plus phase comparison:
- ABAB (reversal) design: baseline (A), intervention (B), withdraw (A), reintroduce (B). If behavior tracks the phases, the intervention is the likely cause. Reversal is inappropriate when the behavior cannot return to baseline (e.g., learned skills) or should not be allowed to (e.g., dangerous behavior).
- Multiple-baseline design: staggers the intervention across behaviors, settings, or participants. It demonstrates effect without ever withdrawing a helpful intervention, so it is chosen when reversal is impractical or unethical.
- Changing-criterion design: raises the performance target in steps; the behavior is shown to track each new criterion.
Key single-case requirements: a stable baseline before intervention, clear phase changes, and replication across behaviors, settings, or participants to rule out coincidence.
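The staggered logic of a multiple-baseline design can be sketched as a phase schedule. The function name, setting labels, and session counts below are hypothetical; the point is only that each unit stays in baseline (A) until its own start point, so a change that appears only when the intervention (B) reaches that unit is hard to attribute to coincidence.

```python
def multiple_baseline_schedule(units, first_start, stagger, n_sessions):
    """Return {unit: list of phase labels per session} with staggered B starts."""
    schedule = {}
    for i, unit in enumerate(units):
        start = first_start + i * stagger  # session index where intervention begins
        schedule[unit] = ["A" if s < start else "B" for s in range(n_sessions)]
    return schedule

# Hypothetical example: one intervention introduced across three settings.
schedule = multiple_baseline_schedule(
    ["home", "school", "clinic"], first_start=3, stagger=3, n_sessions=12
)

for unit, phases in schedule.items():
    print(f"{unit:>6}: {''.join(phases)}")
#   home: AAABBBBBBBBB
# school: AAAAAABBBBBB
# clinic: AAAAAAAAABBB
```

In practice the start of each B phase is usually decided by baseline stability rather than a fixed stagger, so the fixed interval here is a simplification.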
When answering design items, choose the strongest justified wording. Do not inflate a design (a quasi-experiment rarely "proves" causation), and do not dismiss a useful design because it is imperfect. The best option states what the design can show, what it cannot, and which validity issue is most relevant.
Longitudinal, Cross-Sectional, and Cohort Logic
Developmental and lifespan questions add a time dimension that the EPPP tests directly. A cross-sectional design measures different age groups at one time point; it is efficient but confounds age with cohort (generational) effects, because a 70-year-old and a 20-year-old differ not only in age but in the era they grew up in. A longitudinal design follows the same people over time, separating age change from cohort but introducing attrition and practice/testing effects, and tying results to one cohort's history.
A cross-sequential (cohort-sequential) design combines both, following several cohorts across overlapping intervals to disentangle age, cohort, and time-of-measurement effects. A classic exam trap is attributing a cross-sectional age difference (e.g., lower scores in older adults) to aging when it may reflect a cohort difference in education.
Sampling method also shapes generalization, and the EPPP expects the labels:
| Sampling method | Mechanism | Generalization quality |
|---|---|---|
| Simple random | Every member has equal selection probability | Strong, representative |
| Stratified random | Random within defined strata (e.g., age bands) | Strong; ensures subgroup representation |
| Cluster | Randomly select intact groups, then sample within | Practical for large populations; some loss of precision |
| Systematic | Every kth case from a list | Adequate unless the list is patterned |
| Convenience | Whoever is available (volunteers) | Weak; self-selection bias |
| Snowball | Participants recruit others | Weak; useful for hidden populations |
Probability sampling (the first four) supports inferential generalization; non-probability sampling (convenience, snowball, purposive) limits it. Self-selection is the most common applied threat because volunteers differ systematically from non-volunteers in motivation, severity, and resources. When an EPPP stem describes "clients who chose to enroll" or "an online volunteer panel," external validity is in play even if the analysis is flawless.
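The four probability methods can be sketched in a few lines of Python. The sampling frame, the `age_band` and `clinic` fields, and all sample sizes are hypothetical; the sketch is meant only to show how the selection mechanisms differ, not how a real survey would be drawn.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 people tagged with an age band and a clinic.
bands = ("18-39", "40-64", "65+")
frame = [{"id": i, "age_band": bands[i % 3], "clinic": i // 10} for i in range(100)]

# Simple random: every member has the same chance of selection.
simple = rng.sample(frame, k=12)

# Stratified random: draw randomly within each stratum (here, age band).
def stratified_sample(people, key, per_stratum, rng):
    strata = {}
    for person in people:
        strata.setdefault(person[key], []).append(person)
    return [p for members in strata.values() for p in rng.sample(members, per_stratum)]

stratified = stratified_sample(frame, "age_band", 4, rng)

# Cluster: randomly select intact groups (clinics), then sample within each.
clinics = sorted({p["clinic"] for p in frame})
chosen_clinics = rng.sample(clinics, k=2)
cluster = [
    p
    for c in chosen_clinics
    for p in rng.sample([q for q in frame if q["clinic"] == c], 5)
]

# Systematic: random starting point, then every kth case from the list.
k = 10
start = rng.randrange(k)
systematic = frame[start::k]
```

Convenience and snowball sampling have no random step to code: the sample is whoever is available or whoever is referred, which is why selection probabilities are unknown and generalization is limited.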
The disciplined move is to keep the statistical finding but bound the population to which it can be applied, then ask what additional design feature (a comparison group, random assignment, a more representative sample) would strengthen the inference. That is the same logic the credited answer almost always reflects.
What is the key difference between random selection and random assignment?
A clinician must demonstrate that a reinforcement program works but cannot ethically withdraw it once a child's self-injury decreases. Which single-case design fits best?
A study finds a correlation between stress and sleep quality measured at a single time point. What conclusion is most defensible?