5.2 Research Designs and Causal Logic

Key Takeaways

  • Experimental designs use random assignment; quasi-experimental designs compare groups or time periods without it; nonexperimental designs describe without comparison.
  • One-group pretest-posttest designs show change but cannot rule out history, maturation, testing, instrumentation, or regression-to-the-mean threats.
  • Comparison groups, repeated measures, matched sites, and clear eligibility criteria strengthen causal interpretation in field settings.
  • The defensible design is the one that fits feasibility, ethics, and the strength of claim the decision actually requires.
Last updated: June 2026

Choosing a Defensible Design

A research or evaluation design is the plan for comparing what happened with what would reasonably have happened without the program. Health education specialists usually work in schools, clinics, worksites, and community agencies where perfect control is impossible. The CHES exam expects you to pick a design that is ethical, feasible, and strong enough for the decision, not the most elaborate one available.

Experimental Designs

An experimental design uses random assignment to place eligible participants into intervention and control conditions. Random assignment spreads known and unknown participant differences evenly across groups on average. This reduces selection bias and supports the strongest causal claims. It can be impractical or unethical when a service cannot be withheld or when partners require universal access. The classic forms are the randomized pretest-posttest control group design and the posttest-only control group design.
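The allocation step itself is easy to script. The sketch below assumes a plain 1:1 split of eligible participant IDs with no stratification or blocking. The function name `randomly_assign` and the participant IDs are hypothetical. A fixed seed makes the allocation reproducible for audit; that is a design choice here, not a requirement of random assignment.

```python
import random

def randomly_assign(participant_ids, seed=None):
    """Shuffle eligible participants and split them into two arms (hypothetical helper).

    Assumes simple 1:1 allocation; with an odd count, control gets the extra person.
    """
    rng = random.Random(seed)          # seeded generator so the allocation can be re-created
    shuffled = list(participant_ids)   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "intervention": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }

eligible = [f"P{n:03d}" for n in range(1, 21)]  # hypothetical IDs for 20 eligible participants
arms = randomly_assign(eligible, seed=2026)
print(len(arms["intervention"]), len(arms["control"]))  # 10 10
```

The allocation depends only on the shuffle. Neither staff nor participants choose their arm, and that is what lets the two groups differ only by chance at baseline.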

Quasi-Experimental Designs

A quasi-experimental design includes comparison but no random assignment. A school may compare one campus receiving a peer-education program with a similar campus that starts later. A county may compare pre-policy and post-policy clinic visits. These designs fit real programs but demand attention to baseline differences, timing, contamination between groups, and outside events.

Weaker and Descriptive Designs

  • One-group pretest-posttest measures the same people before and after. It shows change but cannot prove the program caused it; a news campaign, seasonal trend, or the pretest itself could explain improvement. Acceptable for small quality-improvement decisions, weak for major causal claims.
  • Cross-sectional measures variables at one point in time. It describes needs and associations but cannot establish temporal order; it cannot show that higher perceived risk caused more screening.
  • Interrupted time series is a quasi-experimental design that uses many repeated measures before and after an intervention to see whether a shift exceeds normal variation, e.g., 18 months of monthly referrals before and after a navigation protocol.

| Design | Random assignment? | Comparison? | Causal strength |
|---|---|---|---|
| Randomized controlled | Yes | Yes | Strongest |
| Quasi-experimental | No | Yes | Moderate to strong |
| Interrupted time series | No | Self (over time) | Moderate |
| One-group pre/post | No | No | Weak |
| Cross-sectional | No | No | Descriptive only |

Threats to Internal Validity

These threats shape conclusions and appear in scenario items. The best answer often names the threat and selects a practical fix.

  • History - an outside event occurred during the evaluation period.
  • Maturation - participants changed naturally over time (aging, fatigue).
  • Testing - taking the pretest altered posttest responses.
  • Instrumentation - the measure or scorers changed between time points.
  • Selection - groups differed before the program began.
  • Attrition (mortality) - patterned dropout biased the remaining sample.
  • Regression to the mean - extreme baseline scores drift toward average.
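Regression to the mean is the least intuitive threat on this list, so a small simulation may help. The sketch below is a hypothetical illustration, not field data. Each simulated person has a stable trait level, every measurement adds independent random noise, and no program is delivered at all. The sample size, cutoff, and seed are arbitrary assumptions.

```python
import random
from statistics import mean

def regression_to_mean_demo(n=10_000, cutoff=1.5, seed=1):
    """Simulate a test and retest of a stable trait with no intervention in between.

    Returns (baseline mean, retest mean) for people whose baseline exceeded `cutoff`.
    """
    rng = random.Random(seed)
    trait = [rng.gauss(0, 1) for _ in range(n)]        # each person's stable level
    baseline = [t + rng.gauss(0, 1) for t in trait]    # trait plus measurement noise
    retest = [t + rng.gauss(0, 1) for t in trait]      # same trait, fresh noise, no program
    # Recruit only extreme baseline scorers, as a program targeting "the most stressed" might
    chosen = [i for i in range(n) if baseline[i] > cutoff]
    return mean(baseline[i] for i in chosen), mean(retest[i] for i in chosen)

before, after = regression_to_mean_demo()
print(f"baseline mean {before:.2f}, retest mean {after:.2f}")
```

The selected group's retest mean falls back toward the population average even though nothing was delivered. That is why the usual safeguard is a comparison group recruited with the same cutoff.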

Designs in Standard Notation

Evaluation texts use a shorthand worth recognizing on the exam: O marks an observation or measurement, X marks the intervention, and R marks random assignment. A one-group pretest-posttest design is O X O. A randomized pretest-posttest control group design is R O X O / R O O, with the control row receiving no X. A nonequivalent comparison group quasi-experiment is the same layout without the R. The Solomon four-group design adds two more randomized groups that skip the pretest (R X O / R O), specifically to detect a testing effect.

You do not need to draw these. Still, recognizing that an R at the start of each row signals an experiment, and that two or more rows signal a comparison group, speeds up scenario reading.
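The reading cues above can be expressed as a small rule set. The helper `classify_design` is hypothetical and deliberately crude. It assumes the shorthand used here, with groups separated by "/", and treats an R at the start of every row as random assignment. It labels a single row with several O's on both sides of X as an interrupted time series.

```python
def classify_design(notation: str) -> str:
    """Classify a design written in O/X/R shorthand, rows separated by '/'.

    A rough reading aid (hypothetical helper), not a complete design taxonomy.
    """
    rows = [row.split() for row in notation.split("/")]
    if len(rows) >= 2:
        # Two or more rows means a comparison group; R on every row means randomized groups
        if all(row and row[0] == "R" for row in rows):
            return "experimental"
        return "quasi-experimental"
    row = rows[0]
    if "X" not in row:
        return "descriptive (no intervention)"
    x_pos = row.index("X")
    before = row[:x_pos].count("O")
    after = row[x_pos + 1:].count("O")
    if before > 1 and after > 1:
        return "interrupted time series"  # one group compared with itself over time
    return "one-group (no comparison)"

for design in ["O X O", "R O X O / R O O", "O X O / O O", "O O O X O O O"]:
    print(design, "->", classify_design(design))
```

The Solomon four-group layout (R O X O / R O O / R X O / R O) also comes out as experimental, because every row begins with R.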

Internal vs External Validity

The threats listed above attack internal validity, the confidence that the program, not something else, caused the result. A separate concern is external validity, whether findings generalize to other people, settings, and times. A tightly controlled efficacy trial in a research clinic may have strong internal validity but weak external validity for a busy community center. Field programs often accept slightly weaker internal validity to gain realism and reach.

The exam may contrast a rigorous but artificial design with a feasible real-world one and ask which fits the stated purpose; the answer follows the decision, not the prestige of the method.

Worked Threat Example

Suppose a worksite stress program reports that average perceived-stress scores fell after eight weeks. Before crediting the program, list rival explanations. History: a company-wide layoff scare ended during the same window. Maturation: a seasonal slow period reduced workload. Testing: completing the same stress survey twice taught respondents the "right" low-stress answers. Regression to the mean: the program recruited only the most stressed employees, whose scores would naturally drift toward average.

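Measuring a matched comparison worksite, or a randomized control group, on the same schedule would let the evaluator separate the program effect from these rivals. The comparison group experiences the same outside events, seasonal changes, and repeated surveys. If it is recruited with the same eligibility criteria, it also shows the same regression to the mean.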
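The comparison-group logic reduces to simple arithmetic, often called difference-in-differences (a name the text above does not use). Subtract the comparison site's change from the program site's change, so that shifts both sites share cancel out. The stress scores below are invented for illustration. The sketch assumes the two sites would have changed in parallel without the program, and `program_effect` is a hypothetical helper name.

```python
def program_effect(prog_pre, prog_post, comp_pre, comp_post):
    """Difference-in-differences: program-site change minus comparison-site change."""
    program_change = prog_post - prog_pre
    comparison_change = comp_post - comp_pre  # shared history, maturation, testing, etc.
    return program_change - comparison_change

# Invented mean perceived-stress scores (lower = less stress)
effect = program_effect(prog_pre=24.0, prog_post=19.0, comp_pre=23.5, comp_post=21.5)
print(effect)  # -3.0
```

Here the program site improved by 5 points and the comparison site by 2. Under the parallel-change assumption, only the 3-point difference is attributed to the program.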

Match Design to Claim

Do not memorize design names in isolation. Link each design to the claim it can support. A well-implemented randomized study supports stronger causal language. A quasi-experiment can persuade when comparison and baseline data are strong. A descriptive design is exactly right when the purpose is assessment, monitoring, or improvement rather than proof of causation. Adding a comparison group, repeated measures, matched sites, or a wait-list condition is usually the cheapest way to upgrade a field evaluation's credibility.

When an exam item asks for the single best way to strengthen a weak one-group study, the most common correct answer is to add a comparison or control group, because it directly addresses the missing counterfactual.

Test Your Knowledge

Which design feature most clearly distinguishes an experimental design from a quasi-experimental design?

A one-group pretest-posttest nutrition class shows improved label-reading scores. What is the main limitation?

A county tracks monthly referrals for 18 months before and 18 months after a clinic protocol change. Which design logic is being strengthened?

Half of a wait-list cohort dropped out of the posttest, and those who left had the lowest baseline scores. Which threat to internal validity is the strongest concern?
