9.2 Research Designs and Statistical Decisions

Key Takeaways

  • Experimental designs with random assignment and a control group are the only designs that support causal claims.
  • Internal validity is about cause-and-effect confidence; external validity is about generalizing to other people and settings.
  • Descriptive statistics summarize a sample; inferential statistics generalize from sample to population.
  • Match the statistic to the design and the scale of measurement: t-test for two means, ANOVA for three or more, correlation for association, chi-square for categories.
Last updated: June 2026

Every Research item hides one workflow cue: what is the researcher trying to establish? The verb in the stem points to a design, the design points to a statistic, and the statistic must match the scale of measurement. Learn that chain and most items collapse into one decision.

Choosing a design

Design | What it can claim | Key features
True experimental | Cause and effect | Random assignment + control group + manipulated independent variable (IV)
Quasi-experimental | Probable effect, weaker | Comparison groups but no random assignment (intact classrooms)
Correlational | Association only | Measures two variables; no manipulation; cannot prove cause
Descriptive / survey | Describes a snapshot | Frequencies, percentages; no IV
Qualitative | Meaning and process | Interviews, observation; phenomenology, grounded theory, case study
Single-subject (ABAB) | Individual change | Repeated baseline and treatment phases for one client

The randomized controlled trial (RCT) is the gold standard for causality because random assignment equates groups on known and unknown confounds, isolating the IV as the explanation for any group difference.
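
To make the random-assignment step concrete, here is a minimal Python sketch. The function name `randomly_assign` and the client labels are hypothetical, chosen only for illustration; the point is that chance alone, not the researcher or the client, decides who lands in each group.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups.

    Illustrative helper (not from any library). A seed is accepted only so
    the example can be reproduced.
    """
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)           # chance alone decides the ordering
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical client IDs
clients = [f"client_{i}" for i in range(1, 21)]
groups = randomly_assign(clients, seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Assigning intact classrooms or letting clients pick their own group would skip the shuffle, which is what moves a study from true experimental to quasi-experimental.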

Validity: the two questions

  • Internal validity asks: can the change be attributed to the IV rather than something else? Threats include history (an outside event), maturation (natural change over time), testing (practice on a repeated measure), instrumentation, selection bias, and regression to the mean (extreme scorers drifting toward average on retest).
  • External validity asks: do the results generalize to other people, settings, and times? A convenience sample of one university's clients limits external validity even if internal validity is strong.

These two often trade off: tightly controlled lab studies maximize internal validity but may sacrifice external validity, while field studies do the reverse.

From design to statistic

Descriptive statistics summarize the data you have; inferential statistics let you generalize to a population and test hypotheses.

Question being asked | Statistic | Scale required
Difference between two group means | Independent or paired t-test | Interval/ratio dependent variable (DV)
Difference among three or more means | ANOVA (analysis of variance) | Interval/ratio DV
Strength/direction of association | Pearson r (interval/ratio) or Spearman rho (ordinal) | Two continuous or ranked variables
Association between categories | Chi-square | Nominal data
Predicting a score from one or more variables | Regression | Interval/ratio outcome
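
The table can be rehearsed as a small decision function. The sketch below is a study aid only: `choose_statistic` and its string arguments are assumptions made for this example, and it encodes nothing beyond the rows above.

```python
def choose_statistic(question: str, scale: str, groups: int = 2) -> str:
    """Return the statistic from the table for a question type and scale.

    question: "difference", "association", or "prediction" (labels assumed here)
    scale:    "nominal", "ordinal", "interval", or "ratio"
    groups:   number of group means compared (used only for "difference")
    """
    question, scale = question.lower(), scale.lower()
    continuous = scale in ("interval", "ratio")

    if question == "difference":
        if not continuous or groups < 2:
            raise ValueError("mean comparisons need an interval/ratio DV and 2+ groups")
        return "t-test" if groups == 2 else "ANOVA"
    if question == "association":
        if scale == "nominal":
            return "chi-square"
        if scale == "ordinal":
            return "Spearman rho"
        if continuous:
            return "Pearson r"
    if question == "prediction" and continuous:
        return "regression"
    raise ValueError(f"no row in the table for {question!r} with {scale!r} data")

print(choose_statistic("difference", "interval", groups=3))
print(choose_statistic("association", "nominal"))
```

Read the stem in the same order as the arguments: what is being asked, what scale the data are on, and how many groups are involved.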

Significance and error

The alpha level (conventionally .05) is the threshold for rejecting the null hypothesis (H0). A p-value below .05 means that, if H0 were true, a result at least this extreme would occur by chance less than 5% of the time. It signals statistical significance, not practical importance, and it is not the probability that H0 itself is true. Two errors follow:

  • Type I error (false positive): rejecting a true null — concluding a treatment works when it does not. Its probability equals alpha.
  • Type II error (false negative): failing to reject a false null — missing a real effect. Statistical power is 1 minus the Type II error rate; larger samples increase power.
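
The link between sample size and power can be sketched numerically. The function below uses a normal (z) approximation for a two-group comparison with equal group sizes; that approximation, the name `approx_power`, and the example numbers are assumptions for illustration, not an exact t-test power calculation.

```python
from statistics import NormalDist

def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group test of means.

    d is the true standardized mean difference (Cohen's d).
    Normal approximation; real software typically uses the t distribution.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # rejection cutoff set by alpha
    shift = d * (n_per_group / 2) ** 0.5     # how far the true effect moves the test statistic
    # Probability the statistic falls beyond either cutoff
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for n in (20, 64, 200):
    print(n, round(approx_power(0.5, n), 3))

# With no true effect (d = 0), the rejection rate is just alpha: the Type I error rate.
print(round(approx_power(0.0, 64), 3))
```

The Type II error rate is 1 minus the returned power, so it shrinks as the per-group sample size grows.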

Effect size vs. significance

Effect size (Cohen's d for mean differences, r for correlations) reports the magnitude of an effect independent of sample size. Cohen's d benchmarks are small = 0.2, medium = 0.5, large = 0.8. A huge sample can make a trivial difference statistically significant, so the CPCE rewards candidates who pair p-values with effect sizes. The exam-ready model is: design tells you what you can claim, the statistic tells you how to test it, and effect size tells you whether anyone should care.
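
As a worked illustration, Cohen's d for two independent groups is the mean difference divided by the pooled standard deviation. The sketch below assumes that pooled-SD form; the helper names and the scores are made up for the example.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

def label_effect(d: float) -> str:
    """Apply the benchmarks quoted above (0.2 small, 0.5 medium, 0.8 large)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical scores: means 4 and 3, pooled SD 2, so d = 0.5
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d, label_effect(d))
```

Because d is expressed in standard deviation units, adding more participants with the same spread of scores does not inflate it the way it shrinks a p-value.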

Sampling and generalizability

The design is only as strong as the sample that feeds it. Probability sampling gives every member of the population a known chance of selection and supports generalization: simple random (every person equally likely), stratified random (random selection within subgroups to guarantee representation), cluster (sampling whole intact groups), and systematic (every nth person). Non-probability sampling (convenience, purposive, and snowball) is faster but weakens external validity because the sample may not represent the population.

A frequent vignette describes a researcher recruiting only volunteers from one clinic and then over-claiming that results apply to all clients; the flaw is a convenience sample limiting generalizability, not the statistic. Larger, more representative samples also reduce sampling error, the random gap between a sample statistic and the true population value.
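
The four probability sampling methods described above can be sketched with Python's standard `random` module. The function names, the population of 100 numbered clients, and the site and clinic labels are all hypothetical.

```python
import random

def simple_random(population, k, rng):
    """Every member has an equal chance of selection."""
    return rng.sample(population, k)

def systematic(population, step, rng):
    """Random starting point, then every nth person."""
    start = rng.randrange(step)
    return population[start::step]

def stratified(strata, k_per_stratum, rng):
    """Random selection within each subgroup to guarantee representation."""
    return {name: rng.sample(members, k_per_stratum) for name, members in strata.items()}

def cluster(clusters, n_clusters, rng):
    """Randomly pick whole intact groups and keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(0)                      # seeded only for reproducibility
clients = list(range(100))                  # hypothetical client IDs
by_site = {"urban": clients[:60], "rural": clients[60:]}
by_clinic = {f"clinic_{i}": clients[i * 10:(i + 1) * 10] for i in range(10)}

print(len(simple_random(clients, 10, rng)))
print(len(systematic(clients, 10, rng)))
print({site: len(picked) for site, picked in stratified(by_site, 5, rng).items()})
print(len(cluster(by_clinic, 2, rng)))
```

A convenience sample has no counterpart here because no random draw is involved: the researcher simply takes whoever is available.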

Reading confidence intervals

Inferential results are often expressed as a confidence interval (CI) rather than a single p-value. A 95% CI comes from a procedure that, across repeated samples, would capture the true population value about 95% of the time. If a 95% CI for a mean difference includes zero, the difference is not statistically significant at the .05 level, a quick check that mirrors the p-value rule. Narrow intervals indicate more precise estimates, which larger samples produce. Pairing the CI logic with the p-value and effect size gives a complete read of any inferential result: is the effect real, how big is it, and how precisely was it estimated?
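
The "does the interval include zero?" check can be sketched as follows. This example assumes a normal approximation (z of about 1.96 for 95%); with small samples, statistical software would use the t distribution and give a somewhat wider interval. The function names and scores are hypothetical.

```python
from statistics import NormalDist, mean, variance

def mean_diff_ci(group_a, group_b, level: float = 0.95):
    """Approximate CI for the difference between two independent group means."""
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference: shrinks as group sizes grow
    se = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z_crit * se, diff + z_crit * se

def includes_zero(ci) -> bool:
    """True when the interval contains zero (not significant at the matching alpha)."""
    low, high = ci
    return low <= 0 <= high

# Hypothetical scores: overlapping groups vs. clearly separated groups
overlapping = mean_diff_ci([2, 4, 6], [1, 3, 5])
separated = mean_diff_ci([10, 11, 12, 13], [1, 2, 3, 4])
print(overlapping, includes_zero(overlapping))
print(separated, includes_zero(separated))
```

The standard error term is where sample size enters: more participants per group shrink it, which narrows the interval.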

A reasoning trap to rehearse

The single most tested misstep in this section is inferring cause from a non-experimental design. Correlational and quasi-experimental studies can describe and predict, but only an experiment with random assignment and a control group can support a causal claim. When a stem says a counselor "found that clients who journaled had lower anxiety" and asks what can be concluded, the answer is an association, not that journaling caused the decrease, because no manipulation or random assignment occurred.

Test Your Knowledge

A researcher compares mean depression scores across three counseling approaches (CBT, person-centered, and solution-focused). Which statistical test is most appropriate?

Test Your Knowledge

A counselor concludes a new intervention is effective when in reality it has no true effect. This is an example of:
