9.2 Research Designs and Statistical Decisions

Key Takeaways

Experimental designs with random assignment and a control group are the only designs that support causal claims.
Internal validity is about cause-and-effect confidence; external validity is about generalizing to other people and settings.
Descriptive statistics summarize a sample; inferential statistics generalize from sample to population.
Match the statistic to the design and the scale of measurement: t-test for two means, ANOVA for three or more, correlation for association, chi-square for categories.

Last updated: June 2026

Every Research item turns on one cue: what is the researcher trying to establish? The verb in the stem points to a design, the design points to a statistic, and the statistic must match the scale of measurement. Learn that chain and most items collapse into one decision.

Choosing a design

Design	What it can claim	Key features
True experimental	Cause and effect	Random assignment + control group + manipulated independent variable (IV)
Quasi-experimental	Tentative cause and effect (weaker)	Comparison groups but no random assignment (e.g., intact classrooms)
Correlational	Association only	Measures two variables; no manipulation; cannot prove cause
Descriptive / survey	Describes a snapshot	Frequencies, percentages; no IV
Qualitative	Meaning and process	Interviews, observation; phenomenology, grounded theory, case study
Single-subject (ABAB)	Individual change	Repeated baseline and treatment phases for one client

The randomized controlled trial (RCT) is the gold standard for causality because random assignment tends to equate groups, on average, on both known and unknown confounds, leaving the IV as the most plausible explanation for any group difference.
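Random assignment can be sketched in a few lines. This is a minimal illustration, not a trial protocol: the function name `randomly_assign`, the seed, and the participant IDs are all hypothetical.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)          # chance alone decides who lands in which group
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs 1..20, for illustration only.
groups = randomly_assign(range(1, 21), seed=42)
print(groups)
```

Because only chance decides group membership, pre-existing differences between participants are not systematically tied to the treatment.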

Validity: the two questions

Internal validity asks: can the change be attributed to the IV rather than something else? Threats include history (an outside event), maturation (natural change over time), testing (practice on a repeated measure), instrumentation (changes in the measure or the raters), selection bias (groups that differed before treatment), and regression to the mean (extreme scorers drifting toward average on retest).
External validity asks: do the results generalize to other people, settings, and times? A convenience sample of one university's clients limits external validity even if internal validity is strong.

These two often trade off: tightly controlled lab studies maximize internal validity but may sacrifice external validity, while field studies do the reverse.
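Regression to the mean, one of the internal validity threats listed above, is easy to see in a small simulation. The sketch below assumes a hypothetical model in which each observed score is a stable true score plus random measurement error. No treatment is given between the two tests, and all numbers are invented.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical model: observed score = stable true score + random error.
true_scores = [rng.gauss(50, 10) for _ in range(5000)]
test1 = [t + rng.gauss(0, 10) for t in true_scores]
test2 = [t + rng.gauss(0, 10) for t in true_scores]  # retest, no treatment

# Select the most extreme scorers on the first test (roughly the top 10%).
cutoff = sorted(test1)[int(0.9 * len(test1))]
extreme = [i for i, score in enumerate(test1) if score >= cutoff]

mean_t1 = statistics.mean(test1[i] for i in extreme)
mean_t2 = statistics.mean(test2[i] for i in extreme)
grand_mean = statistics.mean(test1)

print(f"Extreme group, test 1: {mean_t1:.1f}")
print(f"Extreme group, test 2: {mean_t2:.1f}")
print(f"Everyone, test 1:      {grand_mean:.1f}")
```

The extreme group's retest mean falls back toward the overall mean even though nothing was done to them, which is why a study that selects clients for extreme scores needs a control group.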

From design to statistic

Descriptive statistics summarize the data you have; inferential statistics let you generalize to a population and test hypotheses.

Question being asked	Statistic	Scale required
Difference between two group means	Independent or paired t-test	Interval/ratio dependent variable (DV)
Difference among three or more means	ANOVA (analysis of variance)	Interval/ratio DV
Strength/direction of association	Pearson r (interval/ratio) or Spearman rho (ordinal)	Two continuous or ranked variables
Association between categories	Chi-square	Nominal data
Predicting a score from one or more variables	Regression	Interval/ratio outcome
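The table above can be rehearsed as a simple lookup. The function below is a hypothetical study aid, not a statistical library call: the name `choose_statistic` and its argument labels are assumptions made for illustration, and it covers only the rows shown in the table.

```python
def choose_statistic(goal, scale, n_groups=2):
    """Return the statistic the table pairs with a research question.

    goal:  "difference", "association", or "prediction"
    scale: "nominal", "ordinal", "interval", or "ratio"
    """
    continuous = scale in ("interval", "ratio")
    if goal == "difference" and continuous:
        if n_groups == 2:
            return "t-test"
        if n_groups >= 3:
            return "ANOVA"
    if goal == "association":
        if scale == "nominal":
            return "chi-square"
        if scale == "ordinal":
            return "Spearman rho"
        if continuous:
            return "Pearson r"
    if goal == "prediction" and continuous:
        return "regression"
    raise ValueError("combination not covered by the table")

print(choose_statistic("difference", "interval", n_groups=3))  # ANOVA
print(choose_statistic("association", "nominal"))              # chi-square
```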

Significance and error

The alpha level (conventionally .05) is the threshold for rejecting the null hypothesis (H0). A p-value below .05 means that, if H0 were true, a result at least as extreme as the one observed would occur less than 5% of the time. It is not the probability that H0 is true, and it signals statistical significance, not practical importance. Two errors follow:

Type I error (false positive): rejecting a true null, that is, concluding a treatment works when it does not. When H0 is true, its probability equals alpha.
Type II error (false negative): failing to reject a false null, that is, missing a real effect. Its probability is called beta. Statistical power is 1 minus beta; larger samples increase power.
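Both error rates can be checked by simulation. The sketch below is an illustration under simplifying assumptions: it uses a two-sided z-test with a known population standard deviation of 1 in place of a t-test, and the effect size (0.5), sample sizes, trial count, and seed are invented.

```python
import random
from statistics import NormalDist, mean

def z_test_p(sample_a, sample_b, sigma):
    """Two-sided p-value for a mean difference, assuming a known population SD."""
    se = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_diff, n=30, trials=2000, alpha=0.05, seed=1):
    """Proportion of simulated studies that reject H0 at the given alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = [rng.gauss(true_diff, 1) for _ in range(n)]
        if z_test_p(treated, control, sigma=1) < alpha:
            rejections += 1
    return rejections / trials

# H0 true (no real effect): every rejection is a Type I error.
type1_rate = rejection_rate(true_diff=0.0)
# H0 false (real effect of 0.5 SD): the rejection rate is the power.
power_small_n = rejection_rate(true_diff=0.5, n=30)
power_large_n = rejection_rate(true_diff=0.5, n=100)

print(f"Type I error rate: {type1_rate:.3f}")
print(f"Power, n=30 per group:  {power_small_n:.3f}")
print(f"Power, n=100 per group: {power_large_n:.3f}")
```

With no true effect, the rejection rate should land close to alpha. With a real effect, the larger sample rejects H0 more often, which is the sense in which larger samples increase power.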

Effect size vs. significance

Effect size (Cohen's d for mean differences, r for correlations) reports the magnitude of an effect independent of sample size. Cohen's d benchmarks are small = 0.2, medium = 0.5, large = 0.8. A huge sample can make a trivial difference statistically significant, so the CPCE rewards candidates who pair p-values with effect sizes. The exam-ready model is: design tells you what you can claim, the statistic tells you how to test it, and effect size tells you whether anyone should care.
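Cohen's d for two independent groups is the mean difference divided by the pooled standard deviation. The sketch below works one example; the anxiety scores are hypothetical, and the helper names `cohens_d` and `benchmark` are assumptions made for illustration.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Mean difference (a minus b) divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

def benchmark(d):
    """Label an effect using the 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical post-test anxiety scores, for illustration only.
treatment = [18, 20, 22, 24, 26]
control = [20, 22, 24, 26, 28]

d = cohens_d(treatment, control)
print(f"d = {d:.2f} ({benchmark(d)})")
```

The sign of d shows direction only (here, lower anxiety in the treatment group); the benchmarks apply to its absolute value.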

Sampling and generalizability

The design is only as strong as the sample that feeds it. Probability sampling gives every member of the population a known, nonzero chance of selection and supports generalization: simple random (every person equally likely), stratified random (random selection within subgroups to guarantee representation), cluster (randomly selecting whole intact groups), and systematic (every nth person). Non-probability sampling (convenience, purposive, and snowball) is faster but weakens external validity because the sample may not represent the population.

A frequent vignette describes a researcher recruiting only volunteers from one clinic and then over-claiming that results apply to all clients; the flaw is a convenience sample limiting generalizability, not the statistic. Larger, more representative samples also reduce sampling error, the random gap between a sample statistic and the true population value.
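The four probability sampling methods described above can be sketched side by side. Everything here is hypothetical: the population of 100 clients across four clinics, the function names, and the sample sizes are invented for illustration.

```python
import random

def simple_random(population, k, rng):
    """Every person is equally likely to be chosen."""
    return rng.sample(population, k)

def systematic(population, step, rng):
    """Random starting point, then every nth person."""
    start = rng.randrange(step)
    return population[start::step]

def stratified(population, stratum_of, k_per_stratum, rng):
    """Random selection within each subgroup."""
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k_per_stratum))
    return sample

def cluster(clusters, n_clusters, rng):
    """Randomly pick whole intact groups and keep everyone in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(7)
# Hypothetical population: 100 clients spread across 4 clinics.
population = [(i, f"clinic_{i % 4}") for i in range(100)]
by_clinic = {}
for person in population:
    by_clinic.setdefault(person[1], []).append(person)

srs = simple_random(population, 12, rng)
sys_sample = systematic(population, 10, rng)
strat = stratified(population, lambda p: p[1], 3, rng)
clus = cluster(by_clinic, 2, rng)
print(len(srs), len(sys_sample), len(strat), len(clus))
```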

Reading confidence intervals

Inferential results are often expressed as a confidence interval (CI) rather than a single p-value. A 95% CI comes from a procedure that would capture the true population value in about 95 of 100 repeated samples; any single computed interval either contains that value or does not. If a 95% CI for a mean difference includes zero, the difference is not statistically significant at the .05 level, a quick check that mirrors the p-value rule. Narrow intervals indicate more precise estimates, which larger samples produce. Pairing the CI logic with the p-value and effect size gives a complete read of any inferential result: is the effect real, how big is it, and how precisely was it estimated?
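The "does the interval include zero?" check can be worked through by hand or in code. The sketch below uses a normal approximation (a multiplier of about 1.96) to keep it dependency-free; with samples this small a t-based interval would normally be used and would be wider. The scores and the function name `mean_diff_ci` are hypothetical.

```python
from statistics import NormalDist, mean, variance

def mean_diff_ci(group_a, group_b, confidence=0.95):
    """Normal-approximation CI for the difference in means (a minus b)."""
    diff = mean(group_a) - mean(group_b)
    se = (variance(group_a) / len(group_a) +
          variance(group_b) / len(group_b)) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return diff - z * se, diff + z * se

# Hypothetical post-test anxiety scores, for illustration only.
treatment = [18, 20, 22, 24, 26]
control = [20, 22, 24, 26, 28]

low, high = mean_diff_ci(treatment, control)
includes_zero = low <= 0 <= high
print(f"95% CI for the difference: [{low:.2f}, {high:.2f}]")
print("Not significant at .05" if includes_zero else "Significant at .05")
```

Here the interval spans zero, so the 2-point difference is not statistically significant at the .05 level with only five clients per group.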

A reasoning trap to rehearse

The single most tested misstep in this section is inferring cause from a non-experimental design. Correlational and quasi-experimental studies can describe and predict, but only an experiment with random assignment and a control group can support a causal claim. When a stem says a counselor "found that clients who journaled had lower anxiety" and asks what can be concluded, the answer is an association, not that journaling caused the decrease, because no manipulation or random assignment occurred.

Test Your Knowledge

A researcher compares mean depression scores across three counseling approaches (CBT, person-centered, and solution-focused). Which statistical test is most appropriate?

Independent-samples t-test

Pearson correlation

Chi-square test

Analysis of variance (ANOVA)

Test Your Knowledge

A counselor concludes a new intervention is effective when in reality it has no true effect. This is an example of:

A Type I error

A Type II error

Low statistical power

Regression to the mean
