7.6 Effect Size, Power, Clinical Significance, and Program Evaluation

Key Takeaways

  • Effect size describes the magnitude of a finding and can matter more for practice than whether a result merely crosses p < .05.
  • Clinical significance asks whether change is meaningful for functioning, risk, symptoms, or quality of life, not just statistically detectable.
  • Program evaluation applies research logic to real services using needs, process/fidelity, outcome, and cost data.
  • Evidence-based practice integrates the best research evidence, clinical expertise, and client characteristics, culture, and preferences.
Last updated: June 2026

Move From Results to Practice Decisions

Research findings become useful only when they inform decisions responsibly. A result can be statistically detectable yet too small, too narrow, or too uncertain to guide care alone, which is why the EPPP research domain folds in effect size, power, clinical significance, and program evaluation. These tools move the candidate from a table of results to a defensible professional conclusion.

Effect size quantifies magnitude, independent of sample size. Because a tiny p value can come from a large sample rather than a large effect, effect size is the better index of practical importance. Common metrics with conventional benchmarks:

Effect-size metric | Used for | Small / medium / large (Cohen)
Cohen's d | Difference between two means | 0.2 / 0.5 / 0.8
Pearson r | Association between two continuous variables | 0.1 / 0.3 / 0.5
Eta-squared / partial eta-squared | Variance explained in ANOVA | .01 / .06 / .14
Odds ratio / risk ratio | Likelihood comparisons in outcome research | 1.0 = no effect
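
As an illustration of the first metric, here is a minimal sketch of Cohen's d computed with a pooled standard deviation. The function name and the two score lists are hypothetical and exist only for the example; other variants of d (for instance, ones that use a single group's standard deviation) give slightly different values.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores, invented for illustration only.
treatment_scores = [10, 12, 14, 16, 18]
control_scores = [8, 10, 12, 14, 16]
d_example = cohens_d(treatment_scores, control_scores)
```

With these made-up scores the difference in means is 2 points and the pooled SD is about 3.16, so d falls between the conventional medium and large benchmarks.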

The EPPP may ask which of two findings is more meaningful for clinical planning. A large, well-powered study with d = 0.15 can be statistically significant yet trivial in magnitude; a smaller study with d = 0.80 may be more clinically compelling even if its p value is less impressive. Meta-analysis aggregates effect sizes across studies to produce a pooled estimate and examine moderators, which is why it sits near the top of the evidence hierarchy.
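
The contrast between sample size and effect size can be sketched numerically. The snippet below assumes two equal-sized independent groups and uses a large-sample normal approximation (z = d × sqrt(n / 2)) in place of an exact t test; the function name and the group sizes are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_two_sided_p(d, n_per_group):
    """Approximate two-sided p value for a two-group comparison.

    Assumes equal group sizes and a normal approximation to the t test.
    """
    z = abs(d) * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical studies: a tiny effect in a huge sample vs. a large effect in a small one.
p_large_sample = approx_two_sided_p(d=0.15, n_per_group=2000)
p_small_sample = approx_two_sided_p(d=0.80, n_per_group=15)
```

Under these assumptions the small effect in the large sample yields a far smaller p value than the large effect in the small sample, which is exactly why the p value alone is a poor guide to practical importance.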

Clinical Significance, Power, and Evidence-Based Practice

Clinical significance is not the same as statistical significance. A symptom score may shift enough to register in a study yet not enough for the client to return to work, sleep safely, reduce risk, or meet treatment goals. Jacobson and Truax's methods (the reliable change index and movement from the dysfunctional to the functional distribution) operationalize whether change is meaningful, not merely detectable. Good judgment weighs both statistical and clinical meaning.
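
A rough sketch of the two Jacobson and Truax criteria follows. It assumes a symptom measure on which lower scores are better, uses the commonly cited formulas (standard error of the difference built from the measure's reliability, and a cutoff weighted by the two distributions' standard deviations), and every number is hypothetical.

```python
from math import sqrt


def reliable_change_index(pre, post, sd_pre, reliability):
    """Change score divided by the standard error of the difference."""
    sem = sd_pre * sqrt(1 - reliability)  # standard error of measurement
    s_diff = sqrt(2 * sem ** 2)           # standard error of the difference
    return (post - pre) / s_diff


def functional_cutoff(mean_dysfunctional, sd_dysfunctional, mean_functional, sd_functional):
    """Cutoff between the two distributions, each mean weighted by the other group's SD."""
    return (sd_dysfunctional * mean_functional + sd_functional * mean_dysfunctional) / (
        sd_dysfunctional + sd_functional
    )


# Hypothetical client and norms, invented for illustration only.
rci = reliable_change_index(pre=32, post=16, sd_pre=8, reliability=0.85)
cutoff = functional_cutoff(
    mean_dysfunctional=30, sd_dysfunctional=8, mean_functional=10, sd_functional=6
)
reliable = abs(rci) > 1.96          # change exceeds measurement error
recovered = reliable and 16 < cutoff  # post score falls on the functional side
```

In this made-up case the client's change is larger than measurement error would explain and the post-treatment score crosses the cutoff, so both criteria are met.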

Power, the probability of detecting an effect that truly exists (1 − β), protects against false reassurance. Low power can stem from a small sample, noisy measurement, weak manipulation, high attrition, or inappropriate analysis. A non-significant result in an underpowered study does not prove an intervention is ineffective; it means the evidence is limited. The strongest EPPP answer avoids overstatement in either direction.
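
To make the link between sample size and power concrete, here is a minimal sketch for a two-group comparison. It assumes equal group sizes, a two-sided test, and a normal approximation that ignores the negligible opposite tail; dedicated power software would give slightly different figures. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power to detect effect size d with two equal groups."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    expected_z = abs(d) * sqrt(n_per_group / 2)    # expected test statistic under d
    return NormalDist().cdf(expected_z - z_crit)


# Hypothetical designs targeting a medium effect (d = 0.5).
power_adequate = approx_power(d=0.5, n_per_group=64)
power_underpowered = approx_power(d=0.5, n_per_group=20)
```

Under this approximation, about 64 participants per group gives roughly 80% power for a medium effect, while 20 per group leaves the study more likely to miss the effect than to find it.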

Evidence-based practice (EBP) in psychology, per the APA policy, is the integration of (1) the best available research evidence, (2) clinical expertise, and (3) patient characteristics, culture, and preferences. EBP is not mechanical manualized delivery; the best answer usually selects, adapts, monitors, and documents care for the specific person and context, considering developmental status, comorbidity, setting, risk, and resources.

Program evaluation brings research methods into agencies and systems through four staged questions:

  • Needs assessment — what does the population require?
  • Process / fidelity evaluation — is the program delivered as designed?
  • Outcome evaluation — did the targeted changes occur?
  • Cost / efficiency evaluation — do the benefits justify the resources?

Measurement quality governs all four. If a clinic claims success because attendance rose, the next questions are whether client outcomes improved, whether services were delivered with fidelity, and whether access improved for the intended groups. If only satisfied completers answer a survey, response/survivor bias inflates positive conclusions; if staff change documentation midyear, instrumentation distorts trends.

Treat every applied research item as a chain: the design creates the evidence, statistics summarize magnitude and uncertainty, clinical significance asks whether the change matters, and EBP asks how the finding should be used with a specific client or program.

The Evidence Hierarchy and a Final Integration

The EPPP expects familiarity with the rough hierarchy of research evidence, while recognizing that design quality, not just design type, determines trustworthiness:

Level | Design | Strength for causal claims
Highest | Systematic review / meta-analysis of RCTs | Strongest; pools effects, examines moderators
High | Single well-conducted RCT | Strong internal validity
Moderate | Quasi-experimental / cohort | Useful; confounding remains a risk
Lower | Case-control / correlational | Association, prediction, hypothesis generation
Lowest | Case study / expert opinion | Descriptive; weak for causal inference

A meta-analysis aggregates effect sizes across studies to yield a pooled estimate and to test moderators, but it inherits the flaws of its inputs ("garbage in, garbage out"). It is also vulnerable to publication bias: significant results tend to be published while null results stay in the file drawer, which inflates the apparent effect. Recognizing publication bias, and tools such as the funnel plot used to detect it, is a high-yield exam point.
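
One simple way to form a pooled estimate is inverse-variance weighting under a fixed-effect model, sketched below. This is only one of several pooling approaches (random-effects models are also common), and the three studies are invented for illustration.

```python
from math import sqrt


def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1 / v for v in variances]  # precise studies get more weight
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    standard_error = sqrt(1 / sum(weights))
    return pooled, standard_error


# Hypothetical studies: the smallest, least precise study reports the largest effect.
study_effects = [0.80, 0.45, 0.20]
study_variances = [0.09, 0.04, 0.01]
pooled_d, pooled_se = fixed_effect_pool(study_effects, study_variances)
unweighted_mean = sum(study_effects) / len(study_effects)
```

Because the most precise study dominates, the pooled estimate sits well below the unweighted average. A pattern in which small, imprecise studies report the largest effects is the kind of asymmetry a funnel plot is meant to reveal.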

Two applied cautions round out the domain. First, statistical regression and selection routinely masquerade as program success in uncontrolled evaluations. A program that enrolls only the most impaired clients can look effective even if it does nothing, because extreme scores tend to drift toward the average on retest (regression to the mean). A comparison group is the antidote.
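
The regression artifact can be demonstrated with a small simulation. The sketch below assumes a symptom scale where higher scores mean greater impairment, a stable true severity per client, and independent measurement noise at each assessment; no treatment effect is simulated at all. All parameters are hypothetical.

```python
import random
from statistics import mean


def simulate_untreated_change(n_clients=5000, top_fraction=0.10, seed=0):
    """Return (pretest mean, posttest mean) for clients enrolled for extreme pretest scores."""
    rng = random.Random(seed)
    clients = []
    for _ in range(n_clients):
        true_severity = rng.gauss(50, 10)
        pre = true_severity + rng.gauss(0, 10)   # noisy pretest
        post = true_severity + rng.gauss(0, 10)  # noisy posttest, no intervention
        clients.append((pre, post))
    # Enroll only the most impaired clients according to the pretest.
    clients.sort(key=lambda scores: scores[0], reverse=True)
    enrolled = clients[: int(n_clients * top_fraction)]
    return mean(pre for pre, _ in enrolled), mean(post for _, post in enrolled)


pre_mean, post_mean = simulate_untreated_change()
```

Even though nothing was done to these simulated clients, the enrolled group's average symptom score drops at posttest, which an uncontrolled evaluation could mistake for program benefit.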

Second, fidelity and dosage qualify every outcome claim: if a manualized treatment was delivered with poor adherence or in too few sessions, a null outcome may reflect implementation failure rather than an ineffective treatment. This is why process and fidelity data must be inspected before interpreting outcomes.

Pulling the chapter together, the disciplined EPPP test-taker reasons in a chain for every applied research item. The design determines what causal weight the evidence can bear. The measurement determines whether the variables are trustworthy. The statistics summarize uncertainty (significance, confidence intervals) and magnitude (effect size). Clinical significance asks whether the change matters in a client's life. And evidence-based practice integrates that evidence with clinical expertise and the client's culture, values, and preferences.

Choosing the answer that honors every link in that chain, and rejecting the one that overreaches at any link, is the single most reliable strategy for this 7% domain and for the evidence-evaluation reasoning woven through the rest of the EPPP.

Test Your Knowledge

Study A (N = 4,000) reports d = 0.12, p < .001; Study B (N = 60) reports d = 0.82, p = .04. Which is more clinically compelling, and why?


A clinic wants to confirm its new intake program is actually being delivered as designed before judging outcomes. Which evaluation focus is most relevant?


Under the APA definition, evidence-based practice in psychology is best described as:
