Critiquing Graphs and Choosing Defensible Designs
Key Takeaways
- Visual analysis weighs level, trend, variability, immediacy of effect, overlap, and consistency across similar conditions together, never one feature alone.
- Strong demonstrations show stable baselines, immediate changes at phase lines, low overlap between conditions, and consistent, replicated effects.
- Weak graphs feature improving baselines, simultaneous changes across tiers, high overlap, missing integrity data, or confounded phase changes.
- Design choice depends on the research question, behavior type, reversibility, ethics, and feasibility — not on a single 'best' design.
- Reserve 'functional relation' language for clear, replicated control; say data 'suggest improvement' when replication is missing.
Reading a Graph Like an Examiner
Start with the question the graph was meant to answer, because the standard of evidence depends on it. A graph comparing two treatments should show clear separation between conditions. A graph testing one treatment package should show replicated change after each intervention onset. A graph documenting shaped performance under a changing criterion should show behavior tracking each criterion step.
Matching the standard of evidence to the question keeps you from grading every graph the same way. A package-test graph with no replication is weak even if the data path looks impressive; a comparison graph with two clearly separated paths is strong even if neither path is perfectly flat.
Then apply the six visual-analysis features, always together:
- Level — the average or typical value of data within a phase.
- Trend — the overall direction (ascending, descending, flat) and slope.
- Variability — how much data bounce around the level within a phase.
- Immediacy of effect — how quickly behavior changes right at the phase-change line.
- Overlap — the proportion of data points sharing the same range across adjacent phases (less overlap = stronger effect).
- Consistency — whether similar conditions (e.g., the two baseline phases of an ABAB) show similar patterns.
No single feature is decisive. An immediate, low-overlap change from a stable baseline, replicated across phases or tiers, is convincing; a delayed, high-overlap change from an unstable baseline is not.
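To make the within-phase features concrete, here is a minimal Python sketch that summarizes level, trend, and variability for one phase. The function name `describe_phase`, the use of a least-squares slope for trend, and the sample data are illustrative assumptions; visual analysis is a judgment made on the graph, and these numbers only support it.

```python
from statistics import mean, median, pstdev

def describe_phase(points):
    """Summarize one phase (hypothetical helper, not a standard tool).

    Level is reported as mean and median, trend as the least-squares
    slope per session, and variability as range and standard deviation.
    """
    sessions = range(len(points))
    x_bar = mean(sessions)
    y_bar = mean(points)
    sxx = sum((x - x_bar) ** 2 for x in sessions)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sessions, points))
    slope = sxy / sxx if sxx else 0.0  # a single point has no trend
    return {
        "level_mean": y_bar,
        "level_median": median(points),
        "trend_slope": slope,
        "range": max(points) - min(points),
        "sd": pstdev(points),
    }

# Hypothetical data: responses per session in two adjacent phases.
baseline = [9, 8, 9, 8, 9]
treatment = [4, 3, 3, 2, 3]

for name, phase in (("baseline", baseline), ("treatment", treatment)):
    print(name, describe_phase(phase))
```

A slope near zero with a small range is one way to describe a stable baseline; the summary still has to be read alongside immediacy, overlap, and consistency.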
Diagnosing Weak Graphs and the Better Move
| Problem in graph | Why it matters | Better move |
|---|---|---|
| Baseline already improving | Treatment effect is ambiguous (trend confound) | Extend baseline if ethical, or strengthen with replication |
| All tiers change at once | Staggered control is lost; looks like a coincident event | Delay/stagger intervention across tiers |
| High overlap across conditions | Effects are unclear; conditions look alike | Refine procedure or collect more data |
| No procedural integrity data | The IV may not have been implemented | Add procedural-integrity measurement |
| Confounded phase change | More than one variable changed at once | Change only one key variable per phase |
| Excessive variability | Steady state never reached; prediction is weak | Identify and control sources of variability |
Work through the table as a checklist when a stem shows a flawed graph. The 'best answer' is typically the option that restores prediction, verification, or replication — e.g., staggering tiers, extending baseline, or adding integrity data — rather than an option that merely declares success or that changes the clinical goal. Beware distractors that 'fix' a problem by introducing a new confound or by abandoning ethics.
Reading Each Feature Correctly
Candidates lose points by misreading individual features, so define each precisely. Level is read as the mean or median of points within a phase — a level change is a jump in that typical value at the phase line. Trend describes slope and direction; a change in trend (from flat to descending, say) can demonstrate an effect even without a level jump. Variability is bounce around the level; very high variability blocks interpretation no matter how good the means look.
Immediacy of effect compares the last few points of one phase with the first few points of the next; the closer to the phase line the change occurs, the stronger the inference. A delayed change leaves room for other causes. Overlap is the percentage of points in adjacent phases sharing the same value range — low or zero overlap signals a robust effect, high overlap signals a weak or absent one. Consistency asks whether like conditions (the two baselines, the two treatment phases of an ABAB) reproduce similar patterns; inconsistency hints at an uncontrolled variable.
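The two between-phase features can be sketched the same way. The helpers below are hypothetical: `immediacy` compares the last few points of one phase with the first few of the next (three points is an assumed default), and `percent_overlap` uses one simple operationalization, the share of treatment points that fall inside the baseline's minimum-to-maximum range. Other overlap metrics exist and may give different values.

```python
from statistics import mean

def immediacy(prev_phase, next_phase, k=3):
    """Mean of the first k points of next_phase minus the mean of the
    last k points of prev_phase. k=3 is an assumption, not a rule."""
    return mean(next_phase[:k]) - mean(prev_phase[-k:])

def percent_overlap(baseline, treatment):
    """Percent of treatment points inside the baseline's min-max range
    (one simple definition of overlap, chosen for illustration)."""
    low, high = min(baseline), max(baseline)
    inside = sum(1 for y in treatment if low <= y <= high)
    return 100.0 * inside / len(treatment)

# Hypothetical data: a clear effect and a weak one against the same baseline.
baseline = [9, 8, 9, 8, 9]
clear_effect = [4, 3, 3, 2, 3]
weak_effect = [8, 7, 9, 6, 8]

print(percent_overlap(baseline, clear_effect))  # 0.0
print(percent_overlap(baseline, weak_effect))   # 60.0
print(immediacy(baseline, clear_effect))
```

A large immediate shift with zero overlap supports a strong inference only when like conditions also reproduce the pattern, which is the consistency check described above.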
Choosing a Defensible Design — and Not Overclaiming
Design selection is not memorization; it is matching the research question and constraints to a structure that supplies experimental control:
- Reversal (ABAB) — reversible behavior and ethical withdrawal.
- Multiple baseline — durable skill acquisition or unsafe/unethical withdrawal.
- Multielement (ATD) — fast comparison when conditions can be discriminated.
- Changing criterion — gradual, stepwise performance goals.
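As a rough illustration of the matching logic in the list above, the sketch below maps a few yes/no constraints to a candidate design. The function `suggest_design`, its parameter names, and the order of the checks are assumptions made for this example; real selection also weighs the research question, ethics, and feasibility, so treat the output as a starting point rather than a verdict.

```python
def suggest_design(reversible, withdrawal_ethical,
                   comparing_conditions=False, stepwise_goal=False):
    """Return a candidate design name for a set of constraints.

    Hypothetical helper: the check order is an assumption, and the
    result does not replace a full review of ethics and feasibility.
    """
    if comparing_conditions:
        # Fast comparison when conditions can be discriminated.
        return "multielement (ATD)"
    if stepwise_goal:
        # Gradual, stepwise performance goals.
        return "changing criterion"
    if reversible and withdrawal_ethical:
        # Reversible behavior and ethical withdrawal.
        return "reversal (ABAB)"
    # Durable skill acquisition, or withdrawal is unsafe or unethical.
    return "multiple baseline"

print(suggest_design(reversible=True, withdrawal_ethical=True))
print(suggest_design(reversible=False, withdrawal_ethical=False))
print(suggest_design(reversible=True, withdrawal_ethical=True,
                     comparing_conditions=True))
```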
Before accepting any design as defensible, run this checklist:
- The dependent variable directly represents the referral problem or goal.
- The independent variable is operationally defined and feasible to implement.
- The design supplies prediction, verification, and replication.
- Plausible alternative explanations are anticipated and reduced (confounds, threats).
- The design is ethical and practical for this client and context.
Finally, calibrate your language to your evidence. A graph can be clinically useful even when experimental control is weak — improvement is improvement for the client. But for exam purposes, do not overclaim. When a design lacks replication, say the data suggest improvement; reserve 'functional relation' and 'the intervention caused the change' for clear, immediate, low-overlap effects that replicate.
The most defensible answer almost always pairs rigorous control with client safety, social significance, and feasibility — and states its conclusion no more strongly than the data allow. When in doubt between two options, pick the one that adds a replication or rules out a confound rather than the one that simply asserts a stronger conclusion. Examiners reward measured, evidence-bounded judgments over confident overreach.
Practice Questions
In a multiple-baseline-across-settings graph, behavior in all three settings improves at the same time, even though intervention was introduced only in the first setting. What does this most likely indicate?
An ABAB graph shows a clear, immediate drop in problem behavior in each treatment phase, a return toward baseline in the second A phase, and low overlap between phases. Which conclusion is best supported?
A graph shows a treatment phase with substantial data-point overlap with baseline and a delayed, gradual change. There are no procedural-integrity data. What is the most appropriate examiner-style conclusion and next step?
Which design feature most directly distinguishes a defensible single-case demonstration from a graph that is merely clinically encouraging?