Critiquing Graphs and Choosing Defensible Designs | Free Guide 2026

Key Takeaways

Visual analysis weighs level, trend, variability, immediacy of effect, overlap, and consistency across similar conditions TOGETHER, never one feature alone.
Strong demonstrations show stable baselines, immediate changes at phase lines, low overlap between conditions, and consistent, replicated effects.
Weak graphs feature improving baselines, simultaneous changes across tiers, high overlap, missing integrity data, or confounded phase changes.
Design choice depends on the research question, behavior type, reversibility, ethics, and feasibility — not on a single 'best' design.
Reserve 'functional relation' language for clear, replicated control; say data 'suggest improvement' when replication is missing.

Reading a Graph Like an Examiner

Start with the question the graph was meant to answer, because the standard of evidence depends on it. A graph comparing two treatments should show clear separation between conditions. A graph testing one treatment package should show replicated change after each intervention onset. A changing-criterion graph, which shapes performance in steps, should show behavior tracking each criterion step.

Matching the standard of evidence to the question keeps you from grading every graph the same way. A package-test graph with no replication is weak even if the line looks great; a comparison graph with two clearly separated paths is strong even if neither path is perfectly flat.
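For the stepwise (changing-criterion) case, "behavior tracking each criterion step" can be checked numerically. The sketch below is a minimal illustration, not a published scoring rule: it assumes each criterion subphase is a list of session values, and the `tolerance` that decides how close a subphase mean must sit to its criterion is a hypothetical parameter.

```python
from statistics import mean


def tracks_criteria(subphases, criteria, tolerance):
    """Return True if every subphase mean lies within `tolerance` of its criterion.

    subphases: one list of session values per criterion step (assumed layout).
    criteria:  the criterion in force during each step, in the same order.
    tolerance: hypothetical allowed gap between a step's mean and its criterion.
    """
    if len(subphases) != len(criteria):
        raise ValueError("need exactly one criterion per subphase")
    return all(abs(mean(values) - target) <= tolerance
               for values, target in zip(subphases, criteria))


# Behavior that steps up with each criterion (5 -> 10 -> 15) tracks it;
# behavior that stalls during the second step does not.
print(tracks_criteria([[5, 6, 5], [10, 9, 10], [15, 14, 16]], [5, 10, 15], tolerance=1))  # True
print(tracks_criteria([[5, 6, 5], [6, 5, 6], [15, 14, 16]], [5, 10, 15], tolerance=1))    # False
```

Comparing subphase means is only one simple choice; a visual analyst would also look for behavior shifting promptly at each criterion change and holding steady until the next one.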

Then apply the six visual-analysis features, always together:

Level — the average or typical value of data within a phase.
Trend — the overall direction (ascending, descending, flat) and slope.
Variability — how much data bounce around the level within a phase.
Immediacy of effect — how quickly behavior changes right at the phase-change line.
Overlap — the proportion of data points in one phase that fall within the range of data in the adjacent phase (less overlap = stronger effect).
Consistency — whether similar conditions (e.g., the two baseline phases of an ABAB) show similar patterns.

No single feature is decisive. Immediate, low-overlap change across a stable baseline that replicates is convincing; delayed, high-overlap change over an unstable baseline is not.

Diagnosing Weak Graphs and the Better Move

Problem in graph	Why it matters	Better move
Baseline already improving	Treatment effect is ambiguous (trend confound)	Extend baseline if ethical, or strengthen with replication
All tiers change at once	Staggered control is lost; looks like a coincident event	Delay/stagger intervention across tiers
High overlap across conditions	Effects are unclear; conditions look alike	Refine procedure or collect more data
No procedural integrity data	The IV may not have been implemented	Add procedural-integrity measurement
Confounded phase change	More than one variable changed at once	Change only one key variable per phase
Excessive variability	Steady state never reached; prediction is weak	Identify and control sources of variability
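The "all tiers change at once" row can be checked directly from the data. This is a minimal sketch under stated assumptions: tiers are session series aligned in time, `onsets` gives each tier's staggered intervention session, every tier has at least one session before the first onset, and `threshold` is a hypothetical cutoff for what counts as a shift in level. The function flags any tier whose level moves after the first tier is treated but before its own intervention begins.

```python
from statistics import mean


def untreated_tier_shifts(tiers, onsets, threshold):
    """Return indices of tiers whose level shifts before their own intervention starts.

    tiers:     one list of session values per tier, aligned by session (assumed layout).
    onsets:    session index at which intervention begins in each tier.
    threshold: hypothetical minimum change in mean level counted as a shift.
    Assumes every tier has at least one session before the first onset.
    """
    first_onset = min(onsets)
    flagged = []
    for i, (series, onset) in enumerate(zip(tiers, onsets)):
        if onset <= first_onset:
            continue  # treated first, so there is no untreated window to inspect
        before = series[:first_onset]        # baseline while no tier is treated
        waiting = series[first_onset:onset]  # still in baseline, but an earlier tier is treated
        if abs(mean(waiting) - mean(before)) >= threshold:
            flagged.append(i)
    return flagged


onsets = [3, 6, 9]
tier_1 = [8, 8, 9, 3, 2, 2, 2, 2, 2, 2, 2, 2]
staggered = [tier_1,
             [7, 8, 8, 8, 7, 8, 3, 2, 2, 2, 2, 2],
             [9, 8, 8, 8, 9, 8, 8, 8, 8, 2, 2, 2]]
all_at_once = [tier_1,
               [7, 8, 8, 3, 2, 2, 2, 2, 2, 2, 2, 2],
               [9, 8, 8, 3, 3, 2, 2, 2, 2, 2, 2, 2]]
print(untreated_tier_shifts(staggered, onsets, threshold=2))    # []
print(untreated_tier_shifts(all_at_once, onsets, threshold=2))  # [1, 2]
```

A flagged tier is the "looks like a coincident event" pattern from the table: change in a tier that is still in baseline points to an extraneous variable or to tiers that are not independent.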

Work through the table as a checklist when a stem shows a flawed graph. The 'best answer' is typically the option that restores prediction, verification, or replication — e.g., staggering tiers, extending baseline, or adding integrity data — rather than an option that merely declares success or that changes the clinical goal. Beware distractors that 'fix' a problem by introducing a new confound or by abandoning ethics.
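The table can also be written as a lookup that returns every row a graph trips. The flag names, the 50 percent `overlap_cutoff`, and the zero `slope_tolerance` below are hypothetical choices for illustration, not published criteria; the better moves are copied from the table.

```python
# Better moves copied from the table; the dictionary keys are hypothetical flag names.
BETTER_MOVE = {
    "baseline already improving": "Extend baseline if ethical, or strengthen with replication",
    "all tiers change at once": "Delay/stagger intervention across tiers",
    "high overlap across conditions": "Refine procedure or collect more data",
    "no procedural integrity data": "Add procedural-integrity measurement",
    "confounded phase change": "Change only one key variable per phase",
    "excessive variability": "Identify and control sources of variability",
}


def diagnose(*, baseline_slope, therapeutic_sign, overlap_pct, integrity_recorded,
             variables_changed, tiers_changed_together=False, high_variability=False,
             overlap_cutoff=50.0, slope_tolerance=0.0):
    """Return (problem, better move) pairs for every table row the graph trips.

    therapeutic_sign: +1 if the goal is an increase, -1 if the goal is a decrease.
    overlap_cutoff, slope_tolerance: hypothetical thresholds, not published standards.
    """
    problems = []
    if baseline_slope * therapeutic_sign > slope_tolerance:
        problems.append("baseline already improving")  # baseline trend already heads toward the goal
    if tiers_changed_together:
        problems.append("all tiers change at once")
    if overlap_pct > overlap_cutoff:
        problems.append("high overlap across conditions")
    if not integrity_recorded:
        problems.append("no procedural integrity data")
    if variables_changed > 1:
        problems.append("confounded phase change")
    if high_variability:
        problems.append("excessive variability")
    return [(problem, BETTER_MOVE[problem]) for problem in problems]


# A decrease target whose baseline is already falling, with 60% overlap and no integrity data.
for problem, move in diagnose(baseline_slope=-0.6, therapeutic_sign=-1, overlap_pct=60.0,
                              integrity_recorded=False, variables_changed=1):
    print(f"{problem}: {move}")
```

The function only lists problems; choosing among the better moves still depends on ethics and feasibility.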

Reading Each Feature Correctly

Candidates lose points by misreading individual features, so define each precisely. Level is read as the mean or median of points within a phase; a level change is a jump in that typical value at the phase line. Trend describes slope and direction; a change in trend (from flat to descending, say) can demonstrate an effect even without a level jump. Variability is how far data bounce around the level; very high variability blocks interpretation no matter how good the means look.
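These three definitions translate into simple descriptive statistics. The sketch below assumes each phase is a list of session values in order; it uses the mean and median for level, an ordinary least-squares slope over session index for trend, and the range and population standard deviation for variability. Visual analysis does not require these numbers, so treat them as aids for checking what you see rather than replacements for it.

```python
from statistics import mean, median, pstdev


def level(points):
    """Typical value of a phase, reported as (mean, median)."""
    return mean(points), median(points)


def trend_slope(points):
    """Ordinary least-squares slope of value on session index (0, 1, 2, ...).

    Positive means ascending, negative means descending, near zero means flat.
    """
    n = len(points)
    if n < 2:
        raise ValueError("trend needs at least two points")
    x_bar = (n - 1) / 2
    y_bar = mean(points)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(points))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def variability(points):
    """Bounce around the level, reported as (range, population standard deviation)."""
    return max(points) - min(points), pstdev(points)


baseline = [8, 9, 7, 8, 8]
treatment = [4, 3, 3, 2, 2]
print(level(baseline), level(treatment))
print(trend_slope(baseline), trend_slope(treatment))
print(variability(baseline))
```

A slope near zero describes a flat trend only when variability is also low, so the three numbers are meant to be read together.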

Immediacy of effect compares the last few points of one phase with the first few points of the next; the closer to the phase line the change occurs, the stronger the inference. A delayed change leaves room for other causes. Overlap is the percentage of data points in one phase that fall within the range of the adjacent phase; low or zero overlap signals a robust effect, high overlap signals a weak or absent one. Consistency asks whether like conditions (the two baselines, the two treatment phases of an ABAB) reproduce similar patterns; inconsistency hints at an uncontrolled variable.
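The remaining three features can be sketched the same way. This code makes assumptions the text does not: immediacy compares the means of the last three and first three points (a common but not universal window), overlap is the percentage of one phase's points falling inside the other phase's range (other indices, such as the percentage of nonoverlapping data, are defined differently), and consistency is reduced to the gap between the mean levels of two like phases.

```python
from statistics import mean


def immediacy(prev_phase, next_phase, k=3):
    """Change across the phase line: mean of the first k points of the next phase
    minus the mean of the last k points of the previous phase (k=3 is assumed)."""
    return mean(next_phase[:k]) - mean(prev_phase[-k:])


def percent_overlap(reference_phase, other_phase):
    """Percentage of other_phase points that fall inside reference_phase's min..max range."""
    low, high = min(reference_phase), max(reference_phase)
    inside = sum(1 for y in other_phase if low <= y <= high)
    return 100 * inside / len(other_phase)


def consistency_gap(phase_1, phase_2):
    """Absolute difference in mean level between two like phases (e.g., A1 and A2)."""
    return abs(mean(phase_1) - mean(phase_2))


a1, b1, a2 = [8, 9, 7, 8, 8], [4, 3, 3, 2, 2], [7, 8, 8, 9, 8]
print(round(immediacy(a1, b1), 2))  # -4.33: a large, immediate drop
print(percent_overlap(a1, b1))      # 0.0: no B1 point falls inside the A1 range
print(consistency_gap(a1, a2))      # zero: the two baselines sit at the same level
```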

Choosing a Defensible Design — and Not Overclaiming

Design selection is not memorization; it is matching the research question and constraints to a structure that supplies experimental control:

Reversal (ABAB) — reversible behavior and ethical withdrawal.
Multiple baseline — durable skill acquisition or unsafe/unethical withdrawal.
Multielement (ATD) — fast comparison when conditions can be discriminated.
Changing criterion — gradual, stepwise performance goals.
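The four one-line summaries above can be encoded as a rough matching rule. This is a hypothetical sketch only: the parameter names are invented for illustration, multiple baseline is offered whenever reversal is ruled out (assuming enough tiers exist), and the rule ignores the research question, feasibility, and the checklist that follows.

```python
def candidate_designs(*, reversible, withdrawal_ethical, comparing_conditions,
                      conditions_discriminable, stepwise_goal):
    """Return designs whose usual conditions, as summarized in this guide, are met.

    A rule-of-thumb sketch; more than one design may fit, and none is automatically best.
    """
    designs = []
    if reversible and withdrawal_ethical:
        designs.append("reversal (ABAB)")
    else:
        # Durable skills, or behavior where withdrawing treatment would be unsafe.
        designs.append("multiple baseline")
    if comparing_conditions and conditions_discriminable:
        designs.append("multielement (ATD)")
    if stepwise_goal:
        designs.append("changing criterion")
    return designs


# A durable skill taught toward gradually rising goals.
print(candidate_designs(reversible=False, withdrawal_ethical=True, comparing_conditions=False,
                        conditions_discriminable=False, stepwise_goal=True))
# ['multiple baseline', 'changing criterion']
```

Returning a list rather than a single answer mirrors the takeaway that there is no single "best" design.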

Before accepting any design as defensible, run this checklist:

The dependent variable directly represents the referral problem or goal.
The independent variable is operationally defined and feasible to implement.
The design supplies prediction, verification, and replication.
Plausible alternative explanations are anticipated and reduced (confounds, threats).
The design is ethical and practical for this client and context.

Finally, calibrate your language to your evidence. A graph can be clinically useful even when experimental control is weak — improvement is improvement for the client. But for exam purposes, do not overclaim. When a design lacks replication, say the data suggest improvement; reserve 'functional relation' and 'the intervention caused the change' for clear, immediate, low-overlap effects that replicate.
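Calibrating language to evidence can be sketched as a small decision rule. The minimum of three demonstrations follows a widely cited convention (for example, the What Works Clearinghouse single-case design standards) and is an assumption here; the wording strings and parameter names are illustrative, not official exam language.

```python
def conclusion_wording(demonstrations, immediate, low_overlap, min_demonstrations=3):
    """Return the strongest claim the evidence supports.

    demonstrations: planned phase changes at which behavior changed as predicted,
        e.g. each phase change in an ABAB (up to 3) or each tier onset in a multiple baseline.
    min_demonstrations: 3 follows a common convention; treat it as an assumption.
    """
    clear_effect = immediate and low_overlap
    if clear_effect and demonstrations >= min_demonstrations:
        return "functional relation demonstrated"
    if clear_effect and demonstrations >= 1:
        return "data suggest improvement"
    return "weak demonstration: strengthen the design before drawing causal conclusions"


print(conclusion_wording(3, immediate=True, low_overlap=True))    # ABAB with replicated effects
print(conclusion_wording(1, immediate=True, low_overlap=True))    # AB only: no replication
print(conclusion_wording(1, immediate=False, low_overlap=False))  # delayed, high-overlap change
```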

The most defensible answer almost always pairs rigorous control with client safety, social significance, and feasibility — and states its conclusion no more strongly than the data allow. When in doubt between two options, pick the one that adds a replication or rules out a confound rather than the one that simply asserts a stronger conclusion. Examiners reward measured, evidence-bounded judgments over confident overreach.

Test Your Knowledge

In a multiple-baseline-across-settings graph, behavior in all three settings improves at the same time, even though intervention was introduced only in the first setting. What does this most likely indicate?

A strong functional relation has been demonstrated

A loss of experimental control — possibly an extraneous variable affecting all settings or generalization across non-independent tiers

The baseline was too long

Immediacy of effect was too rapid to be valid

Test Your Knowledge

An ABAB graph shows a clear, immediate drop in problem behavior in each treatment phase, a return toward baseline in the second A phase, and low overlap between phases. Which conclusion is best supported?

The data merely suggest improvement; no causal claim is warranted

A functional relation between the intervention and the behavior change is demonstrated through replicated, immediate, low-overlap effects

External validity has been firmly established for all clients

The design is invalid because behavior returned toward baseline

Test Your Knowledge

A graph shows a treatment phase with substantial data-point overlap with baseline and a delayed, gradual change. There are no procedural-integrity data. What is the most appropriate examiner-style conclusion and next step?

Declare a functional relation and disseminate the result

Conclude the treatment definitely does not work

Switch the dependent variable to one that overlaps less

The demonstration is weak (high overlap, delayed change, no integrity data); collect procedural-integrity data and strengthen the design before drawing causal conclusions

Test Your Knowledge

Which design feature most directly distinguishes a defensible single-case demonstration from a graph that is merely clinically encouraging?

A visually attractive, smooth data path

Replication of the effect at planned times (prediction, verification, replication), making coincidence implausible

A long treatment phase with many data points

A high average level of the target behavior in treatment

BCBA Study Guide
