Measurement-to-Design-to-Intervention Chain

Key Takeaways

  • Measurement quality sets a ceiling on how much confidence you can place in visual analysis, design conclusions, and treatment decisions.
  • Single-case design logic rests on repeated measurement plus prediction, verification, and replication of effect.
  • Match the design to the case constraints: reversal when withdrawal is safe and behavior reversible; multiple baseline when withdrawal is unsafe; multielement to compare conditions; changing criterion for gradual stepwise change.
  • Before changing a poorly performing intervention, rule out definition drift, bad data, and low procedural integrity - the treatment may be fine and the implementation broken.
  • Mixed items frequently test whether you keep the chain in order: definition then measurement then design then procedure then data-based decision.
Last updated: June 2026

The Chain Is Only as Strong as Its Weakest Link

Case analysis is a chain of dependencies. A vague operational definition corrupts measurement. Poor measurement corrupts graph interpretation. A weak design corrupts causal claims. Indefensible causal claims make every intervention decision shakier.

Domain C asks whether the data faithfully represent the behavior. Domain D asks whether the arrangement can demonstrate a functional relation (that the intervention, not history or maturation, produced change). Domains G and H ask whether the procedure follows from assessment, evidence, client preference, contextual fit, and ongoing data.

A frequent trap is the measurement-dimension mismatch. If the clinical question is "how long does the tantrum last," duration is the right dimension, not frequency. If it is "how quickly does the learner respond to an instruction," the dimension is latency. Using count when the question is about duration produces data that look fine but cannot answer the case.
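A minimal Python sketch of the duration trap, using two hypothetical sessions. The `Episode` record, the function names, and the timestamps are all invented for illustration. Both sessions produce the same count, but total and mean duration differ more than tenfold. A frequency graph would therefore show no change even though the clinically relevant dimension changed dramatically.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    start_s: int  # seconds from session start (hypothetical data)
    end_s: int

    @property
    def duration_s(self) -> int:
        return self.end_s - self.start_s

def frequency(episodes):
    """Count: answers 'how often', not 'how long'."""
    return len(episodes)

def total_duration(episodes):
    """Total seconds of behavior in the session."""
    return sum(e.duration_s for e in episodes)

def mean_duration(episodes):
    """Average episode length: answers 'how long does it last'."""
    return total_duration(episodes) / len(episodes) if episodes else 0.0

# Same number of tantrums, very different lengths.
monday = [Episode(0, 30), Episode(600, 640), Episode(1200, 1225)]
friday = [Episode(0, 300), Episode(900, 1380), Episode(1500, 1800)]

for name, session in (("monday", monday), ("friday", friday)):
    print(f"{name}: count={frequency(session)}, "
          f"total={total_duration(session)} s, mean={mean_duration(session):.1f} s")
```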

Another trap: discontinuous measurement bias. Partial-interval recording tends to overestimate how much of the session the behavior occupies; whole-interval recording tends to underestimate it; momentary time sampling can miss brief events entirely because it samples only one instant per interval. Choose the method whose bias does not undermine the decision.
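To see the direction of each bias, here is a small simulation sketch under stated assumptions. Behavior is recorded second by second from hypothetical bouts, and intervals are 10 s. An interval counts as scored under partial-interval recording if the behavior occurs at any moment in it, and under whole-interval recording only if the behavior fills the entire interval. Under momentary time sampling, it counts only if the behavior is occurring at the interval's last second. All names and data are hypothetical.

```python
def timeline(bouts, session_s):
    """Second-by-second record: True while the behavior is occurring.

    Each bout covers [start, end) in whole seconds (an assumption of this sketch).
    """
    occurring = [False] * session_s
    for start, end in bouts:
        for t in range(start, min(end, session_s)):
            occurring[t] = True
    return occurring

def percent(scored):
    """Percentage of True entries (seconds or intervals)."""
    return 100 * sum(scored) / len(scored)

def interval_estimates(bouts, session_s, interval_s):
    tl = timeline(bouts, session_s)
    windows = [tl[i:i + interval_s] for i in range(0, session_s, interval_s)]
    return {
        "true": percent(tl),                          # actual share of the session
        "partial": percent([any(w) for w in windows]),  # any occurrence scores the interval
        "whole": percent([all(w) for w in windows]),    # must fill the whole interval
        "momentary": percent([w[-1] for w in windows]), # only the final instant is sampled
    }

# Hypothetical 60 s session: a 7 s bout, a 15 s bout, and a 1 s bout.
est = interval_estimates([(5, 12), (30, 45), (52, 53)], session_s=60, interval_s=10)
for method, value in est.items():
    print(f"{method:>9}: {value:.1f}%")
```

With these bouts, the behavior truly occupies about 38% of the session. Partial-interval recording reports about 83%, whole-interval recording about 17%, and momentary time sampling about 33%, having missed the 1-second bout entirely. Other bout patterns can push momentary time sampling in either direction. The point is that each estimate depends on the method.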

Measurement reliability sets the second ceiling. Interobserver agreement (IOA) tells you whether two observers watching the same behavior score it the same way; low IOA usually traces back to a fuzzy operational definition. A graph built on low-IOA data cannot support a confident treatment decision, no matter how clean the lines look, because you do not actually know that the data represent the behavior. On an integrated item, unverified or low IOA is a broken chain link, and the BEST answer addresses it before touching design or procedure.
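Two standard IOA calculations are simple enough to sketch in a few lines. Total count IOA is the smaller count divided by the larger, times 100. Interval-by-interval IOA is agreements divided by (agreements + disagreements), times 100. The function names and observer data below are hypothetical. Note that total count IOA can be high even when the observers scored different instances, which is one reason interval-based comparisons are often preferred.

```python
def total_count_ioa(count_a: int, count_b: int) -> float:
    """Smaller total divided by larger total, times 100."""
    if count_a == count_b == 0:
        return 100.0  # both observers recorded nothing: full agreement
    return 100 * min(count_a, count_b) / max(count_a, count_b)

def interval_by_interval_ioa(obs_a, obs_b) -> float:
    """Agreements / (agreements + disagreements) x 100.

    An agreement is an interval both observers scored, or both left unscored,
    so the denominator equals the total number of intervals.
    """
    if len(obs_a) != len(obs_b):
        raise ValueError("observers must score the same number of intervals")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100 * agreements / len(obs_a)

# Hypothetical data: 1 = behavior scored in that interval.
observer_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
observer_b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

print(total_count_ioa(18, 20))
print(interval_by_interval_ioa(observer_a, observer_b))
```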

Run the Chain Check

For any integrated item, audit each link. The weakest link tells you what the BEST next step is.

Link | The exam check
Definition | Can two observers identify the response and nonexamples consistently (clear, complete, objective)?
Dimension | Does count, rate, duration, latency, IRT, or trials-to-criterion match the clinical question?
Method | Is measurement direct, valid, reliable (IOA), and feasible in the setting?
Baseline | Is the series stable or trending in a way that supports prediction?
Design | Is reversal, multiple baseline, multielement, or changing criterion defensible here?
Procedure | Does it match function, skill deficit, risk level, and setting capacity?
Integrity | Are staff implementing the plan as written (treatment integrity / fidelity)?
Decision | Do the data support continuing, modifying, or terminating the plan?
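One way to picture the audit is as an ordered walk down the chain that stops at the first link not yet verified. Each link depends on the ones before it, so fixing an upstream break comes first. This is a minimal sketch under that assumption. `weakest_link`, `CHAIN`, and the case data are hypothetical, and a real item still requires judgment about what counts as "passing."

```python
from typing import Optional

# Chain order from the audit: each link depends on the ones before it.
CHAIN = ["Definition", "Dimension", "Method", "Baseline",
         "Design", "Procedure", "Integrity", "Decision"]

def weakest_link(checks: dict) -> Optional[str]:
    """Return the first link, in chain order, whose check has not passed.

    A link missing from `checks` is treated as unverified, i.e. failing.
    Returns None when every link passes.
    """
    for link in CHAIN:
        if not checks.get(link, False):
            return link
    return None

# Hypothetical case: clear definition, right dimension, but IOA never collected.
case = {"Definition": True, "Dimension": True, "Method": False,
        "Baseline": True, "Design": True, "Procedure": True}
print(weakest_link(case))
```

Here the sketch flags Method. Even though later links look fine, nothing downstream can be trusted until the data are shown to be reliable.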

Matching Design to the Case

The single-case design is not chosen for familiarity; it is chosen because it answers the case question within ethical and practical limits.

  • Reversal (ABAB): Use when the behavior is reversible and brief withdrawal of treatment is safe and ethical. Demonstrates control by showing behavior tracks the condition. Avoid for dangerous behavior (you should not reinstate self-injury to prove a point) or for skills that, once learned, will not reverse.
  • Multiple baseline: Use when withdrawal is unsafe or impractical, or behavior is not reversible (e.g., a learned academic skill). Stagger intervention across behaviors, settings, or participants; control is shown when each tier changes only after intervention reaches it.
  • Multielement (alternating treatments): Use to rapidly compare two or more conditions or procedures. Watch for multiple-treatment interference, where exposure to one condition alters responding in another.
  • Changing criterion: Use when change is expected to occur in gradual, stepwise increments (e.g., increasing exercise minutes); control is shown when behavior tracks each new criterion.
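The matching rules above can be compressed into a deliberately simplified decision sketch. Real cases weigh more than three facts, and the `CaseFacts` fields, goal labels, and `suggest_design` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass

GOALS = {"demonstrate_effect", "compare_conditions", "stepwise_change"}

@dataclass
class CaseFacts:
    goal: str              # one of GOALS (hypothetical labels)
    reversible: bool       # would behavior plausibly return toward baseline if treatment stopped?
    withdrawal_safe: bool  # is briefly removing treatment safe and ethical?

def suggest_design(case: CaseFacts) -> str:
    if case.goal not in GOALS:
        raise ValueError(f"unknown goal: {case.goal}")
    if case.goal == "compare_conditions":
        return "multielement (alternating treatments)"
    if case.goal == "stepwise_change":
        return "changing criterion"
    # Demonstrating an effect: reversal only when withdrawal is safe AND behavior can reverse.
    if case.reversible and case.withdrawal_safe:
        return "reversal (ABAB)"
    return "multiple baseline"

# A learned academic skill will not reverse, so withdrawal cannot show control.
print(suggest_design(CaseFacts("demonstrate_effect", reversible=False, withdrawal_safe=True)))
```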

Visual Analysis Before Any Conclusion

Before the design or the modification rule even applies, the data have to be read correctly. Visual analysis of single-case graphs rests on a small set of properties you should evaluate in every phase and across every phase change:

  • Level - the average value of the data within a phase.
  • Trend - the direction and steepness of the data path (improving, worsening, or flat).
  • Variability - how much the points scatter around the level/trend.
  • Immediacy of effect - how quickly behavior changes at the phase line.
  • Overlap - how much data from adjacent phases share the same range (less overlap = stronger effect).
  • Consistency - whether similar phases show similar patterns.
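Visual analysis is done by eye, but several of these properties have simple numeric counterparts that can help when checking your reading of a graph. This sketch assumes hypothetical daily aggression counts. It uses the mean for level, an ordinary least-squares slope for trend, and the range for variability. It measures overlap as the share of treatment points falling inside the baseline range, one simple index among several (e.g., percentage of nonoverlapping data, PND). It measures immediacy as the change from the last three baseline points to the first three treatment points. Consistency is not computed here because it requires comparing like phases, as in an ABAB design.

```python
from statistics import mean

def slope(ys):
    """Ordinary least-squares slope of the data path (units per session)."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den  # requires at least 2 data points

def phase_summary(phase):
    return {
        "level": mean(phase),                    # average value in the phase
        "trend": slope(phase),                   # + rising, - falling, near 0 flat
        "variability": max(phase) - min(phase),  # range; SD is another option
    }

def overlap_pct(baseline, treatment):
    """Percent of treatment points inside the baseline range (less overlap = stronger effect)."""
    lo, hi = min(baseline), max(baseline)
    return 100 * sum(lo <= y <= hi for y in treatment) / len(treatment)

def immediacy(baseline, treatment, k=3):
    """Change in mean from the last k baseline points to the first k treatment points."""
    return mean(treatment[:k]) - mean(baseline[-k:])

# Hypothetical daily counts of aggression.
baseline = [8, 9, 7, 8, 9]
treatment = [6, 4, 3, 3, 2, 2]

for name, phase in (("A", baseline), ("B", treatment)):
    s = phase_summary(phase)
    print(f"{name}: level={s['level']:.1f}, trend={s['trend']:+.2f}, range={s['variability']}")
print(f"overlap={overlap_pct(baseline, treatment):.0f}%, "
      f"immediacy={immediacy(baseline, treatment):+.2f}")
```

Numbers like these describe a single A-to-B change only. They do not supply the replication that the next paragraph requires.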

A tempting distractor reads a single improving phase as proof of treatment effect. It is not: without replication of effect across phases (the verification and replication elements of design logic), an improving trend could reflect maturation, history, or an uncontrolled variable.

The Modification Rule

When a graph shows weak or no effect, do not immediately swap the procedure. First rule out the cheaper explanations, in roughly this order:

  1. Definition drift - are observers now scoring the behavior differently?
  2. Data quality - low IOA, missing sessions, biased sampling?
  3. Procedural integrity - is the plan being run as written? (Often the real culprit.)
  4. Dosage / schedule - is reinforcement thin, delayed, or for the wrong response?
  5. Competing contingencies and MOs - is a richer reinforcer available elsewhere; is the EO weak?
  6. Functional hypothesis - does the original function still fit, or was the FBA wrong?

Only after these checks should you conclude the procedure needs changing. The exam frequently offers "change the intervention" as a tempting answer when the data actually point to an integrity failure - and the correct response is to retrain and measure integrity, not redesign the program.
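Measuring integrity can be as simple as a step-by-step checklist scored during observed sessions. A common summary is the percentage of plan steps implemented as written. This sketch assumes a hypothetical three-step DRA plan and four observed sessions. The step names, data, and functions are invented for illustration. Scoring per step shows which part of the plan to retrain rather than treating integrity as one global number.

```python
# Hypothetical DRA plan steps; each list holds one mark per observed session
# (True = step implemented as written).
observations = {
    "reinforce alternative response within 3 s": [True, False, False, True],
    "withhold reinforcement after aggression": [True, True, False, True],
    "prompt alternative response on schedule": [True, True, True, True],
}

def session_integrity(obs):
    """Percent of plan steps implemented as written, for each observed session."""
    steps = list(obs.values())
    n_sessions = len(steps[0])
    return [100 * sum(step[s] for step in steps) / len(steps) for s in range(n_sessions)]

def step_integrity(obs):
    """Percent of observed sessions in which each step was implemented as written."""
    return {name: 100 * sum(marks) / len(marks) for name, marks in obs.items()}

def overall_integrity(obs):
    """Percent of all scored step opportunities implemented as written."""
    marks = [m for step in obs.values() for m in step]
    return 100 * sum(marks) / len(marks)

per_step = step_integrity(observations)
weakest = min(per_step, key=per_step.get)
print(f"overall: {overall_integrity(observations):.0f}%")
print(f"sessions: {[round(p) for p in session_integrity(observations)]}")
print(f"retrain first: {weakest} ({per_step[weakest]:.0f}%)")
```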

The same logic runs in reverse for a graph that looks great: confirm the gain is real (adequate IOA, no definition loosening) before celebrating, because inflated data can manufacture a 'functional relation' that does not exist.

Test Your Knowledge

A learner's aggression intervention shows a flat, high baseline-like pattern for three weeks despite a well-matched DRA plan. IOA is 95%, the definition is stable, and the graph is clean. A treatment-integrity check has not yet been run. What is the BEST next step?

Test Your Knowledge

A BCBA must reduce a client's self-injurious behavior (SIB). The team wants to demonstrate experimental control over the new intervention's effect. Which design is MOST appropriate?

Test Your Knowledge

A team wants to compare two prompting strategies for teaching a tact, quickly, with the same learner. Which design fits, and what is the main threat to watch?

Test Your Knowledge

A vignette asks 'how long do the learner's crying episodes last.' The technician has been collecting frequency counts. The episodes vary widely in length. What is the measurement problem and fix?
