Reinforcement and Differential Reinforcement
Key Takeaways
- Reinforcement is defined functionally by an increase in future responding, never by the practitioner's intent or how pleasant a stimulus seems.
- Differential reinforcement is a family of procedures that pairs reinforcement for one response with extinction (withholding) of another; the variants differ in WHAT gets reinforced.
- DRA reinforces a specific alternative response, DRI reinforces a physically incompatible response, DRO reinforces the ABSENCE of the target behavior for a set interval, DRL reinforces lower rates, and DRH reinforces higher rates.
- The strongest reductive plans are function-based: the alternative response should produce the SAME reinforcer the problem behavior produced, and do so more efficiently.
- DRO is the classic exam trap because it requires no replacement skill; choose DRA/FCT when the learner needs to learn what to do instead.
Reinforcement Is a Functional Relation
Reinforcement is the process in which a consequence following a response increases the future probability of that response under similar conditions. The defining feature is the effect on behavior, not the appearance of the consequence. A stimulus is a reinforcer only if the data show responding went up; otherwise it is just a stimulus you delivered.
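The functional definition above can be sketched as a simple rate comparison. This is a minimal illustration, not a clinical decision rule: the function names and the counts are hypothetical, and a real analysis would rely on graphed data and an experimental design rather than a single comparison.

```python
def response_rate(count: int, minutes: float) -> float:
    """Responses per minute for one observation phase."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count / minutes


def functioned_as_reinforcer(baseline_rate: float, contingent_rate: float) -> bool:
    """True only if responding went up once the consequence was made contingent."""
    return contingent_rate > baseline_rate


# Hypothetical counts, for illustration only.
baseline = response_rate(count=12, minutes=60)
with_stickers = response_rate(count=12, minutes=60)  # responding unchanged
with_breaks = response_rate(count=30, minutes=60)    # responding increased

print(functioned_as_reinforcer(baseline, with_stickers))  # False
print(functioned_as_reinforcer(baseline, with_breaks))    # True
```

Nothing in the check looks at what the consequence was or what the practitioner intended; only the change in responding decides the label.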
Positive reinforcement adds a stimulus (a token, praise, access to a toy) and behavior increases. Negative reinforcement removes or postpones a stimulus (ending a demand, escaping noise) and behavior increases. Both strengthen behavior. The exam often tests this point: when escape from work increases the escape behavior, the process is negative reinforcement, not punishment.
On the BCBA exam, never pick a reinforcer because it "sounds nice." Choose it because preference assessment, observation, or the functional analysis identified the maintaining consequence. A sticker is not reinforcing for a learner whose hitting is maintained by escape; an escape break is. Matching the consequence to function is the recurring theme of Domain G.
How well reinforcement works also depends on the schedule. Continuous reinforcement (CRF) reinforces every response and is best for acquisition of a new skill. Intermittent schedules build resistance to extinction and are best for maintenance: fixed-ratio (FR) and variable-ratio (VR) schedules reinforce after a number of responses; fixed-interval (FI) and variable-interval (VI) schedules reinforce the first response after a period of time has elapsed. Variable schedules produce steadier responding and are harder to extinguish than fixed ones.
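The difference between ratio and interval schedules can be made concrete with a short sketch. The function names and sample data below are hypothetical. Only the fixed versions are shown; a variable schedule would replace the fixed value with one that varies around an average.

```python
def fixed_ratio(n_responses: int, ratio: int) -> list[int]:
    """1-based positions of the responses reinforced under FR `ratio`.

    FR 1 is continuous reinforcement (CRF): every response is reinforced.
    """
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def fixed_interval(response_times: list[float], interval: float) -> list[float]:
    """Times of the responses reinforced under FI `interval`.

    The first response made after `interval` has elapsed since the last
    reinforcer (or since session start) is reinforced. Time passing alone
    earns nothing; a response is still required.
    """
    reinforced = []
    last_reinforcer = 0.0
    for t in sorted(response_times):
        if t - last_reinforcer >= interval:
            reinforced.append(t)
            last_reinforcer = t
    return reinforced


print(fixed_ratio(10, 1))   # CRF: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(fixed_ratio(10, 5))   # FR 5: [5, 10]
print(fixed_interval([10, 50, 65, 70, 130], 60))  # FI 60 s: [65, 130]
```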
Watch for these recurring distractors:
- An option that calls a procedure reinforcement based on the practitioner's intent ("I rewarded her") rather than a measured behavior increase.
- An option that delivers a generally preferred item that does not match the identified function.
- Confusing negative reinforcement with punishment because something aversive is involved — negative reinforcement removes an aversive and increases behavior, while punishment decreases behavior.
Also distinguish automatic from socially mediated reinforcement. Automatic reinforcement is produced directly by the behavior (the sensory product of hand-flapping) with no other person required; socially mediated reinforcement is delivered by another person (a teacher hands over a break). Automatically reinforced behavior is harder to place on extinction because you often cannot control the reinforcer source, which changes procedure selection.
Differential Reinforcement Is a Selection Tool, Not One Procedure
Differential reinforcement (DR) always does two things at once: it reinforces one response class and places another response class on extinction (withholds reinforcement). The variants differ only in what contacts reinforcement. Knowing exactly what is reinforced in each is the key to the classic exam traps.
| Procedure | Reinforcement is delivered for... | Problem behavior is... | Best exam cue |
|---|---|---|---|
| DRA | A specific alternative response | On extinction | Teach a functionally equivalent, socially useful replacement |
| DRI | A physically incompatible response | On extinction | The two responses cannot occur at the same time |
| DRO | The absence of the target for a whole interval | On extinction | No replacement skill is required or specified |
| DRL | A response rate below a criterion (fewer occurrences) | Above-criterion rates withheld | Behavior is acceptable but happens too often |
| DRH | A response rate above a criterion (more occurrences) | Below-criterion rates withheld | Goal is to build fluency or frequency |
DRA is usually strongest when the alternative produces the same reinforcer as the problem behavior. If a learner hits to escape hard tasks, reinforcing a break request is more functionally matched than reinforcing "quiet sitting," which earns nothing the hitting earned.
DRI Is a Special Case of DRA
DRI (differential reinforcement of incompatible behavior) reinforces a response that is physically incompatible with the problem behavior — the learner cannot do both at once. Hands holding a fidget are incompatible with hand-mouthing. DRI is attractive when the incompatible response is natural and maintainable. But the exam will offer a technically incompatible response that is socially awkward or not function-matched (for example, requiring "hands in pockets" all day). A correct technical label is not enough; contextual fit matters.
Why DRO Is the Big Trap
DRO (differential reinforcement of other behavior) delivers reinforcement when the target behavior is absent for an entire interval. It is sometimes called differential reinforcement of zero responding or omission training. It does not teach what to do instead. Two cautions are tested heavily:
- DRO can inadvertently reinforce whatever the learner is doing at the moment of delivery — including another problem behavior.
- DRO is weak when the scenario clearly needs communication, tolerance, or an academic response; those call for DRA/FCT.
If the item describes a learner who lacks a replacement skill, DRO alone is the trap answer; DRA or functional communication training (FCT) is usually better. DRO is more defensible when the replacement is already in repertoire or when DRO is paired with skill teaching.
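A whole-interval DRO contingency can be sketched as follows. The sketch assumes fixed, back-to-back intervals that do not reset when the target behavior occurs; the text above does not specify a variant, so that choice and the function name are assumptions made for illustration.

```python
def dro_intervals(
    target_times: list[float], session_length: float, interval: float
) -> list[bool]:
    """For each consecutive interval, True if the reinforcer is earned.

    A reinforcer is earned only when the target behavior never occurred
    anywhere in that interval.
    """
    n_intervals = int(session_length // interval)
    earned = []
    for k in range(n_intervals):
        start, end = k * interval, (k + 1) * interval
        occurred = any(start <= t < end for t in target_times)
        earned.append(not occurred)
    return earned


# Hypothetical 300 s session, 60 s intervals, target behavior at 70 s and 200 s.
print(dro_intervals([70, 200], 300, 60))  # [True, False, True, False, True]
```

The function inspects only the target behavior. It has no input for what the learner did instead, which mirrors both cautions: DRO neither specifies a replacement nor screens out other problem behavior at the moment of delivery.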
DRL Is for Too-Much, Not Gone
DRL (differential reinforcement of low rates) reinforces responding that occurs at or below a set rate. It is used when a behavior is acceptable but occurs too often: raising a hand 30 times an hour, asking repeated questions, excessive water-fountain trips. A zero goal would be over-restrictive for a socially appropriate behavior. Two common DRL forms are full-session DRL (reinforce if the total for the session is at or below criterion) and spaced-responding DRL (reinforce a response only if a minimum inter-response time has passed). DRH is the mirror image: it reinforces rates above a criterion to build fluency.
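The full-session and spaced-responding forms of DRL differ in when the decision is made: once at the end of the session, or response by response. The sketch below uses hypothetical function names and data, and it assumes the first response of a session is eligible for reinforcement because there is no earlier response to measure from.

```python
def full_session_drl(total_responses: int, criterion: int) -> bool:
    """Reinforce at the end of the session if the total is at or below criterion."""
    return total_responses <= criterion


def spaced_responding_drl(response_times: list[float], min_irt: float) -> list[bool]:
    """For each response, True if it is reinforced.

    A response is reinforced only if at least `min_irt` has passed since
    the previous response. The first response is treated as eligible
    (an assumption of this sketch).
    """
    reinforced = []
    previous = None
    for t in sorted(response_times):
        reinforced.append(previous is None or t - previous >= min_irt)
        previous = t
    return reinforced


print(full_session_drl(total_responses=25, criterion=5))  # False
print(full_session_drl(total_responses=5, criterion=5))   # True

# Hand raises at these minutes, with a 10-minute minimum inter-response time.
print(spaced_responding_drl([0, 2, 15, 16, 30], 10))  # [True, False, True, False, True]
```

Neither function ever requires zero responding, which is the difference from DRO: the behavior stays in the repertoire and only its rate is targeted.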
Exam Decision Aid
- Identify the function and the current response class.
- Decide the goal: increase, replace, eliminate, or thin a rate.
- Confirm the learner can access reinforcement for the desired response (and can perform it).
- Match the reinforcer to the same relation the assessment identified.
- Check contextual fit — staff, caregivers, culture, safety, effort — before changing schedule or criterion.
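The goal step in the list above can be paired with the procedure cues from this section as a rough lookup. This is a study aid only, not a clinical algorithm: the function name, the goal strings, and the `has_replacement_skill` flag are hypothetical, and the mapping simply restates the cues given earlier (DRH to build rate, DRL to thin a rate, DRA/FCT to replace, DRO only when a replacement is already in repertoire).

```python
def candidate_procedure(goal: str, has_replacement_skill: bool = True) -> str:
    """Return the procedure this section's cues point to for a stated goal."""
    goal = goal.strip().lower()
    if goal == "increase":
        return "DRH"
    if goal == "thin a rate":
        return "DRL"
    if goal == "replace":
        return "DRA/FCT"
    if goal == "eliminate":
        # DRO alone is the trap answer when the learner lacks a replacement skill.
        return "DRO" if has_replacement_skill else "DRA/FCT"
    raise ValueError(f"unrecognized goal: {goal!r}")


print(candidate_procedure("thin a rate"))                             # DRL
print(candidate_procedure("eliminate", has_replacement_skill=False))  # DRA/FCT
```

The lookup covers only the goal step. Function, reinforcer match, and contextual fit still have to be checked separately, as the remaining steps require.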
A learner screams during math worksheets and is allowed to leave the table each time, which has increased screaming. The team wants a function-based replacement. Which procedure is MOST appropriate?
A student appropriately raises her hand to comment, but does so about 25 times per class, disrupting the lesson. The teacher wants to keep the behavior but reduce its frequency to about 5 per class. Which procedure fits best?
Which statement about DRO is the BEST reason to be cautious about selecting it as a stand-alone reductive procedure?
A behavior analyst reports, 'I reinforced on-task behavior by giving stickers, but on-task behavior did not change.' What is the most technically accurate critique?