Two observers record event data for a session. Observer A counts 24 responses and Observer B counts 21. Using total count IOA, what is the agreement?

87.5%. Total count IOA divides the smaller count by the larger and multiplies by 100: 21/24 x 100 = 87.5%. Dividing the larger count by the smaller (24/21 x 100, about 114%) gives a value over 100%, which is impossible for an agreement percentage. Averaging the two counts, or computing a percentage of disagreement, is not the IOA formula either.
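The formula can be written as a small Python function, shown here as a minimal sketch. The function name is hypothetical, and treating two identical counts (including 0 and 0) as 100% agreement is an assumed convention, not something stated in this guide.

```python
def total_count_ioa(count_a: float, count_b: float) -> float:
    """Total count IOA: (smaller count / larger count) x 100."""
    if count_a == count_b:
        # Assumed convention: identical counts, including 0 and 0, are full agreement.
        return 100.0
    smaller, larger = min(count_a, count_b), max(count_a, count_b)
    return 100 * smaller / larger


print(total_count_ioa(24, 21))  # 87.5
```

Because the function always puts the smaller count on top, the observer order does not matter and the result can never exceed 100%.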

A scale used to verify a behavior count consistently reads exactly the same value every trial but that value is always 3 responses higher than the true count. How should this be described?

Reliable but not accurate. Producing the same value every time is consistency, which is reliability. But because the value deviates from the true count by a fixed amount, it is inaccurate. Reliability does not guarantee accuracy or validity. It is not 'accurate but not reliable' because the readings are perfectly consistent, just consistently wrong.
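One way to see the distinction is to separate the spread of repeated readings (consistency) from their average distance to the true value (systematic error). This is a minimal sketch; the true count of 10 is a hypothetical value, and using the population standard deviation as the consistency index is an illustrative choice, not a formula from this guide.

```python
from statistics import mean, pstdev


def reliability_and_accuracy(readings: list[float], true_value: float) -> tuple[float, float]:
    """Return (spread, bias) for repeated readings of one known quantity.

    spread: population standard deviation of the readings (0 = perfectly consistent).
    bias:   mean reading minus the true value (0 = accurate on average).
    """
    return pstdev(readings), mean(readings) - true_value


# Hypothetical: the true count is 10 and the scale always reads 3 responses high.
spread, bias = reliability_and_accuracy([13, 13, 13, 13, 13], true_value=10)
print(spread, bias)  # spread is 0 (reliable), bias is 3 (inaccurate)
```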

For discrete-trial data where each trial is scored correct/incorrect, which IOA method is most appropriate?

Trial-by-trial IOA: agreements / (agreements + disagreements) x 100. Discrete-trial data are opportunity-based, so trial-by-trial IOA compares observers trial by trial: agreements divided by agreements plus disagreements, times 100. Duration IOA applies to timing data, total count IOA ignores trial-level matching and can inflate agreement, and mean count-per-interval applies to count data partitioned into time intervals.
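A minimal Python sketch of trial-by-trial IOA follows. It assumes each observer's data are stored as a list of booleans in the same trial order (True = correct); the function name and the 10-trial data are hypothetical. The example is chosen so that both observers score the same number of correct trials but disagree on which trials were correct.

```python
def trial_by_trial_ioa(obs_a: list[bool], obs_b: list[bool]) -> float:
    """Trial-by-trial IOA: agreements / (agreements + disagreements) x 100."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("both observers must score the same, nonempty set of trials")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    disagreements = len(obs_a) - agreements
    return 100 * agreements / (agreements + disagreements)


# Hypothetical 10-trial block (True = correct). Each observer scores 8 correct,
# but they disagree on trials 7 and 8 (list indices 6 and 7).
obs_a = [True, True, False, True, True, True, False, True, True, True]
obs_b = [True, True, False, True, True, True, True, False, True, True]

print(trial_by_trial_ioa(obs_a, obs_b))  # 80.0
# Total count IOA on "number correct" would be 8/8 = 100%, hiding both disagreements.
```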

Validity, Reliability, IOA, Procedural Integrity, and Dosage | Free Guide 2026

Key Takeaways

Accuracy is closeness to the true value; reliability is consistency; validity is measuring the right behavior/dimension; they are independent.
IOA is a reliability index, not validity; the method must match the data (total count, mean count-per-interval, exact, trial-by-trial, duration).
Total count IOA = smaller/larger x 100; mean count-per-interval is more conservative because within-interval disagreements do not cancel out.
Most fields treat 80%+ IOA as minimally acceptable, with 90%+ preferred, though no threshold is universal.
Check procedural integrity and dosage before concluding a flat intervention has failed.

Accuracy, Reliability, and Validity

Three measurement-quality concepts anchor this section. Accuracy is the extent to which observed values match the true value of the behavior (often established by an independent calibrated standard or a thorough "true" count). Reliability is the consistency of measurement: the same behavior measured repeatedly yields the same value. Validity asks whether you measured the right thing: the behavior and dimension that answer the clinical question.

These are independent. A bathroom scale that always reads 5 pounds high is reliable but inaccurate. Observers can be highly reliable yet invalid if they share a flawed definition, measuring accidental contact as "aggression" identically every time. The exam's recurring lesson: high agreement never proves validity. Always confirm the measure captures the intended response class and dimension before trusting consistent data.

Interobserver Agreement (IOA): Types and a Worked Calculation

Interobserver agreement (IOA) is the degree to which two independent observers report the same values for the same events. It is a reliability index, not a validity index. The method must match the measurement system:

IOA method	Used with	How it is computed
Total count IOA	Event/frequency data	(smaller count / larger count) x 100
Mean count-per-interval IOA	Count data split into intervals	average of per-interval (smaller/larger) percentages
Exact (interval-by-interval) agreement	Interval data	intervals of exact agreement / total intervals x 100
Trial-by-trial IOA	Discrete-trial / opportunity data	agreements / (agreements + disagreements) x 100
Total duration IOA	Duration data	(smaller duration / larger duration) x 100
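The duration row can be sketched in Python as well, alongside the mean duration-per-occurrence variant that the method-selection table later in this guide lists for duration data. This is a minimal sketch: the function names and bout durations are hypothetical, and pairing bouts by their order (which requires both observers to record the same number of bouts) is an assumption.

```python
def total_duration_ioa(bouts_a: list[float], bouts_b: list[float]) -> float:
    """Total duration IOA: (smaller total duration / larger total duration) x 100."""
    total_a, total_b = sum(bouts_a), sum(bouts_b)
    if total_a == total_b:
        return 100.0
    return 100 * min(total_a, total_b) / max(total_a, total_b)


def mean_duration_per_occurrence_ioa(bouts_a: list[float], bouts_b: list[float]) -> float:
    """Average of per-bout (shorter / longer) x 100, pairing bouts by order (assumption)."""
    if len(bouts_a) != len(bouts_b) or not bouts_a:
        raise ValueError("assumes both observers recorded the same, nonzero number of bouts")
    per_bout = [100.0 if a == b else 100 * min(a, b) / max(a, b)
                for a, b in zip(bouts_a, bouts_b)]
    return sum(per_bout) / len(per_bout)


# Hypothetical bout durations in seconds for one session.
obs_a = [30, 45, 75]  # total 150 s
obs_b = [25, 40, 55]  # total 120 s
print(total_duration_ioa(obs_a, obs_b))                          # 80.0
print(round(mean_duration_per_occurrence_ioa(obs_a, obs_b), 1))  # 81.9
```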

Worked example (mean count-per-interval). Two observers count a behavior across five 1-minute intervals:

Interval	Obs A	Obs B	smaller/larger
1	4	4	4/4 = 100%
2	5	4	4/5 = 80%
3	3	3	3/3 = 100%
4	2	3	2/3 = 67%
5	6	5	5/6 = 83%

Mean count-per-interval IOA = (100 + 80 + 100 + 67 + 83) / 5 = 86%. Total count IOA on the same data (totals 20 vs. 19) would be 19/20 = 95%, higher because within-interval disagreements cancel out across the session. This is why mean count-per-interval is the more conservative, more sensitive method: matching totals can hide interval-level disagreement. Most fields treat 80%+ as the minimum acceptable IOA, with 90%+ preferred, though no single threshold is universal.
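The worked example can be reproduced with a short Python sketch that computes both indices from the same five intervals. The function name is hypothetical, and scoring an interval where both observers record the same count (including 0 and 0) as 100% is an assumed convention.

```python
def mean_count_per_interval_ioa(counts_a: list[int], counts_b: list[int]) -> float:
    """Average of per-interval (smaller / larger) x 100 percentages."""
    if len(counts_a) != len(counts_b) or not counts_a:
        raise ValueError("both observers must score the same, nonempty set of intervals")
    per_interval = []
    for a, b in zip(counts_a, counts_b):
        # Assumed convention: equal counts (including 0 and 0) are 100% for that interval.
        per_interval.append(100.0 if a == b else 100 * min(a, b) / max(a, b))
    return sum(per_interval) / len(per_interval)


obs_a = [4, 5, 3, 2, 6]  # Observer A, intervals 1-5 (total 20)
obs_b = [4, 4, 3, 3, 5]  # Observer B, intervals 1-5 (total 19)

mean_cpi = mean_count_per_interval_ioa(obs_a, obs_b)
total_count = 100 * min(sum(obs_a), sum(obs_b)) / max(sum(obs_a), sum(obs_b))
print(round(mean_cpi), total_count)  # 86 95.0
```

The gap between the two numbers comes entirely from interval 4, where Observer B scored more than Observer A, partly offsetting the intervals where A scored more.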

Procedural Integrity, Treatment Fidelity, and Dosage

Data quality is more than the dependent-variable line on a graph. Procedural integrity (treatment fidelity / treatment integrity) measures whether the independent variable, the intervention, was implemented as written. Low integrity threatens internal validity: if outcomes are flat but the plan was not run correctly, you cannot conclude the plan failed. Dosage measures the amount of exposure: minutes, sessions, opportunities, or trials delivered.

Use this decision chain when interpreting disappointing data:

Observers disagree -> check the definition, training, scoring rules, and IOA method first.
Data do not match the clinical question -> check validity of the measure and dimension.
Intervention data are flat -> check procedural integrity and dosage before rejecting the plan.
Change appears only on some days -> check setting events, schedule, observer coverage, and representativeness.

The defensible rule: if outcomes are poor but integrity is low, improve implementation and collect more data; if integrity and dosage are adequate and outcomes are still poor, modification is warranted. Do not let a strong IOA value distract from validity or integrity; observers can agree perfectly on the wrong response class while the intervention was never delivered as designed.
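The decision rule above can be sketched in Python. Everything here is illustrative: the function names are hypothetical, integrity is computed as the percentage of opportunities implemented as written, and the 80% integrity criterion is a HYPOTHETICAL cutoff (this guide does not set one). Dosage adequacy is passed in as a judgment rather than computed.

```python
def integrity_percent(correct: int, opportunities: int) -> float:
    """Procedural integrity: opportunities implemented as written / total x 100."""
    if opportunities <= 0:
        raise ValueError("need at least one observed opportunity")
    return 100 * correct / opportunities


def next_step(integrity: float, dosage_adequate: bool, outcomes_poor: bool,
              integrity_criterion: float = 80.0) -> str:
    """Apply the section's rule; integrity_criterion is a hypothetical cutoff."""
    if not outcomes_poor:
        # Outside the rule stated in this section: nothing to troubleshoot yet.
        return "continue the plan and keep monitoring"
    if integrity < integrity_criterion or not dosage_adequate:
        # Poor outcomes with weak implementation or dosage: fix delivery first.
        return "improve implementation/dosage and collect more data"
    # Integrity and dosage adequate, outcomes still poor.
    return "modify the intervention"


# Hypothetical: staff ran the plan correctly on 20 of 50 opportunities (40%).
score = integrity_percent(20, 50)
print(score, next_step(score, dosage_adequate=True, outcomes_poor=True))
```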

Threats to Accuracy and Choosing the Right IOA Method

Several predictable threats degrade accuracy and reliability, and the exam expects you to name and prevent them. Observer drift is the gradual, unintentional change in how an observer applies a definition over time; for example, two observers trained together slowly diverge in what they score. Observer reactivity occurs when being observed changes the observer's scoring (e.g., scoring more carefully when a supervisor is present).

Observer bias / expectancy is scoring that is nudged toward an anticipated result, which is why observers should be blind to phase or condition when feasible. Poorly designed datasheets and complex definitions also lower accuracy. The standard safeguards are clear definitions, thorough training, periodic recalibration, and routine IOA checks across the study, not just at the start.

Matching the IOA method to the data is itself a high-yield skill. The wrong method can inflate agreement and hide real disagreement:

Data type	Preferred IOA	Why
Free-operant count	Mean count-per-interval (more sensitive) over total count	Within-interval errors cancel in total count
Interval recording	Exact agreement; or occurrence/nonoccurrence IOA for rare/dense behavior	Exact is most stringent
Discrete-trial	Trial-by-trial	Preserves opportunity-level agreement
Duration / latency	Total duration or mean duration-per-occurrence	Matches the timed dimension

For interval data with very low-rate behavior, occurrence-only IOA (agreement only on intervals where at least one observer scored an occurrence) prevents agreement inflation from a long run of jointly empty intervals; for very high-rate behavior, nonoccurrence IOA does the parallel job.
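A minimal Python sketch of occurrence and nonoccurrence IOA follows, restricting the agreement calculation to the relevant subset of intervals. The function name, the `scope` parameter, and the 20-interval data are hypothetical. In the example, a rare behavior makes whole-session agreement look high (90%) while occurrence-only agreement exposes that the observers matched on just one of three scored intervals.

```python
def interval_ioa(obs_a: list[bool], obs_b: list[bool], scope: str = "all") -> float:
    """Interval agreement over a chosen subset of intervals (True = occurrence scored).

    scope="all":           every interval (interval-by-interval agreement)
    scope="occurrence":    only intervals where at least one observer scored an occurrence
    scope="nonoccurrence": only intervals where at least one observer scored a nonoccurrence
    """
    if len(obs_a) != len(obs_b):
        raise ValueError("observers must score the same intervals")
    pairs = list(zip(obs_a, obs_b))
    if scope == "occurrence":
        pairs = [(a, b) for a, b in pairs if a or b]
    elif scope == "nonoccurrence":
        pairs = [(a, b) for a, b in pairs if not (a and b)]
    elif scope != "all":
        raise ValueError(f"unknown scope: {scope}")
    if not pairs:
        raise ValueError("no intervals fall within this scope")
    agreements = sum(a == b for a, b in pairs)
    return 100 * agreements / len(pairs)


# Hypothetical low-rate session: 20 intervals, occurrences scored in very few.
obs_a = [i in (3, 12) for i in range(20)]
obs_b = [i in (3, 15) for i in range(20)]
print(interval_ioa(obs_a, obs_b))                          # 90.0
print(round(interval_ioa(obs_a, obs_b, "occurrence"), 1))  # 33.3
```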

Recognizing when total count IOA overstates agreement, and when occurrence/nonoccurrence agreement is the honest index, is exactly the nuance Domain C items probe. The unifying principle: report the agreement statistic that is most stringent and most informative for the measurement system you actually used, then interpret it as a check on reliability, never as evidence of validity.

Test Your Knowledge

An intervention has produced no improvement over two weeks. A fidelity check shows staff implemented the plan correctly on only 40% of opportunities. What is the most defensible next step?

Improve procedural integrity (retrain/support staff) and continue collecting data before judging the plan

Increase the reinforcement magnitude immediately

Abandon the intervention because the data are flat

Switch to an indirect measurement system
