10.3 Implementation-to-Evaluation Chain
Key Takeaways
- Area III delivery questions hinge on fidelity, reach, dose delivered, and dose received; settle these before making any outcome claim.
- Process evaluation asks whether the program ran as planned; impact evaluation asks whether knowledge, attitudes, skills, or behavior changed in the short term; outcome evaluation asks whether the long-term health indicator changed.
- Outcome claims are only valid when indicators and baseline data were defined during planning.
- Low attendance is an implementation and access problem first, not proof that participants are unmotivated.
When a program is running, CHES scenarios shift to Area III, Implementation, and Area IV, Evaluation and Research. The recurring question is whether the program was delivered as intended and whether the evaluation can actually answer the question being asked.
Implementation monitoring vocabulary
Process monitoring uses precise terms that the exam tests directly.
| Term | Definition |
|---|---|
| Fidelity | Degree to which the program is delivered as designed |
| Reach | Proportion of the intended population that participated |
| Dose delivered | Amount of the program that staff actually provided |
| Dose received | Extent to which participants engaged with the program |
| Adaptation | Planned, documented changes that preserve core components |
When staff deliver a curriculum differently at each site, the first review target is fidelity and facilitator training, not new outcome claims. Variation undermines the ability to attribute any later change to the program.
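The monitoring terms above can be turned into simple ratios. The sketch below is one possible operationalization, not a standard formula: the `SiteLog` fields, the choice to score fidelity as the share of core components delivered, and all the numbers are hypothetical, and real evaluations define these measures in the evaluation plan.

```python
from dataclasses import dataclass


@dataclass
class SiteLog:
    """Hypothetical monitoring record for one site (illustrative fields only)."""
    intended_population: int        # people the program aimed to serve
    participants: int               # people who attended at least one session
    sessions_planned: int
    sessions_delivered: int
    core_components_planned: int
    core_components_delivered: int
    attendance_total: int           # sessions attended, summed over participants


def process_measures(log: SiteLog) -> dict:
    """Return reach, dose delivered, dose received, and fidelity as proportions."""
    reach = log.participants / log.intended_population
    dose_delivered = log.sessions_delivered / log.sessions_planned
    # Assumption: dose received = average share of delivered sessions attended.
    dose_received = log.attendance_total / (log.participants * log.sessions_delivered)
    # Assumption: fidelity = share of core components delivered as designed.
    fidelity = log.core_components_delivered / log.core_components_planned
    return {
        "reach": round(reach, 2),
        "dose_delivered": round(dose_delivered, 2),
        "dose_received": round(dose_received, 2),
        "fidelity": round(fidelity, 2),
    }


example = SiteLog(
    intended_population=200,
    participants=50,
    sessions_planned=8,
    sessions_delivered=6,
    core_components_planned=5,
    core_components_delivered=4,
    attendance_total=210,
)
measures = process_measures(example)
print(measures)
# {'reach': 0.25, 'dose_delivered': 0.75, 'dose_received': 0.7, 'fidelity': 0.8}
```

Read this way, the hypothetical site has a reach problem (25 percent of the intended population) before it has a curriculum problem, which is the order in which the exam expects the review to proceed.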
Process versus impact versus outcome evaluation
The exam expects three layers, not two:
- Process (formative/implementation) evaluation answers "Was the program delivered as planned, to whom, and how well?" It uses the fidelity, reach, and dose measures above.
- Impact evaluation answers "Did knowledge, attitudes, skills, or behavior change in the short term?"
- Outcome evaluation answers "Did the long-term health indicator—morbidity, mortality, quality of life—change?"
A scenario asking "Did blood pressure improve?" is an outcome question; "Did participants' nutrition knowledge rise immediately after the class?" is impact; "Was every session delivered?" is process. Participant satisfaction is a process measure and does not prove any behavioral result, even when ratings are glowing. Mislabeling these layers is one of the most common ways candidates lose evaluation items.
Evaluation designs and their evidence strength
| Design | Structure | Causal strength |
|---|---|---|
| Post-only | Measure after the program | Weakest |
| Pre-post (one group) | Measure before and after | Moderate; no control |
| Quasi-experimental | Comparison group, no randomization | Stronger |
| Experimental (RCT) | Randomized control group | Strongest |
The stronger the design, the more confidently change can be attributed to the program rather than to history, maturation, or selection. Real community programs often cannot randomize, so the exam rewards selecting the strongest feasible design and stating its limits.
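A small arithmetic sketch can show why a comparison group strengthens attribution. All scores below are invented for illustration, and subtracting the comparison group's change is only a rough way to picture the logic, not a statistical test or a prescribed CHES method.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(values) / len(values)


def pre_post_change(pre, post):
    """Change in the group mean from pretest to posttest."""
    return mean(post) - mean(pre)


# Hypothetical knowledge scores (0-100) for illustration only.
program_pre = [58, 60, 62]
program_post = [70, 72, 74]
comparison_pre = [59, 61, 63]
comparison_post = [65, 67, 69]

# One-group pre-post design: all of the change looks like program effect.
program_change = pre_post_change(program_pre, program_post)            # 12.0

# Quasi-experimental design: the comparison group shows how much change
# happened without the program (history, maturation, and similar threats).
comparison_change = pre_post_change(comparison_pre, comparison_post)   # 6.0

net_change = program_change - comparison_change                        # 6.0
print(program_change, comparison_change, net_change)
```

In this made-up case the one-group design would credit the program with a 12-point gain, while the comparison group suggests about half of that would have occurred anyway. Without randomization, selection differences between the groups remain a stated limit.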
Data collection methods and quality
Area IV scenarios also test how data are collected and judged. Quantitative methods (surveys, pre/post tests, biometric screenings) answer "how much" and "how many"; qualitative methods (interviews, focus groups, observation) answer "why" and "how." Strong evaluation mixes both. Two quality concepts recur: validity (the instrument measures what it claims to measure) and reliability (it measures consistently across raters and time).
A scenario describing a survey that participants interpret inconsistently is flagging a reliability problem; one measuring satisfaction but claiming it captures behavior change is flagging a validity problem. The exam expects you to name the right method for the question—qualitative interviews to explain low attendance, a comparison-group design to test effectiveness—rather than defaulting to whatever data are easiest to gather.
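Consistency across raters can be checked with simple percent agreement. The sketch below assumes two hypothetical observers who each scored the same ten sessions (1 = component delivered as designed, 0 = not). Percent agreement is only the simplest such check and does not correct for agreement that would occur by chance.

```python
def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical observation ratings for ten sessions.
observer_1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
observer_2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]

agreement = percent_agreement(observer_1, observer_2)
print(agreement)  # 0.8
```

Low agreement would point to a reliability problem with the observation tool or rater training. High agreement says nothing about validity, because two raters can agree consistently on an instrument that measures the wrong thing.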
Why indicators and baselines matter
An outcome claim is only defensible if indicators and baseline data were defined during planning. If a team realizes no baseline was collected, the honest next step is to choose an evaluation design that can still answer a realistic question (for example, a post-only design comparing knowledge against a validated standard) and to state the limitation, not to assert impact anyway. This is where Area IV and Area VIII overlap, because overclaiming impact is both a methods error and an ethics violation.
Worked scenario
Attendance at a nutrition series is far below target. The weak answer blames participants. The strong answer treats low reach as an implementation and access issue: examine recruitment channels, scheduling, transportation, childcare, language, and cultural fit. Fixing access often recovers reach and, with it, dose received; blaming participants ends inquiry and ignores equity.
In a second item, two sites show different blood-pressure results, but a fidelity check reveals that one site dropped the home-monitoring component. The correct interpretation ties the outcome difference to an implementation gap, not to a flawed curriculum.
Common traps
- Satisfaction-as-outcome trap: using "participants liked it" to claim behavior change.
- Claiming impact with no baseline or undefined indicators.
- Ignoring fidelity when delivery varies across staff or sites.
- Mislabeling a process question as outcome (or the reverse) on the exam.
- Attributing weak results to participants when access barriers reduced reach.
Use the cycle note here too: if the missing evidence is baseline data or fidelity records, the next best step lives in evaluation design and monitoring, not in new program activities.
Finally, remember that evaluation feeds back into the cycle. Findings should inform the next round of planning, justify continued funding, and be shared with stakeholders in plain language. A program that evaluates but never uses the results has wasted the effort. On the exam, the strongest evaluation answer measures the right thing with a feasible design, reports honestly, and recommends concrete improvements grounded in what the data showed.
Facilitators deliver the same curriculum very differently across three sites. What should be reviewed first?
Which question is a process (implementation) evaluation question rather than an outcome question?
A team discovers no baseline data were collected before the program began. What is the best next step?