5.1 Evaluation Purpose and Types

Key Takeaways

  • Area IV (Evaluation and Research) is one of the eight Areas of Responsibility on the CHES exam; its weight varies by administration, roughly 12-20% of the 150 scored items.
  • Formative, process, outcome, and impact evaluation answer different questions and must be matched to the program's stage and the decision at hand.
  • Strong evaluation plans link the question, indicator, data source, timing, and the specific decision before any data are collected.
  • Evaluation findings exist to drive improvement, accountability, and ethical stewardship of participant time and grant resources.
Last updated: June 2026

Why Area IV Matters on the CHES Exam

Area IV, Evaluation and Research, is one of the eight Areas of Responsibility defined by the National Commission for Health Education Credentialing (NCHEC) in the 2020 Health Education Specialist Practice Analysis (HESPA II). The Certified Health Education Specialist (CHES) exam is computer-based, delivered at Pearson VUE centers, and contains 165 multiple-choice items of which 150 are scored and 15 are unscored pilot items. Candidates have 3 hours; passing requires a scaled score of 600 on an 800-point scale.

Area IV's share of the exam is not fixed. NCHEC re-weights the blueprint each administration: it carried roughly 20% in April 2025 and about 12% in October 2025. Plan to know this area cold regardless of cycle, because evaluation logic also appears inside Planning (Area II) and Implementation (Area III) items.

The exam does not ask whether evaluation is "good." It asks what a specialist should measure, when, and how to use findings responsibly. That makes Area IV an applied decision process, not a vocabulary quiz.

The Four Evaluation Types

  • Formative evaluation improves a program before or during development (pretesting materials, pilot testing a curriculum, reviewing reading level).
  • Process evaluation checks whether implementation happened as planned: fidelity, dose delivered, dose received, reach, and recruitment.
  • Outcome evaluation looks for shorter-term change in knowledge, attitudes, skills, self-efficacy, intentions, behaviors, or environmental supports.
  • Impact evaluation examines longer-term, broader change such as morbidity, mortality, quality of life, policy adoption, or community conditions.
Evaluation type | Core question | Typical timing
Formative | Will this work, and is it understandable? | Before / during development
Process | Was it delivered as designed? | During delivery
Outcome | Did participants change? | Short to intermediate term
Impact | Did the population or system change? | Long term

A Common Exam Trap

A frequent distractor is choosing an outcome measure when the scenario describes a delivery problem. If attendance was low, facilitators skipped activities, or flyers never reached the priority population, the next step is process oriented, not outcome oriented. If the program ran as intended and the question is whether participants changed, outcome evaluation fits.

Writing the Evaluation Question

A weak question asks, "Did the program work?" A strong one names the population, expected change, topic, and measurement points: "Did participating ninth-grade students increase correct condom-use knowledge from baseline to immediate posttest?" That is outcome focused. "Were at least four of six planned parent sessions delivered with the approved curriculum and trained facilitators?" is process focused.
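One way to practice this is to treat a question as a set of required parts and refuse to write it until every part is filled in. The sketch below is a hypothetical study aid, not an NCHEC tool: the `EvaluationQuestion` class and its field names are assumptions made for illustration, and the phrasing template simply reproduces the outcome-focused example above.

```python
from dataclasses import dataclass, fields


@dataclass
class EvaluationQuestion:
    """Hypothetical container for the parts a strong outcome question names."""
    population: str
    expected_change: str
    topic: str
    baseline: str
    follow_up: str

    def missing_parts(self) -> list[str]:
        # Any blank component means the question is still too vague to measure.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        missing = self.missing_parts()
        if missing:
            raise ValueError(f"question is missing: {', '.join(missing)}")
        return (
            f"Did {self.population} {self.expected_change} {self.topic} "
            f"from {self.baseline} to {self.follow_up}?"
        )


strong = EvaluationQuestion(
    population="participating ninth-grade students",
    expected_change="increase",
    topic="correct condom-use knowledge",
    baseline="baseline",
    follow_up="immediate posttest",
)
print(strong.render())

# "Did the program work?" names a topic at most; everything else is missing.
weak = EvaluationQuestion("", "", "the program", "", "")
print(weak.missing_parts())
```

The weak version fails on population, expected change, and both measurement points, which is exactly why "Did the program work?" cannot be answered with data.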

Anchor to the Logic Model

Use the program logic model to stay anchored. Inputs and activities generate process indicators. Outputs (number reached, sessions completed, materials distributed, referrals made) show dose and reach. Short-term outcomes include awareness, self-efficacy, and skills; intermediate outcomes involve behavior or policy adoption; long-term impacts usually require more time, larger samples, or surveillance data.
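The logic-model anchoring above can be expressed as a lookup from logic-model column to evaluation type. This is a minimal sketch under stated assumptions: the `COMPONENT_TO_TYPE` mapping follows the paragraph above (intermediate outcomes are grouped with outcome evaluation), the `Indicator` type and `group_by_type` helper are hypothetical, and a real program's logic model may draw these lines differently.

```python
from typing import NamedTuple


class Indicator(NamedTuple):
    name: str
    component: str  # logic-model column the indicator comes from


# Assumed mapping, following the paragraph above.
COMPONENT_TO_TYPE = {
    "inputs": "process",
    "activities": "process",
    "outputs": "process",                # dose and reach
    "short-term outcomes": "outcome",    # awareness, self-efficacy, skills
    "intermediate outcomes": "outcome",  # behavior or policy adoption
    "long-term impacts": "impact",       # often needs surveillance data
}


def group_by_type(indicators: list[Indicator]) -> dict[str, list[str]]:
    """Sort indicators into process, outcome, and impact buckets."""
    groups: dict[str, list[str]] = {"process": [], "outcome": [], "impact": []}
    for ind in indicators:
        groups[COMPONENT_TO_TYPE[ind.component]].append(ind.name)
    return groups


plan = [
    Indicator("sessions completed", "outputs"),
    Indicator("materials distributed", "outputs"),
    Indicator("self-efficacy score", "short-term outcomes"),
    Indicator("daily glucose checks", "intermediate outcomes"),
    Indicator("controlled HbA1c rate", "long-term impacts"),
]
print(group_by_type(plan))
```

Grouping indicators this way before data collection makes gaps visible: a plan with no process bucket cannot explain a flat outcome.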

Evaluation is also an ethical responsibility. Programs consume public trust, participant time, staff labor, and grant funding. Collecting data that will never be used burdens communities; reporting only favorable findings misleads partners. A CHES-level answer balances feasibility with transparency, cultural humility, confidentiality, and usefulness.

Process vs Outcome vs Impact in Numbers

A worked example clarifies the boundaries. A diabetes self-management program plans six weekly classes for 60 enrolled adults. Process measures ask whether the program was delivered: 5 of 6 classes were held (dose delivered about 83%), each held class covered the planned topics, and the average participant attended 4.2 of 6 sessions (dose received about 70%). Outcome measures ask whether participants changed: mean correct carbohydrate-counting knowledge rose from 58% to 81%, and the share reporting daily glucose checks rose from 40% to 68% at three months.
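The arithmetic in the diabetes example can be checked directly. The sketch below uses only the figures given above; as in the text, dose received is computed against the 6 planned sessions. The `summarize_change` helper is a hypothetical illustration that reports both percentage-point change and relative change, since exam items sometimes hinge on which one is meant.

```python
planned_classes = 6
held_classes = 5
mean_sessions_attended = 4.2

# Process: share of planned classes actually held, and share of planned
# sessions the average participant attended (denominator of 6, as in the text).
dose_delivered = held_classes / planned_classes
dose_received = mean_sessions_attended / planned_classes


def summarize_change(label: str, pre: float, post: float) -> str:
    """Report percentage-point and relative change between two percentages."""
    points = post - pre
    relative = points / pre
    return f"{label}: {pre}% -> {post}% ({points:+} points, {relative:+.0%} relative)"


print(f"Dose delivered: {dose_delivered:.0%}")
print(f"Dose received: {dose_received:.0%}")
print(summarize_change("Carb-counting knowledge", 58, 81))
print(summarize_change("Daily glucose checks", 40, 68))
```

A 23-point gain in knowledge is roughly a 40% relative gain; stating which one is meant avoids overstating or understating results to partners.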

Impact measures ask whether health or systems shifted: the clinic's proportion of enrolled patients with controlled HbA1c rose over the following year. Notice that without the process data, a flat outcome could mean either an ineffective curriculum or one that was barely delivered, which is why process evidence is needed to interpret outcomes.
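The point about flat outcomes can be written as a small decision rule that reads an outcome result together with process data. This is a hypothetical sketch: `interpret_outcome` and its default `min_dose` cut-off of 0.6 are assumptions chosen for illustration, not a published standard; programs set their own fidelity and dose thresholds.

```python
def interpret_outcome(dose_received: float, outcome_change: float,
                      min_dose: float = 0.6) -> str:
    """Read an outcome result in light of process data.

    min_dose is a hypothetical cut-off chosen for illustration only.
    """
    delivered_enough = dose_received >= min_dose
    changed = outcome_change > 0
    if changed and delivered_enough:
        return "change observed with adequate delivery"
    if changed:
        return "change observed despite low delivery; look for other explanations"
    if delivered_enough:
        return "delivered as planned but no change; revisit the curriculum or measures"
    return "too little delivered to judge the curriculum; strengthen implementation first"


# The same flat outcome tells two different stories depending on delivery.
print(interpret_outcome(dose_received=0.70, outcome_change=0))
print(interpret_outcome(dose_received=0.25, outcome_change=0))
```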

Formative Evaluation Tools

Formative work is not vague brainstorming; it uses concrete techniques. Pretesting shows draft messages to members of the priority population to check comprehension, recall, and acceptability. Readability checks apply formulas such as Flesch-Kincaid or SMOG to confirm materials sit near a sixth- to eighth-grade reading level for general audiences. Pilot testing runs the full procedure with a small group to surface logistical problems before full rollout. Expert and stakeholder review confirms cultural fit and content accuracy.
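The readability formulas named above can be sketched in a few lines. The Flesch-Kincaid Grade Level is 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, and SMOG is 1.0430 × √(polysyllables × 30 / sentences) + 3.1291, where polysyllables are words of three or more syllables. Two assumptions to flag: syllables are approximated with a vowel-group heuristic (it counts "every" as three syllables, for example), and SMOG is designed for 30-sentence samples, so a three-sentence draft gives only a rough estimate. Established readability tools will produce somewhat different numbers.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final "e" as silent ("write"), except in endings like "-le" ("simple").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict[str, float]:
    """Estimate Flesch-Kincaid grade level and SMOG grade for a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)

    # Flesch-Kincaid Grade Level
    fk = 0.39 * (len(words) / len(sentences)) + 11.8 * (sum(syllables) / len(words)) - 15.59
    # SMOG: polysyllable count scaled to a 30-sentence sample
    smog = 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291
    return {"flesch_kincaid_grade": round(fk, 1), "smog_grade": round(smog, 1)}


draft = ("Check your blood sugar every day. Write the number in your log. "
         "Bring the log to your next visit.")
print(readability(draft))
```

A quick check like this is useful during pretesting, but it does not replace showing the draft to members of the priority population.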

On the exam, any item set before launch that asks "how do we make this better" points to formative evaluation, while "did it reach people as planned" points to process.

Scenario Sorting Step

First locate the program stage. Materials being drafted means formative. Staff delivering sessions means process. Participants expected to change knowledge, skill, or behavior means outcome. Population health, policy effects, or long-term conditions means impact. Then identify the decision maker and the decision: a funder needs accountability, a manager needs fidelity data, a coalition needs priority-setting evidence, and participants deserve accessible results. The best choice fits the decision, not the most rigorous-sounding design.

Quick Reference: Matching Type to Trigger Words

  • "Before launch," "pretest the brochure," "is it understandable" -> formative
  • "Were sessions delivered," "fidelity," "reach," "dose," "attendance" -> process
  • "Did knowledge/behavior/self-efficacy change," "pre to post" -> outcome
  • "Disease rates," "policy adopted," "quality of life," "long term" -> impact
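The trigger words above can be turned into a simple keyword matcher for self-quizzing. This is a hypothetical study aid: the `TRIGGERS` phrases are adapted from the quick reference, with "pilot test" added from the formative bullet earlier in this section, and plain substring matching will miss paraphrases. On the exam, the stage-and-decision reasoning from the scenario sorting step still governs.

```python
# Phrases adapted from the quick reference; "pilot test" comes from the
# formative-evaluation bullet. Substring matching is a rough heuristic.
TRIGGERS = {
    "formative": ["before launch", "pretest", "understandable", "pilot test"],
    "process": ["delivered", "fidelity", "reach", "dose", "attendance"],
    "outcome": ["knowledge", "behavior", "self-efficacy", "pre to post"],
    "impact": ["disease rates", "policy adopted", "quality of life", "long term"],
}


def classify(scenario: str) -> list[str]:
    """Return the evaluation type(s) with the most trigger-phrase hits (ties kept)."""
    text = scenario.lower()
    scores = {
        etype: sum(phrase in text for phrase in phrases)
        for etype, phrases in TRIGGERS.items()
    }
    best = max(scores.values())
    if best == 0:
        return []  # no trigger found; fall back to stage-and-decision reasoning
    return [etype for etype, score in scores.items() if score == best]


print(classify("A curriculum is being pilot tested before full rollout. Staff want "
               "to learn whether examples are understandable to the priority population."))
print(classify("A grant manager asks whether all planned workshops were delivered "
               "using the approved lesson plan."))
```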
Test Your Knowledge

A curriculum is being pilot tested before full rollout. Staff want to learn whether examples are understandable to the priority population. Which evaluation type fits best?

Test Your Knowledge

A grant manager asks whether all planned workshops were delivered using the approved lesson plan. What should be measured first?

Test Your Knowledge

Which evaluation question is written most clearly for an outcome evaluation?
