5.3 Indicators, Measures, Reliability, and Validity

Key Takeaways

  • Indicators translate objectives into observable evidence such as attendance, fidelity, knowledge scores, behavior, policy, or health status.
  • Reliability is consistency of measurement; validity is whether the measure captures the intended construct.
  • Validated instruments are preferred when they fit the population's language, literacy, culture, and the evaluation question.
  • Operational definitions specify exactly what will be counted, observed, or scored, preventing inconsistent data across staff and sites.

Last updated: June 2026

From Objectives to Indicators

An indicator is the observable sign that an objective was met or a process occurred. If an objective says participants will demonstrate correct inhaler technique, the indicator cannot be general satisfaction; it must be a checklist score, demonstration rating, or similar evidence. The exam frequently tests whether you can choose the indicator closest to the verb in the objective.

Indicators may be quantitative (counts, percentages, means, rates, scores) or qualitative (themes from interviews, observations, open-ended comments). Neither is automatically better; the best indicator answers the evaluation question with enough accuracy for the decision.

Objective level    | Example indicator
Process            | 4 of 6 sessions delivered with trained facilitators
Output             | 250 brochures distributed; 80 referrals made
Short-term outcome | Mean self-efficacy score rises from 3.1 to 4.0
Behavioral outcome | Percent screened in 6 months
Impact             | Clinic-level HbA1c control rate

Reliability: Consistency

Reliability is the consistency of a measure when the underlying trait has not changed.

  • Interrater reliability - two observers scoring the same demonstration agree.
  • Test-retest reliability - a stable concept yields similar scores at two time points.
  • Internal consistency - items meant to measure one construct correlate (often reported as Cronbach's alpha).
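The reliability types above can be put into numbers. The following is a minimal sketch, not a standard library routine: the function names, the pass/fail ratings, and the item scores are all hypothetical illustrations. It computes simple percent agreement and Cohen's kappa (agreement corrected for chance) for two observers, and Cronbach's alpha for a short multi-item scale.

```python
from collections import Counter
from statistics import variance

def percent_agreement(rater_a, rater_b):
    """Share of demonstrations on which two observers gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Interrater agreement corrected for agreement expected by chance.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

def cronbach_alpha(item_scores):
    """Internal consistency. item_scores holds one list per item,
    each containing every respondent's score on that item."""
    k = len(item_scores)
    totals = [sum(respondent) for respondent in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical ratings of the same 10 demonstrations by two observers.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_b = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "fail", "pass", "pass"]
agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)

# Hypothetical 3-item scale answered by 5 respondents.
items = [[1, 2, 3, 4, 5], [2, 2, 3, 4, 5], [1, 3, 3, 5, 5]]
alpha = cronbach_alpha(items)

print(f"agreement={agreement:.2f} kappa={kappa:.2f} alpha={alpha:.2f}")
```

Kappa comes out lower than raw agreement because some matches would occur by chance alone. Test-retest reliability is usually summarized with a correlation between the two time points, which is not shown here.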

Validity: Meaning

Validity is whether the measure captures what it intends to.

  • Content validity - items cover the full concept.
  • Criterion validity - scores relate to an accepted standard (concurrent or predictive).
  • Construct validity - the measure behaves as theory predicts relative to other variables.
  • Face validity - the measure looks appropriate to respondents, but appearance alone is never sufficient.

A measure can be reliable yet invalid: a miscalibrated scale that reads 5 pounds heavy is perfectly consistent but wrong. It cannot be valid while unreliable, because random inconsistency obscures the true value.
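The scale example can be illustrated by separating systematic error (bias) from random error (spread). This is a small sketch with made-up readings; the true weight, both lists of readings, and the `describe` helper are assumptions chosen only to show the contrast.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 150.0  # hypothetical true weight in pounds

# Hypothetical repeated readings of the same person on two scales.
miscalibrated = [155.0, 155.1, 154.9, 155.0, 155.0]  # tight cluster, 5 lb heavy
erratic = [143.0, 158.0, 149.0, 156.0, 144.0]        # centered on truth, noisy

def describe(readings, truth):
    """Return (bias, spread) for a set of repeated readings."""
    bias = mean(readings) - truth   # systematic error: a validity problem
    spread = stdev(readings)        # random error: a reliability problem
    return bias, spread

bias_m, spread_m = describe(miscalibrated, TRUE_WEIGHT)
bias_e, spread_e = describe(erratic, TRUE_WEIGHT)

print(f"miscalibrated: bias={bias_m:.1f} lb, spread={spread_m:.2f} lb")
print(f"erratic:       bias={bias_e:.1f} lb, spread={spread_e:.2f} lb")
```

The miscalibrated scale has almost no spread but a large bias (reliable, not valid). The erratic scale averages out correctly, yet any single reading may be off by several pounds, so one measurement cannot be trusted.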

Operational Definitions

An operational definition makes a concept concrete. Rather than "participation," define whether it means attending one session, attending four of six, completing homework, or actively practicing a skill. Rather than "healthy food access," define distance, price, store type, hours, or availability of specific items. Operational definitions let different staff, sites, and time points produce the same meaning.
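One way to see what an operational definition buys you is to write it down as an explicit rule. The sketch below assumes "participation" is defined as attending at least four of six sessions, one of the options named above; the record layout, participant IDs, and function name are hypothetical.

```python
from dataclasses import dataclass

MIN_SESSIONS = 4    # operational definition: attended at least 4 ...
TOTAL_SESSIONS = 6  # ... of the 6 sessions offered

@dataclass
class AttendanceRecord:
    participant_id: str
    sessions_attended: int

def participated(record: AttendanceRecord) -> bool:
    """Apply the same rule to every record, regardless of site or staff."""
    return record.sessions_attended >= MIN_SESSIONS

# Hypothetical attendance data.
records = [
    AttendanceRecord("P01", 6),
    AttendanceRecord("P02", 3),
    AttendanceRecord("P03", 4),
    AttendanceRecord("P04", 1),
]

participation_rate = sum(participated(r) for r in records) / len(records)
print(f"Participation rate: {participation_rate:.0%}")
```

Changing `MIN_SESSIONS` to 1 would change the reported rate for the same data, which is why the definition has to be fixed and shared before data collection begins.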

Fit With the Priority Population

Measurement must fit the people. A validated English instrument may lose validity after an informal translation. A 40-item survey may be unrealistic in a busy clinic where a validated 10-item version still captures the construct. A digital-only survey excludes people without reliable internet. The CHES-level judgment preserves measurement quality while respecting literacy, language, disability access, culture, burden, and confidentiality. Burden is part of quality: the shorter valid tool that participants actually complete beats the longer one they abandon.

SMART Objectives Drive Indicators

Indicators flow directly from well-written objectives, which on the CHES exam follow the SMART standard: Specific, Measurable, Achievable, Relevant, and Time-bound. An objective such as "By the end of the six-week class, at least 75% of enrolled adults will correctly demonstrate four of five inhaler steps on a return-demonstration checklist" hands you the indicator (checklist score), the target (75%), and the timing (end of week six). A vague objective such as "improve asthma management" supplies no indicator at all.
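The inhaler objective above already contains everything needed to compute its indicator. As a rough sketch, with hypothetical checklist scores and variable names, the calculation might look like this:

```python
TARGET_SHARE = 0.75   # "at least 75% of enrolled adults"
STEPS_REQUIRED = 4    # "four of five inhaler steps"

# Hypothetical end-of-class checklist scores (steps performed correctly,
# out of 5), one per enrolled adult.
checklist_scores = [5, 4, 3, 4, 5, 2, 4, 5]

met_criterion = sum(score >= STEPS_REQUIRED for score in checklist_scores)
share_meeting = met_criterion / len(checklist_scores)
objective_met = share_meeting >= TARGET_SHARE

print(f"{met_criterion} of {len(checklist_scores)} adults "
      f"({share_meeting:.0%}) met the criterion; objective met: {objective_met}")
```

The denominator here is every enrolled adult, as the objective states. Using only those who attended the final session would be a different, and more flattering, indicator.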

When a scenario shows a fuzzy objective and a clean one, the measurable indicator belongs to the SMART version; the exam rewards selecting the objective that already specifies how success will be observed.

Levels of Measurement

The level of measurement constrains both the indicator and the later analysis, so it appears in Area IV items.

Level    | Description             | Example                   | Typical summary
Nominal  | Unordered categories    | Insurance type            | Frequencies, mode
Ordinal  | Ranked, unequal gaps    | Likert agreement 1-5      | Median, frequencies
Interval | Equal gaps, no true zero | Temperature in °F        | Mean, standard deviation
Ratio    | Equal gaps, true zero   | Sessions attended, weight | Mean, ratios

Knowing the level prevents errors such as averaging a nominal variable. You can count how many participants chose each insurance type, but you cannot compute a meaningful "average insurance type." Ratio data such as servings of vegetables or number of referrals support the widest range of statistics because zero is meaningful.
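The link between level of measurement and summary statistic can be sketched as a simple lookup. The data, the `SUMMARY_BY_LEVEL` table, and the `summarize` helper below are hypothetical, and the lookup picks just one typical summary per level.

```python
from statistics import mean, median, mode

# One typical summary per level; real analyses often report more than one.
SUMMARY_BY_LEVEL = {
    "nominal": mode,
    "ordinal": median,
    "interval": mean,
    "ratio": mean,
}

def summarize(values, level):
    """Summarize values with a statistic suited to their level of measurement."""
    return SUMMARY_BY_LEVEL[level](values)

# Hypothetical data for five participants.
insurance = ["Medicaid", "private", "Medicaid", "uninsured", "Medicaid"]  # nominal
agreement = [2, 4, 4, 5, 3]                                               # ordinal
sessions = [6, 3, 4, 1, 6]                                                # ratio

insurance_summary = summarize(insurance, "nominal")
agreement_summary = summarize(agreement, "ordinal")
sessions_summary = summarize(sessions, "ratio")

print(insurance_summary, agreement_summary, sessions_summary)
```

Calling `mean` on the insurance list would raise an error in Python, which mirrors the conceptual point: there is no meaningful average of unordered categories.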

Sensitivity, Specificity, and Screening Indicators

When the indicator is a screening or test result, two properties decide its usefulness. Sensitivity is the share of people who truly have the condition that the test correctly flags (true positive rate); a sensitive test rarely misses cases. Specificity is the share of people who truly do not have the condition that the test correctly clears (true negative rate); a specific test rarely raises false alarms. A community blood-pressure screening tuned for high sensitivity will catch nearly everyone with hypertension but will refer some people who turn out fine, creating follow-up burden.
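Both properties come straight from counts of correct and incorrect screening results. The sketch below uses hypothetical counts for a community blood-pressure screening of 200 people, 50 of whom truly have hypertension; the function and variable names are illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Share of people WITH the condition that the test correctly flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Share of people WITHOUT the condition that the test correctly clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical results, compared against a confirmed diagnosis.
true_pos, false_neg = 47, 3     # 50 people with hypertension
true_neg, false_pos = 120, 30   # 150 people without hypertension

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)

print(f"Sensitivity: {sens:.0%}  Specificity: {spec:.0%}")
print(f"Missed cases: {false_neg}  Unnecessary referrals: {false_pos}")
```

With these made-up counts the screening misses only 3 cases but sends 30 people without hypertension for follow-up, which is the trade-off described above.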

The exam may describe a screening tool and ask which property matters most for the program goal: when missing a case is dangerous, prioritize sensitivity; when false positives are costly or alarming, weigh specificity. Both properties describe how well a test result agrees with a person's true condition status, so they apply validity thinking to the measurement of a health condition.

Why Reliability and Validity Both Matter Together

Picture a target. A measure that is reliable but not valid clusters its shots tightly but off-center: consistent and wrong. An unreliable measure scatters its shots; even if they center on the bullseye on average, any single use is erratic, so random noise keeps it from being valid in practice. The practical lesson for evaluation is to prefer instruments with published reliability and validity evidence in a population similar to yours, then confirm they still fit your participants' language and reading level before trusting the measure for your decision.

Reading the Verb

In applied items, scan the objective's verb. Words such as list, identify, demonstrate, attend, refer, adopt, and reduce point to different indicators. Demonstrate implies an observation checklist, attend implies an attendance log, refer implies a referral record, and reduce implies a before-and-after rate. Then ask whether the measure would yield the same meaning across staff, sites, languages, and time. A convenient but poorly matched measure can make an effective program look weak or an ineffective program look successful, which is why aligning the indicator to the verb is a recurring exam skill.
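The verb-scanning habit can be mimicked with a tiny lookup. This is only a study aid, not an exam rule: the mapping covers just the four verb-indicator pairs named above, and the `suggest_indicator` function and sample objectives are hypothetical.

```python
import string

# Verb-to-indicator pairs taken from the paragraph above.
VERB_TO_INDICATOR = {
    "demonstrate": "observation checklist",
    "attend": "attendance log",
    "refer": "referral record",
    "reduce": "before-and-after rate",
}

def suggest_indicator(objective):
    """Return the indicator matching the first known verb, or None."""
    words = [w.strip(string.punctuation).lower() for w in objective.split()]
    for word in words:
        if word in VERB_TO_INDICATOR:
            return VERB_TO_INDICATOR[word]
    return None  # no measurable verb found: the objective may be too vague

first = suggest_indicator("Participants will demonstrate correct inhaler technique.")
second = suggest_indicator("Clinics will refer eligible patients to the quit line.")
vague = suggest_indicator("Improve asthma management")

print(first, second, vague, sep=" | ")
```

The vague objective returns `None` because it contains no observable verb from the list, which is the same signal a reader should pick up when an objective gives no hint of how success will be observed.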

Test Your Knowledge

A smoking cessation objective targets increased refusal self-efficacy. Which indicator best aligns with the objective?

Test Your Knowledge

Two observers independently score the same food demonstration checklist and compare agreement. Which measurement property is being assessed?

Test Your Knowledge

Why is an operational definition important in evaluation?

Test Your Knowledge

A bathroom scale consistently reads exactly five pounds too high. What does this illustrate?
