5.5 Analysis, Interpretation, and Data Use
Key Takeaways
- Analysis must match the question, design, measurement level, sample size, and intended audience, and should be planned before data collection.
- Descriptive statistics summarize what happened; inferential statistics estimate whether a pattern is likely to reflect more than chance.
- Qualitative analysis identifies meaning through systematic coding, constant comparison, and theme development, not a single loud comment.
- Interpretation must address limitations, practical significance, equity implications, and the next-step decision.
Turning Data Into Decisions
Analysis should be planned before data collection. If the question asks whether knowledge improved from pretest to posttest, the evaluator needs matched participant data and a comparison plan. If the question asks why attendance fell, open-ended comments or interviews need systematic coding. Collecting data without an analysis plan often yields information that is hard to use.
Descriptive Statistics
Descriptive statistics summarize a dataset and are sufficient for most CHES-level decisions.
- Frequencies and percentages describe attendance, completion, demographics, and correct responses.
- Measures of central tendency: the mean suits roughly symmetric numeric data; the median resists outliers and fits skewed data such as income or wait times; the mode fits categorical data.
- Measures of spread: range and standard deviation describe how dispersed scores are.
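
The measures listed above can be illustrated with a short sketch using Python's standard `statistics` module. The wait times are invented for illustration and are not from any real clinic; the point is only to show how a few long visits pull the mean upward while the median stays near a typical visit.

```python
import statistics

# Hypothetical wait times in minutes (made-up data); the last two
# visits are unusually long and skew the distribution to the right.
wait_minutes = [10, 12, 12, 15, 18, 20, 22, 95, 120]

# Central tendency
mean_wait = statistics.mean(wait_minutes)      # pulled upward by the long visits
median_wait = statistics.median(wait_minutes)  # middle value, resists outliers
mode_wait = statistics.mode(wait_minutes)      # most frequent value

# Spread
range_wait = max(wait_minutes) - min(wait_minutes)
sd_wait = statistics.stdev(wait_minutes)       # sample standard deviation

print(f"mean={mean_wait}, median={median_wait}, mode={mode_wait}")
print(f"range={range_wait}, sd={sd_wait:.1f}")
```

In this invented dataset the mean is twice the median, which is why the median is usually the better summary of a typical wait when data are skewed.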
Inferential Statistics
Inferential statistics estimate whether an observed difference or association likely reflects more than random variation. A paired t-test fits the same participants' pretest and posttest scores; an independent t-test compares intervention and comparison groups; a chi-square test compares categorical outcomes such as screened versus not screened. The exam rarely demands calculation, but it may ask which analysis matches the design. The conventional threshold for statistical significance is a p-value below 0.05, meaning a result at least this large would be unlikely if there were truly no effect.
| Situation | Reasonable analysis |
|---|---|
| Same people, pre vs post score | Paired comparison |
| Two separate groups, mean score | Independent comparison |
| Counts across categories | Chi-square |
| Describe one variable | Frequencies, mean or median |
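
For readers who want to see why matched data matter in the first row of the table, here is a minimal sketch of the paired comparison logic using only the standard library. The function name `paired_t_statistic` and the scores are hypothetical, and the sketch stops at the t statistic and degrees of freedom; obtaining a p-value would require a t distribution from a statistics package, which is outside what the exam asks.

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for matched pre/post scores.

    Each position in `pre` and `post` must belong to the same participant.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same participants")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two matched pairs")
    sd_diff = statistics.stdev(diffs)  # sample SD of the individual changes
    if sd_diff == 0:
        raise ValueError("all changes are identical; t is undefined")
    standard_error = sd_diff / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical matched scores for five participants (made-up data).
pre_scores = [10, 12, 11, 14, 13]
post_scores = [13, 15, 13, 18, 16]

t_value, dof = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

The calculation works on each participant's own change score, which is why pretest and posttest records must be linked to the same person before analysis.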
Statistical vs Practical Significance
Practical significance asks whether a change is large enough to matter in the real world. A statistically significant gain can be too small to justify program cost, while a non-significant pilot result may still suggest promise if the pattern is consistent and supported by participant feedback. Interpretation should weigh sample size, measure quality, implementation fidelity, context, and meaning to the priority population.
Qualitative Analysis
Qualitative analysis must be systematic: read transcripts or notes, develop codes, compare responses across participants (constant comparison), identify recurring themes, and use quotations sparingly to illustrate meaning. Strong reporting explains how data were collected, who participated, how themes emerged, and what limitations apply. It never treats the loudest comment as the whole story.
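One small, mechanical piece of this process can be sketched in code: checking how many different participants raised a code, rather than how many times it was mentioned. The excerpts, participant IDs, and code labels below are invented for illustration, and a tally like this only supports the interpretive work of coding and theme development; it does not replace it.

```python
from collections import Counter, defaultdict

# Hypothetical coded excerpts as (participant_id, code) pairs (made-up data).
# Participant P1 raises transportation three times.
coded_excerpts = [
    ("P1", "transportation"),
    ("P1", "transportation"),
    ("P1", "transportation"),
    ("P2", "transportation"),
    ("P2", "scheduling"),
    ("P3", "scheduling"),
    ("P4", "scheduling"),
]

# Raw mentions: sensitive to one participant repeating the same point.
mentions = Counter(code for _, code in coded_excerpts)

# Breadth: number of distinct participants who raised each code.
participants_by_code = defaultdict(set)
for participant_id, code in coded_excerpts:
    participants_by_code[code].add(participant_id)
breadth = {code: len(ids) for code, ids in participants_by_code.items()}

for code in sorted(breadth):
    print(f"{code}: {mentions[code]} mentions from {breadth[code]} participants")
```

In this invented example transportation has the most mentions, but scheduling was raised by more participants, which is the kind of distinction that keeps one loud voice from defining a theme.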
Equity-Focused Interpretation
Always ask who benefited, who was not reached, and whether averages hide differences. A program can raise overall knowledge while failing participants with limited English proficiency. A coalition can hit its attendance target while excluding evening-shift workers. Disaggregate findings by subgroup and treat gaps as signals for adaptation, not as data to delete.
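A minimal sketch of disaggregation follows, assuming each record carries a subgroup label along with matched pretest and posttest scores. The records and subgroup names are hypothetical and chosen only to show how an overall average can hide a subgroup with little change.

```python
import statistics
from collections import defaultdict

# Hypothetical records as (subgroup, pretest, posttest) (made-up data).
records = [
    ("English proficient", 11, 16),
    ("English proficient", 12, 17),
    ("English proficient", 13, 17),
    ("Limited English proficiency", 11, 12),
    ("Limited English proficiency", 12, 12),
]

# Overall mean gain across all participants.
all_gains = [post - pre for _, pre, post in records]
overall_gain = statistics.mean(all_gains)

# Mean gain within each subgroup.
gains_by_group = defaultdict(list)
for subgroup, pre, post in records:
    gains_by_group[subgroup].append(post - pre)
mean_gain_by_group = {
    subgroup: statistics.mean(gains) for subgroup, gains in gains_by_group.items()
}

print(f"overall mean gain: {overall_gain}")
for subgroup, gain in mean_gain_by_group.items():
    print(f"{subgroup}: mean gain {gain:.1f} (n={len(gains_by_group[subgroup])})")
```

With subgroups this small the numbers are only a prompt for follow-up, but they show why reporting the overall mean alone would miss the group that needs an adapted approach.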
Reading a Simple Result
Exam items occasionally show a small data display and ask for interpretation rather than calculation. Suppose a pretest-posttest table shows mean knowledge rising from 12 to 16 on a 20-item scale, with the change reported as statistically significant at p = 0.02. The correct reading has three parts. First, the direction and size: scores rose four points, from 60% to 80% correct, a gain of 20 percentage points that is meaningful on this scale. Second, the statistical claim: a p-value of 0.02 is below the conventional 0.05 threshold, so a gain this large would be unlikely if there were truly no change. Third, the caveats: a single group with no comparison cannot prove the program caused the gain, and a significant result on a tiny sample can still be fragile. Resist distractors that say a low p-value proves causation or that a significant result is automatically important.
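The arithmetic behind this reading can be checked with a few lines of code. The helper name `percent_correct` is hypothetical, and the relative gain is added here only to show how it differs from a percentage-point gain; the other numbers come from the example above.

```python
def percent_correct(mean_score, scale_max):
    """Express a mean score as a percentage of the maximum possible score."""
    return 100 * mean_score / scale_max


pre_mean, post_mean, scale_max = 12, 16, 20
p_value, alpha = 0.02, 0.05  # alpha is the conventional threshold

point_gain = post_mean - pre_mean  # raw points on the scale
percentage_point_gain = (
    percent_correct(post_mean, scale_max) - percent_correct(pre_mean, scale_max)
)
relative_gain = 100 * point_gain / pre_mean  # gain as a share of the pretest mean
statistically_significant = p_value < alpha

print(f"gain: {point_gain} points, {percentage_point_gain:.0f} percentage points")
print(f"relative gain over pretest: {relative_gain:.1f}%")
print(f"statistically significant at {alpha}: {statistically_significant}")
```

Note that the same change is 20 percentage points but roughly a 33% relative increase, so a report should state which of the two it is quoting. Neither figure, nor the p-value, says anything about whether the program caused the change.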
Rates, Proportions, and Denominators
Many Area IV items hinge on choosing the right denominator. A proportion divides a part by the whole at one time (40 of 60 enrollees attended, or 67%). A rate adds a time element and a population at risk (12 new diabetes diagnoses per 1,000 adults per year). The classic error is comparing raw counts across groups of different sizes: clinic A's 50 referrals and clinic B's 30 referrals say little until you know that clinic A served 500 patients and clinic B served 150, which makes B's referral percentage (20%) higher than A's (10%). When an exam item presents counts from unequal groups and asks for a fair comparison, convert to a rate or percentage first.
Watch also for the difference between incidence (new cases over a period) and prevalence (all existing cases at a point), because a program that prevents new cases moves incidence first.
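The denominator logic above can be sketched as two small helpers. The function names `percentage` and `rate_per` are hypothetical, and the count of 24 diagnoses among 2,000 adults is an invented pairing chosen to reproduce the 12 per 1,000 figure; the clinic and attendance numbers come from the text.

```python
def percentage(part, whole):
    """Proportion of a whole at one point in time, expressed as a percent."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return 100 * part / whole


def rate_per(events, population_at_risk, per=1000):
    """Events per `per` people at risk over the stated time period."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return per * events / population_at_risk


attendance_pct = percentage(40, 60)   # 40 of 60 enrollees attended

clinic_a_pct = percentage(50, 500)    # 50 referrals among 500 patients
clinic_b_pct = percentage(30, 150)    # 30 referrals among 150 patients

# Hypothetical: 24 new diagnoses among 2,000 adults during one year.
incidence_rate = rate_per(24, 2000, per=1000)

print(f"attendance: {attendance_pct:.0f}%")
print(f"clinic A: {clinic_a_pct:.0f}%  clinic B: {clinic_b_pct:.0f}%")
print(f"incidence: {incidence_rate:.0f} new cases per 1,000 adults per year")
```

Clinic A has more referrals in raw counts, yet clinic B refers a larger share of its patients, which is the comparison a fair answer choice will reflect.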
Common Misinterpretations to Avoid
- Treating a non-significant pilot result as proof the program failed, when the sample may simply be too small to detect a real effect.
- Treating statistical significance as proof of practical importance or of causation.
- Generalizing from a convenience sample to an entire population.
- Confusing correlation in cross-sectional data with cause and effect.
- Reporting only the overall mean and missing a subgroup that did not benefit.
Closing the Loop
Data use closes the evaluation loop. Findings may support program improvement, funding decisions, staff training, partner communication, policy advocacy, or discontinuation of an ineffective strategy. Return to the original objective: if it was modest, avoid sweeping claims; if the data reveal a delivery problem, recommend implementation changes before judging outcomes; if data quality is weak, state what can still be learned and what to measure next. This discipline keeps evaluation useful rather than decorative, and it mirrors what NCHEC expects of an entry-level specialist: a person who lets evidence, not hope, drive the recommendation.
The same 40 participants complete a pretest and posttest knowledge scale. Which analysis logic best fits the design?
What does practical significance mean?
An average score improved, but participants with limited English proficiency showed little change. What is the best interpretation step?
Wait times at a clinic are heavily skewed by a few very long visits. Which measure of central tendency best summarizes a typical wait?