5.4 Data Collection and Management
Key Takeaways
- Data collection methods should match the evaluation question, participant burden, literacy, timing, and available resources.
- Common methods include surveys, interviews, focus groups, observation, document review, administrative data, and biometric or environmental measures.
- Data management includes coding, secure storage, quality checks, de-identification, and a plan for missing data.
- Confidentiality and informed participation are essential when collecting sensitive health information.
Collecting Useful and Respectful Data
Data collection should start with the evaluation question, not with the easiest tool. A survey may work for knowledge and attitudes. Observation may be better for skills or fidelity. Administrative records may be efficient for attendance, referrals, immunization status, or service use. Interviews and focus groups can explain why a program did or did not fit participants.
Surveys are common because they can reach many people and produce comparable responses. They require careful wording, reading-level review, translation checks, accessible formats, and pilot testing. Leading questions, double-barreled items, unclear recall periods, and response options that do not fit participants can damage data quality.
Interviews and focus groups are useful for depth. Interviews may fit sensitive topics because participants can speak privately. Focus groups may reveal shared norms, language, and barriers, but they are less appropriate when confidentiality cannot be managed. A skilled facilitator uses neutral probes, keeps discussion respectful, and avoids turning the session into education or persuasion.
Observation can assess behaviors, settings, or implementation fidelity. A structured observation checklist is stronger than casual impressions. Observers need training, definitions, and practice scoring. Observation may introduce privacy concerns, so participants and sites should understand what is being observed and how results will be used.
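A structured checklist can be scored in a simple, repeatable way. The sketch below is illustrative only: the step names, the `fidelity_score` and `percent_agreement` functions, and the sample ratings are all hypothetical, not taken from any specific program. It shows one way to compute the share of required steps an observer recorded, and how often two trained observers agreed.

```python
# Hypothetical required steps for a role-play activity (illustrative only).
REQUIRED_STEPS = [
    "introduce scenario",
    "model skill",
    "participant practice",
    "give feedback",
]

def fidelity_score(observed, required=REQUIRED_STEPS):
    """Return the share of required steps the observer marked as completed."""
    completed = [step for step in required if observed.get(step, False)]
    return len(completed) / len(required)

def percent_agreement(rater_a, rater_b, required=REQUIRED_STEPS):
    """Return the share of steps on which two observers gave the same rating."""
    matches = sum(
        rater_a.get(step, False) == rater_b.get(step, False) for step in required
    )
    return matches / len(required)

# Two observers score the same session; they disagree on the last step.
observer_a = {
    "introduce scenario": True,
    "model skill": True,
    "participant practice": True,
    "give feedback": False,
}
observer_b = {
    "introduce scenario": True,
    "model skill": True,
    "participant practice": True,
    "give feedback": True,
}

print(fidelity_score(observer_a))                 # 0.75
print(percent_agreement(observer_a, observer_b))  # 0.75
```

Low agreement between observers usually signals that the step definitions or the observer training need more work before the scores are used.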
Existing data can reduce burden. Clinic records, school attendance logs, referral systems, public surveillance data, and policy documents may answer some questions without asking participants to repeat information. However, existing data may have missing fields, inconsistent definitions, delayed availability, or access restrictions. The evaluator should confirm data quality before relying on it.
Data management protects both accuracy and people. A basic plan should specify who collects data, where files are stored, how identifiers are kept separate from responses, how paper forms are locked up, how electronic files are password protected, how errors are checked, and when records are destroyed. De-identification is especially important when reporting on small groups, where individuals could be recognized.
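Two of these steps, separating identifiers and protecting small groups in reports, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a complete privacy procedure: the field names, the `split_identifiers` and `suppress_small_cells` functions, and the minimum cell size of 5 are all hypothetical. An actual threshold should come from the organization's own policy.

```python
import secrets

def split_identifiers(records, id_fields=("name", "phone")):
    """Separate direct identifiers from responses, linked only by a random study ID.

    Returns (linkage, analysis). The linkage list would be stored apart from
    the analysis list, with access limited to designated staff.
    """
    linkage, analysis = [], []
    for record in records:
        study_id = secrets.token_hex(8)  # random ID, not derived from the person
        linkage.append(
            {"study_id": study_id, **{f: record[f] for f in id_fields}}
        )
        analysis.append(
            {"study_id": study_id,
             **{k: v for k, v in record.items() if k not in id_fields}}
        )
    return linkage, analysis

def suppress_small_cells(counts, minimum=5):
    """Replace counts below the minimum with a marker before reporting.

    The default of 5 is an assumed example threshold.
    """
    return {
        group: (n if n >= minimum else f"<{minimum}")
        for group, n in counts.items()
    }

# Hypothetical intake records.
records = [
    {"name": "Participant One", "phone": "555-0100", "posttest": 8},
    {"name": "Participant Two", "phone": "555-0101", "posttest": 6},
]
linkage, analysis = split_identifiers(records)
print(analysis[0].keys())  # study_id and posttest only

print(suppress_small_cells({"Site A": 42, "Site B": 3}))
# {'Site A': 42, 'Site B': '<5'}
```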
Missing data should be handled honestly. Ignoring missing responses can bias results if the missingness is patterned. For example, people with lower literacy may skip written survey items, or people with transportation barriers may miss posttests. On the CHES exam, the best response is usually to document the missingness, examine its patterns, improve collection procedures, and avoid overstating conclusions.
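Examining patterns can start with a simple comparison of missing rates across groups. The sketch below assumes a hypothetical dataset in which each record notes how the survey was administered (`mode`) and stores an unanswered posttest as `None`. The `missing_rate_by_group` function and all field names are illustrative.

```python
from collections import defaultdict

def missing_rate_by_group(records, item, group_field):
    """Return the share of records in each group where `item` is missing (None)."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for record in records:
        group = record[group_field]
        totals[group] += 1
        if record.get(item) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}

# Hypothetical posttest records; None marks a skipped or missing posttest.
survey_records = [
    {"mode": "written", "posttest": 4},
    {"mode": "written", "posttest": None},
    {"mode": "written", "posttest": None},
    {"mode": "written", "posttest": 5},
    {"mode": "read aloud", "posttest": 3},
    {"mode": "read aloud", "posttest": 4},
]

print(missing_rate_by_group(survey_records, "posttest", "mode"))
# {'written': 0.5, 'read aloud': 0.0}
```

A large gap between groups, as in this made-up example, suggests the missingness is patterned and should be reported alongside the results rather than ignored.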
A strong data collection plan also anticipates workflow. Decide when participants will complete forms, who can answer questions, where completed forms go, and what happens if someone arrives late or leaves early. In community settings, small workflow details affect response rates and confidentiality. The best CHES answer often selects the method that is good enough for the question and realistic for the setting.
Data collectors should also be trained to respond neutrally. Explaining instructions is appropriate; coaching participants toward a preferred answer is not. Neutral procedures improve consistency and reduce social desirability pressure, especially when participants know the program staff personally.
Scenario Review Checklist
- Identify the relevant CHES Area of Responsibility.
- Locate the program stage in the scenario.
- Match the answer to evidence, stakeholders, and ethics.
- Reject choices that are premature, unsupported, or outside scope.
A program wants to know whether facilitators used all required role-play steps. Which data collection method is most direct?
Which survey item has the clearest flaw?
Why might an evaluator use existing clinic records instead of a new participant survey?