4.4 Research Methods and Epidemiology
Key Takeaways
- A randomized controlled trial (RCT) is the strongest experimental design; cohort, case-control, and cross-sectional studies are observational, not experimental.
- The Institutional Review Board (IRB) protects human subjects under the Common Rule, reviewing risk, informed consent, and protections for vulnerable populations.
- Incidence counts NEW cases over a period; prevalence counts ALL existing cases at a point in time.
- Sensitivity is the ability of a test to correctly identify those with disease (true positives); specificity correctly identifies those without it (true negatives).
- Reliability is consistency of measurement; validity is whether the measurement captures what it intends to capture.
Study Designs
Research designs split into experimental (the investigator assigns an intervention) and observational (the investigator only watches).
| Design | Type | Defining feature |
|---|---|---|
| Randomized controlled trial (RCT) | Experimental | Subjects randomly assigned to intervention vs control |
| Cohort study | Observational | Follow exposed vs unexposed groups over time to see who develops the outcome (classically prospective) |
| Case-control study | Observational | Start with cases vs controls, look backward for exposure |
| Cross-sectional study | Observational | Measure exposure and outcome at a single point in time |
The RCT is the gold standard because randomization tends to balance confounders, both measured and unmeasured, across the study groups. A cohort study is best for measuring incidence and relative risk; a case-control study is efficient for rare diseases because it starts with the outcome. A cross-sectional survey is a snapshot: good for prevalence, but it cannot establish that exposure preceded outcome.
Observational studies can be prospective (follow forward, as in a classic cohort) or retrospective (look back, as in case-control). The exam tests the trade-offs directly: a cohort study can measure incidence and handle rare exposures, but it is costly, slow, and poorly suited to rare diseases; a case-control study is fast, cheap, and well suited to rare diseases, but it is vulnerable to recall bias and cannot directly compute incidence. Above individual studies is the systematic review, often with a meta-analysis that statistically pools the results of many studies; it sits at the top of the evidence hierarchy.
Match the question to the design — that is the skill the exam rewards.
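As a rough illustration of why a cohort design supports relative risk, the sketch below computes incidence in each exposure group and divides one by the other. The function name and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    exposed_incidence = exposed_cases / exposed_total
    unexposed_incidence = unexposed_cases / unexposed_total
    if unexposed_incidence == 0:
        raise ValueError("relative risk is undefined when unexposed incidence is zero")
    return exposed_incidence / unexposed_incidence


# Hypothetical cohort: 30 of 200 exposed and 10 of 200 unexposed develop the disease.
rr = relative_risk(30, 200, 10, 200)
print(f"Relative risk: {rr:.1f}")  # 0.15 / 0.05 -> 3.0
```

A case-control study fixes the number of cases and controls in advance, so the group totals used here are not available from it, which is one way to see why it cannot directly compute incidence.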
IRB and Human-Subjects Protection
Research on human subjects is overseen by an Institutional Review Board (IRB) operating under the federal Common Rule (45 CFR 46). The IRB reviews a study before it begins and may approve it, require changes to it, or disapprove it.
Its core duties:
- Weigh risks vs benefits and minimize risk to subjects.
- Ensure valid informed consent — voluntary, informed, and documented.
- Add safeguards for vulnerable populations (children, prisoners, pregnant women, cognitively impaired).
- Confirm privacy/confidentiality protections; under HIPAA, the IRB or a privacy board can grant a waiver of authorization to use protected health information (PHI) for research.
The ethical foundations trace to the Belmont Report principles: respect for persons, beneficence, and justice.
Incidence, Prevalence, and Test Accuracy
Two epidemiologic rates are routinely confused:
- Incidence rate = new cases of a disease in a population during a period / population at risk. It measures risk of developing disease.
- Prevalence rate = all existing cases (new + old) at a point in time / total population. It measures burden.
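The two formulas above can be sketched in a few lines. The clinic figures are invented for illustration, and the function names are assumptions rather than part of any standard library.

```python
def incidence_rate(new_cases: int, population_at_risk: int) -> float:
    """New cases during the period divided by the population at risk."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk


def prevalence_rate(existing_cases: int, total_population: int) -> float:
    """All existing cases (new + old) at a point in time divided by total population."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return existing_cases / total_population


# Hypothetical clinic population of 10,000 people.
# On January 1, 400 already have the disease, so 9,600 are at risk of developing it.
# During the year, 50 of those 9,600 are newly diagnosed.
prevalence = prevalence_rate(existing_cases=400, total_population=10_000)
incidence = incidence_rate(new_cases=50, population_at_risk=9_600)

print(f"Prevalence on Jan 1: {prevalence * 1000:.1f} per 1,000")   # 40.0 per 1,000
print(f"Incidence for the year: {incidence * 1000:.1f} per 1,000")  # 5.2 per 1,000
```

Note the different denominators: people who already have the disease are removed from the at-risk population for incidence but stay in the total population for prevalence.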
For a diagnostic test, build a 2x2 table of disease vs test result:
| | Disease + | Disease − |
|---|---|---|
| Test + | True positive (TP) | False positive (FP) |
| Test − | False negative (FN) | True negative (TN) |
- Sensitivity = TP / (TP + FN) — detects those with disease; a highly sensitive test rules disease out when negative.
- Specificity = TN / (TN + FP) — correctly clears those without disease; a highly specific test rules disease in when positive.
Two memory hooks the exam relies on: SnNout (a highly Sensitive test that is Negative rules disease out) and SpPin (a highly Specific test that is Positive rules disease in). Sensitivity reads down the disease-positive column (TP and FN), while specificity reads down the disease-negative column (TN and FP) — a classic trap is dividing across rows instead of down columns. Sensitive tests are favored for screening, where missing a case (a false negative) is costly; specific tests are favored for confirmation, where a false alarm (false positive) must be avoided.
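A minimal sketch of the column-wise calculation, using hypothetical counts. The class and field names are illustrative; the last method deliberately computes the row-wise ratio to show how the trap produces a different number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a disease-vs-test table (field names are illustrative)."""
    tp: int  # disease present, test positive
    fp: int  # disease absent, test positive
    fn: int  # disease present, test negative
    tn: int  # disease absent, test negative

    def sensitivity(self) -> float:
        # Down the Disease + column: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Down the Disease - column: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    def test_positive_row_ratio(self) -> float:
        # The trap: dividing across the Test + row, TP / (TP + FP).
        # This is the positive predictive value, not sensitivity.
        return self.tp / (self.tp + self.fp)


table = TwoByTwo(tp=90, fp=30, fn=10, tn=870)  # hypothetical counts
print(f"Sensitivity: {table.sensitivity():.2f}")  # 0.90
print(f"Specificity: {table.specificity():.3f}")  # 0.967
print(f"Row-wise ratio (not sensitivity): {table.test_positive_row_ratio():.2f}")  # 0.75
```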
Reliability, Validity, and Sampling
Reliability is the consistency of a measurement — does it give the same result on repeat measurement (test-retest) or between raters (inter-rater)? Validity is accuracy — does the instrument measure what it is supposed to? A scale that reads 5 lbs heavy every time is reliable but not valid; a measure can be reliable without being valid, but it cannot be valid without being reliable.
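The scale example can be put into numbers. In this sketch, the spread of repeat readings stands in for reliability and the average error against the true value stands in for validity. That is a simplification for illustration, not a formal reliability or validity statistic, and the weights are hypothetical.

```python
from statistics import mean, pstdev

true_weight = 150.0  # hypothetical true weight in pounds
readings = [155.0, 155.0, 155.0, 155.0, 155.0]  # scale reads 5 lb heavy every time

spread = pstdev(readings)            # 0.0: repeat readings agree, so the scale is reliable
bias = mean(readings) - true_weight  # 5.0: readings are consistently off, so it is not valid

print(f"Spread of repeat readings: {spread:.1f} lb")
print(f"Average error vs true weight: {bias:.1f} lb")
```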
Sampling lets you study a subset and generalize:
- Random (probability) sampling: every member has a known, nonzero chance of selection (simple random, systematic, stratified, cluster), which supports generalization.
- Non-probability sampling: convenience or purposive selection; cheaper, but prone to bias and weaker in generalizability.
A larger, well-drawn representative sample reduces sampling error; a biased sample threatens external validity (generalizability) no matter how large it is.
Distinguish the two validities the exam pairs: internal validity is whether the study correctly measured the effect within its own sample (free of confounding and bias), while external validity is whether those results generalize to other populations. Stratified sampling divides the population into subgroups and samples each, ensuring small but important strata are represented; systematic sampling takes every k-th record; cluster sampling draws whole naturally occurring groups. The recurring trap is assuming a big sample fixes bias — it does not.
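The four probability sampling methods described above can be sketched with Python's standard `random` module. The record layout (an ID paired with a unit label), the helper names, and the seeds are all assumptions made for the example.

```python
import random


def simple_random_sample(records, n, seed=None):
    """Every record has the same chance of selection."""
    return random.Random(seed).sample(records, n)


def systematic_sample(records, k, start=0):
    """Take every k-th record; in practice `start` is chosen at random from 0..k-1."""
    return records[start::k]


def stratified_sample(records, stratum_of, per_stratum, seed=None):
    """Group records into strata, then sample within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for record in records:
        strata.setdefault(stratum_of(record), []).append(record)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def cluster_sample(clusters, n_clusters, seed=None):
    """Draw whole naturally occurring groups; `clusters` maps name -> records."""
    chosen = random.Random(seed).sample(sorted(clusters), n_clusters)
    return [record for name in chosen for record in clusters[name]]


# Hypothetical records: (record_id, service). One record in ten is "psych".
records = [(i, "psych" if i % 10 == 0 else "inpatient") for i in range(100)]
clusters = {"unit_a": records[:30], "unit_b": records[30:60], "unit_c": records[60:]}

print(len(simple_random_sample(records, 10, seed=1)))                  # 10 records
print([r[0] for r in systematic_sample(records, k=10, start=3)][:3])   # [3, 13, 23]
print(len(stratified_sample(records, lambda r: r[1], 5, seed=1)))      # 5 per stratum -> 10
print(len(cluster_sample(clusters, 1, seed=1)))                        # one whole unit
```

In this example the stratified draw guarantees five "psych" records even though that stratum is only 10% of the population, which a simple random sample of ten would not reliably deliver.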
A nonrepresentative sampling method produces a precisely wrong answer, which is why HIM audits specify a random pull of records rather than a convenience grab. In every epidemiology item, first decide whether the prompt is asking about risk over time (incidence and cohort designs), disease burden at a moment (prevalence and cross-sectional designs), or measurement quality (reliability, validity, sensitivity, and specificity), and the correct choice falls out from there.
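To see why a bigger convenience sample does not fix bias, consider the sketch below. The population is entirely made up: it assumes the first 2,000 records on the shelf come from a unit with a 5% error rate, while the remaining 8,000 have a 25% error rate.

```python
import random

# Hypothetical population of 10,000 records; True means the record has an error.
population = ([i % 20 == 0 for i in range(2000)]    # 100 errors in the first 2,000
              + [i % 4 == 0 for i in range(8000)])  # 2,000 errors in the other 8,000


def error_rate(sample):
    """Proportion of records in the sample that have an error."""
    return sum(sample) / len(sample)


true_rate = error_rate(population)  # 2,100 / 10,000 = 0.21

# Convenience grab: take whatever is first on the shelf, small or large.
convenience_small = population[:200]
convenience_large = population[:2000]

# Random pull of the same size as the large convenience grab (seed fixed for repeatability).
random_pull = random.Random(42).sample(population, 2000)

print(f"True error rate:         {true_rate:.3f}")
print(f"Convenience, n = 200:    {error_rate(convenience_small):.3f}")
print(f"Convenience, n = 2,000:  {error_rate(convenience_large):.3f}")
print(f"Random pull, n = 2,000:  {error_rate(random_pull):.3f}")
```

Both convenience grabs report 5% regardless of size, because they only ever see the low-error unit; the random pull lands close to the true 21%.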
Researchers begin with a group of patients who already have a rare cancer and a comparison group without it, then look backward to compare past exposures. Which study design is this?
On January 1 a clinic counts every patient currently living with diabetes in its population. This figure is a measure of:
A bathroom scale that always reads exactly 5 pounds higher than a person's true weight is best described as:
Which body must review a research study using identifiable patient data before it begins, to protect human subjects?