Biostatistics, Epidemiology, and Population Health

Key Takeaways

Sensitivity and specificity are properties of the test in a studied population, while PPV and NPV move strongly with disease prevalence.
Likelihood ratios convert pretest odds to posttest odds and are most useful when a vignette asks how much a result changes probability.
Risk ratio, odds ratio, absolute risk reduction, relative risk reduction, and NNT answer different questions even when built from the same event counts.
A confidence interval that includes 1 for ratios or 0 for differences means the result is not statistically significant at the corresponding alpha level.
Confounding distorts an exposure-outcome association; effect modification means the association truly differs across strata.
Screening and population health questions often combine epidemiology with ethics, prevention, equity, and quality-safety systems thinking.

Last updated: June 2026

Biostatistics Reasoning Map

Vignette clue	Reasoning move	Common trap
Two-by-two table	Define disease status, test status, and requested denominator	Mixing sensitivity with positive predictive value
Study result and confidence interval	Interpret precision, statistical significance, and clinical effect size	Calling a small p value clinically important by itself
Bias or confounding clue	Name selection, measurement, recall, lead-time, or length-time bias, or recognize confounding or effect modification	Fixing every design problem with randomization

Biostatistics and epidemiology on Step 1 reward disciplined setup. Before calculating, name the unit of analysis, the time frame, and the denominator. Incidence is new cases over time among people at risk; prevalence is all existing cases at one point or period. Prevalence rises when incidence rises or survival lengthens, and falls when cure or death removes cases. A chronic disease with improved survival can become more prevalent even if incidence is unchanged. Attack rate is an incidence proportion during an outbreak.

Mortality rate counts deaths in a population over time; case fatality rate counts deaths among diagnosed cases. These distinctions are often the entire question.
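As a rough illustration of why the denominator is often the whole question, the sketch below computes these measures for a hypothetical town followed for one year. Every count is invented for the example.

```python
# Hypothetical one-year follow-up of a town; all counts are invented.
population = 10_000
existing_cases = 200   # already diseased at the start, so not at risk
new_cases = 100        # diagnosed during the year
disease_deaths = 15    # deaths from the disease during the year

at_risk = population - existing_cases

# Incidence proportion: new cases among people at risk.
incidence = new_cases / at_risk
# Point prevalence at the start: existing cases among everyone.
prevalence_at_start = existing_cases / population
# Mortality rate: disease deaths over the whole population.
mortality_rate = disease_deaths / population
# Case fatality: disease deaths over diagnosed cases only.
case_fatality = disease_deaths / (existing_cases + new_cases)

print(f"incidence proportion  {incidence:.4f}")
print(f"point prevalence      {prevalence_at_start:.4f}")
print(f"mortality rate        {mortality_rate:.4f}")
print(f"case fatality         {case_fatality:.4f}")
```

The same 15 deaths give a small mortality rate and a much larger case fatality because only the denominator changes.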

For diagnostic tests, write the 2 x 2 table in the same orientation every time. Disease present with test positive is true positive; disease present with test negative is false negative; disease absent with test positive is false positive; disease absent with test negative is true negative. Sensitivity is TP divided by TP plus FN, the probability of a positive test among people with disease. Specificity is TN divided by TN plus FP, the probability of a negative test among people without disease. A highly sensitive test is useful for ruling out disease when negative because few diseased patients are missed.

A highly specific test is useful for ruling in disease when positive because few healthy patients test positive. This rule works best when the test threshold and clinical setting match the vignette.
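The 2 x 2 definitions above can be sketched in a few lines. The counts are hypothetical and the function names are chosen only for this illustration.

```python
def sensitivity(tp, fn):
    """Probability of a positive test among people with disease."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Probability of a negative test among people without disease."""
    return tn / (tn + fp)


# Hypothetical counts: 100 people with disease, 900 without.
tp, fn = 80, 20   # disease present: test positive / test negative
fp, tn = 90, 810  # disease absent:  test positive / test negative

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Both denominators are column totals defined by disease status, never by test result.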

Positive predictive value is TP divided by all positive tests. Negative predictive value is TN divided by all negative tests. PPV increases as prevalence increases because a positive result is more likely to represent true disease in a high-prevalence population. NPV increases as prevalence decreases because a negative result is more likely to be true in a low-prevalence population. Sensitivity and specificity are less prevalence-dependent, but they can change with spectrum bias when the studied patients have more severe, obvious, or atypical disease than the target population.
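To see the prevalence effect numerically, the following sketch holds sensitivity and specificity fixed and varies prevalence. The test characteristics (90 percent each) and the function name are assumptions made for the example.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sens * prevalence                # true positive fraction
    fn = (1 - sens) * prevalence          # false negative fraction
    fp = (1 - spec) * (1 - prevalence)    # false positive fraction
    tn = spec * (1 - prevalence)          # true negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test, three different populations.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With these assumed values, PPV climbs from under 10 percent to 90 percent as prevalence rises, while NPV drifts downward.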

Raising a test cutoff usually increases specificity and PPV but decreases sensitivity and NPV, assuming higher values indicate disease; lowering the cutoff does the opposite. ROC curves display the sensitivity-specificity tradeoff across thresholds, and a larger area under the curve means better discrimination.
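A toy example of the cutoff tradeoff, assuming a biomarker where higher values point toward disease. The ten measurements are invented, and each (sensitivity, specificity) pair corresponds to one point on an ROC curve.

```python
# Hypothetical biomarker values; higher values are assumed to suggest disease.
diseased = [4, 6, 7, 8, 9]
healthy = [1, 2, 3, 5, 6]


def sens_spec(cutoff):
    """Call the test positive when the value is at or above the cutoff."""
    sens = sum(v >= cutoff for v in diseased) / len(diseased)
    spec = sum(v < cutoff for v in healthy) / len(healthy)
    return sens, spec


for cutoff in (4, 7):
    sens, spec = sens_spec(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.1f}, specificity {spec:.1f}")
```

Moving the cutoff from 4 to 7 trades missed cases for fewer false positives in this small data set.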

Likelihood ratios ask how much a result changes probability. LR positive equals sensitivity divided by (1 minus specificity). LR negative equals (1 minus sensitivity) divided by specificity. Convert probability to odds, multiply by the LR, then convert odds back to probability if a numerical posttest probability is required. Odds equal probability divided by (1 minus probability); probability equals odds divided by (1 plus odds). A large positive LR makes disease much more likely after a positive test, while a small negative LR (close to 0) makes disease much less likely after a negative test.

Unlike PPV and NPV, likelihood ratios are less directly changed by prevalence, but the same LR produces different posttest probabilities when pretest probabilities differ.
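The probability-to-odds-and-back steps can be sketched as follows. The sensitivity, specificity, and pretest probabilities are hypothetical, and the function names are assumptions made for this illustration.

```python
def likelihood_ratios(sens, spec):
    """Return (LR positive, LR negative)."""
    return sens / (1 - spec), (1 - sens) / spec


def posttest_probability(pretest_prob, lr):
    """Convert probability to odds, apply the LR, convert back."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


# Hypothetical test: sensitivity 0.90, specificity 0.80.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")

# The same positive result applied to two different pretest probabilities.
for pretest in (0.20, 0.50):
    post = posttest_probability(pretest, lr_pos)
    print(f"pretest {pretest:.2f} -> posttest {post:.3f}")
```

The LR is identical in both runs; only the starting probability differs, which is why the posttest probabilities diverge.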

Risk measures require attention to study design. In a cohort study or randomized trial, you can calculate incidence in exposed and unexposed groups, so relative risk is natural: risk in exposed divided by risk in unexposed. In a case-control study, investigators start with disease status and look back for exposure, so the odds ratio is usually the association measure. For rare diseases, the odds ratio approximates the relative risk. Attributable risk, or absolute risk reduction when an intervention lowers events, is the simple difference between event rates.
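To illustrate the rare-disease approximation, the sketch below computes both measures from the same hypothetical cohort-style counts. A real case-control study could not supply the risk ratio; the full counts are assumed here only so the two numbers can be compared side by side.

```python
def risk_ratio(a, b, c, d):
    """a, b = exposed with/without outcome; c, d = unexposed with/without."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio from the same four cells."""
    return (a * d) / (b * c)


# Hypothetical counts: (a, b, c, d).
rare = (20, 980, 10, 990)      # outcome in 2% of exposed, 1% of unexposed
common = (400, 600, 200, 800)  # outcome in 40% of exposed, 20% of unexposed

for label, cells in (("rare", rare), ("common", common)):
    print(f"{label}: RR {risk_ratio(*cells):.2f}, OR {odds_ratio(*cells):.2f}")
```

In both scenarios the risk ratio is 2, but the odds ratio stays close to it only when the outcome is rare.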

Relative risk reduction is ARR divided by baseline risk. Number needed to treat is 1 divided by ARR, using ARR as a decimal and rounding up to the next whole patient. Number needed to harm is calculated the same way for excess adverse events. A treatment can have an impressive relative risk reduction but small absolute benefit if baseline risk is low.
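A short sketch of these calculations, using invented trial counts. Exact fractions are used so that rounding up to a whole patient is not thrown off by floating-point error; the function name is an assumption for the example.

```python
from fractions import Fraction
from math import ceil


def treatment_effect(control_events, control_n, treated_events, treated_n):
    """Return (ARR, RRR, NNT) from event counts in each arm."""
    control_risk = Fraction(control_events, control_n)
    treated_risk = Fraction(treated_events, treated_n)
    arr = control_risk - treated_risk  # absolute risk reduction
    rrr = arr / control_risk           # relative risk reduction
    nnt = ceil(1 / arr)                # round up to a whole patient
    return arr, rrr, nnt


# Two hypothetical trials with the same relative effect.
trials = {
    "high baseline risk": (20, 200, 12, 200),    # 10% vs 6%
    "low baseline risk": (20, 2000, 12, 2000),   # 1% vs 0.6%
}
for label, counts in trials.items():
    arr, rrr, nnt = treatment_effect(*counts)
    print(f"{label}: ARR {float(arr):.3f}, RRR {float(rrr):.0%}, NNT {nnt}")
```

Both hypothetical trials show a 40 percent relative risk reduction, yet the number needed to treat is ten times larger when baseline risk is low.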

P values and confidence intervals describe compatibility with a null hypothesis and the precision of an estimate. A p value is the probability of observing data at least as extreme as the study result if the null hypothesis were true; it is not the probability that the null is true. Alpha is the probability of type I error, a false positive. Beta is the probability of type II error, a false negative. Power equals 1 minus beta and improves with larger sample size, larger effect size, lower variability, and a higher alpha (at the cost of more type I errors).
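The determinants of power can be explored with a rough normal-approximation sketch. This assumes a two-sided z-test comparing two group means with equal group sizes and a known common standard deviation, and it ignores the negligible chance of rejecting in the wrong direction. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma * sqrt(2 / n_per_group)
    return NormalDist().cdf(abs(delta) / standard_error - z_crit)


baseline = approx_power(delta=5, sigma=10, n_per_group=50)
print(f"baseline        {baseline:.3f}")
print(f"larger sample   {approx_power(5, 10, 100):.3f}")
print(f"larger effect   {approx_power(8, 10, 50):.3f}")
print(f"less variable   {approx_power(5, 6, 50):.3f}")
print(f"alpha = 0.10    {approx_power(5, 10, 50, alpha=0.10):.3f}")
```

Each change from the baseline raises power under these assumptions, matching the list in the paragraph above.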

For relative measures such as RR, OR, and hazard ratio, a confidence interval that includes 1 means the result is not statistically significant at the corresponding alpha. For differences such as ARR or mean difference, a confidence interval that includes 0 means the result is not significant. A narrow interval suggests more precision, not necessarily more clinical importance.
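As an illustration of reading a ratio interval against 1, the sketch below uses one common large-sample approximation for a risk ratio confidence interval, computed on the log scale. The counts are invented, and the function name is an assumption for the example; this is not the only way such intervals are calculated.

```python
from math import exp, log, sqrt


def risk_ratio_ci(a, n1, c, n0, z=1.96):
    """Risk ratio with an approximate 95% CI (log-scale method).

    a of n1 exposed and c of n0 unexposed people had the outcome.
    """
    rr = (a / n1) / (c / n0)
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# Same hypothetical risks (15% vs 10%), two different sample sizes.
for label, counts in (("small study", (30, 200, 20, 200)),
                      ("large study", (120, 800, 80, 800))):
    rr, lower, upper = risk_ratio_ci(*counts)
    verdict = "includes 1" if lower <= 1 <= upper else "excludes 1"
    print(f"{label}: RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f} ({verdict})")
```

The point estimate is the same in both hypothetical studies; only the larger one is precise enough for its interval to exclude 1.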

Bias is systematic error. Selection bias occurs when the enrolled sample differs from the target population in a way related to the outcome. Volunteer bias is a selection subtype in which participants are unusually health-conscious. Berkson bias can occur in hospital-based case-control studies when hospitalization itself is related to exposure and disease. Recall bias appears when cases remember exposures differently than controls. Observer bias occurs when measurement is influenced by knowledge of exposure or disease; blinding helps reduce it.

Lead-time bias makes survival from diagnosis look longer after earlier detection without changing the time of death. Length-time bias overrepresents slowly progressive disease in screening programs. Overdiagnosis detects abnormalities that would never have caused symptoms or death.
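Lead-time bias is easiest to see as arithmetic. In this hypothetical patient, screening moves the diagnosis earlier but the age at death is assumed to stay the same; all ages are invented.

```python
# Hypothetical patient; screening is assumed not to change age at death.
age_at_death = 70
age_dx_by_symptoms = 67   # diagnosis without screening
age_dx_by_screening = 63  # earlier diagnosis with screening

survival_without_screening = age_at_death - age_dx_by_symptoms
survival_with_screening = age_at_death - age_dx_by_screening
lead_time = age_dx_by_symptoms - age_dx_by_screening

print(f"survival from diagnosis, no screening: {survival_without_screening} years")
print(f"survival from diagnosis, screening:    {survival_with_screening} years")
print(f"apparent gain equals lead time:        {lead_time} years")
```

Measured survival more than doubles in this scenario even though the patient does not live a day longer.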

Confounding arises when a third variable is associated with both exposure and outcome but is not on the causal pathway. It creates or hides an association. Randomization, restriction, matching, stratification, and multivariable adjustment can reduce confounding. Effect modification is different: the exposure truly has different effects in different groups, such as a medication reducing events in one genotype group but not another. On Step 1, if stratified analyses show the association disappears in every stratum, think confounding.

If the association is present in one stratum and absent or reversed in another, think effect modification.
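The stratification rule can be sketched with two invented data sets. Risk ratios from hypothetical cohort counts are used for simplicity, and the stratum labels and function names are assumptions made for the example.

```python
def rr_from_counts(exp_events, exp_n, unexp_events, unexp_n):
    """Risk ratio: risk in exposed divided by risk in unexposed."""
    return (exp_events / exp_n) / (unexp_events / unexp_n)


def crude_and_stratified(strata):
    """strata maps a name to (exp_events, exp_n, unexp_events, unexp_n)."""
    totals = [sum(column) for column in zip(*strata.values())]
    crude = rr_from_counts(*totals)
    by_stratum = {name: rr_from_counts(*c) for name, c in strata.items()}
    return crude, by_stratum


# Hypothetical data sets.
confounded = {"smokers": (80, 400, 20, 100),
              "nonsmokers": (5, 100, 20, 400)}
modified = {"genotype A": (10, 200, 20, 200),
            "genotype B": (20, 200, 20, 200)}

for label, data in (("confounding", confounded),
                    ("effect modification", modified)):
    crude, by_stratum = crude_and_stratified(data)
    detail = ", ".join(f"{k} RR {v:.2f}" for k, v in by_stratum.items())
    print(f"{label}: crude RR {crude:.2f}; {detail}")
```

In the first data set the crude association vanishes within every stratum; in the second the stratum-specific ratios genuinely differ from each other.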

Study design clues are high-yield because they determine both inference and statistics. Randomized controlled trials best support causality and reduce confounding, especially with allocation concealment and blinding. Cohort studies follow exposed and unexposed groups to compare future disease. Case-control studies compare past exposures among cases and controls and are efficient for rare diseases or long latency. Cross-sectional studies measure exposure and outcome at one time and can estimate prevalence, but temporal order is often unclear.

Ecologic studies use group-level data and risk ecological fallacy when group associations are assumed to apply to individuals. Meta-analysis combines studies but inherits the quality and heterogeneity of included work.

Screening is justified when disease is important, has a detectable asymptomatic phase, has an acceptable test, and early treatment improves patient-important outcomes. Screening tests are typically sensitive; confirmatory tests are typically specific. Prevention language is another common population health bridge: primary prevention prevents disease before it occurs, secondary prevention detects early disease, tertiary prevention reduces complications, and quaternary prevention avoids unnecessary intervention. Quality and safety questions often ask for a system-level response rather than blame.

Use root cause analysis after serious events, plan-do-study-act cycles for small tests of change, standardized handoffs to reduce communication failures, medication reconciliation at transitions, and reporting systems for near misses. Public health vignettes may add isolation, vaccination, contact tracing, environmental exposure control, or risk communication. The Step 1 move is to identify the measure, bias, or prevention level before being pulled into clinical details.

Test Your Knowledge

A new rapid blood test for Disease X is evaluated in 1000 patients from a primary care clinic. Disease X is present in 100 patients. The test is positive in 90 patients with disease and in 180 patients without disease. Which statement best describes the expected interpretation of this test in this clinic population?

The positive predictive value is 90 percent because 90 of 100 diseased patients test positive.

The positive predictive value is approximately 33 percent because 90 of 270 positive tests are true positives.

The negative predictive value is approximately 80 percent because 720 of 900 nondiseased patients test negative.

The specificity is 90 percent because 90 of 100 diseased patients test positive.

Test Your Knowledge

In a randomized trial of a drug to prevent recurrent nephrolithiasis, 12 of 400 treated patients and 30 of 400 placebo patients develop a recurrent stone over 2 years. Which value is closest to the number needed to treat to prevent one recurrent stone over 2 years?

Test Your Knowledge

A case-control study reports that adults who drink more than 4 cups of coffee per day have higher odds of pancreatic cancer. Investigators later find that heavy coffee drinkers in the study were much more likely to smoke cigarettes, and the coffee-cancer association disappears after stratifying by smoking status. Which concept best explains the original association?

Confounding

Effect modification

Lead-time bias

Length-time bias
