7.5 Descriptive, Inferential, and Predictive Statistics

Key Takeaways

  • Descriptive statistics summarize observed data; inferential statistics estimate what results suggest beyond the sample.
  • The appropriate statistic depends on the measurement scale (nominal, ordinal, interval, ratio), the design, and the research question.
  • Correlation describes association, regression predicts outcomes, and group-comparison tests (t test, ANOVA) evaluate mean differences under assumptions.
  • Type I error (alpha), Type II error (beta), power (1-beta), and confidence intervals describe the uncertainty around statistical decisions.

Last updated: June 2026

Choose the Statistic That Fits the Question

Statistics give structure to evidence. Descriptive statistics summarize what was observed; inferential statistics estimate what the data suggest about a broader population; predictive statistics estimate outcomes from one or more variables. EPPP items rarely require long calculations, but they reliably require matching a statistic to the design and variables.

Start with measurement level. Nominal variables are unordered categories (diagnosis group). Ordinal variables have rank order, but the intervals between ranks are not assumed to be equal (Likert-type ratings). Interval variables have equal intervals but no true zero (temperature in Celsius). Ratio variables add a meaningful zero (reaction time). The statistic must respect the scale.

Question type | Common statistic | Key cue
--- | --- | ---
Summarize central tendency | Mean / median / mode | Use median when skew or outliers distort the mean
Summarize variability | SD, variance, range, IQR | SD = spread around the mean; variance = SD squared
Associate two continuous variables | Pearson r | Linear relationship, interval/ratio data
Associate ranked variables | Spearman rho | Ordinal or non-linear monotonic data
Associate two categorical variables | Chi-square | Frequencies/counts in categories
Compare two means | t test (independent or paired) | Paired = related scores (pre/post in same people)
Compare 3+ means | ANOVA | Omnibus result needs post-hoc tests
Predict an outcome | Regression | One or more predictors estimate a criterion
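As an illustration of the chi-square row, the sketch below computes the chi-square statistic for a 2 x 2 table of counts. The counts, the variable names, and the helper `chi_square_statistic` are hypothetical and exist only for this example. Converting the statistic to a p value requires the chi-square distribution, which is left out here.

```python
def chi_square_statistic(observed):
    """Chi-square test of independence statistic for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two categorical variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical counts: rows = treatment vs. control, columns = improved vs. not.
counts = [[30, 20],
          [10, 40]]
chi2, df = chi_square_statistic(counts)
```

Because the inputs are frequencies in categories rather than scores, no means or standard deviations appear anywhere in the calculation.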

The mean is pulled by outliers; the median better represents skewed distributions; the mode is the most frequent value and suits nominal data. A z score expresses how far a value lies from the mean in standard-deviation units. In a normal distribution, about 68% of scores fall within +/-1 SD, 95% within +/-2 SD, and 99.7% within +/-3 SD, a fact useful for interpreting percentiles and standardized scores.
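The points above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the scores are made up, and the last one is a deliberate outlier.

```python
from statistics import NormalDist, mean, median, mode, pstdev

# Hypothetical scores; the final value is a deliberate outlier.
scores = [10, 12, 12, 13, 14, 15, 50]

avg = mean(scores)          # pulled upward by the outlier
mid = median(scores)        # resistant to the outlier
most_common = mode(scores)  # most frequent value

sd = pstdev(scores)         # population SD (divides by N)
z_outlier = (50 - avg) / sd # distance of the outlier from the mean in SD units

# Proportion of a normal distribution within +/-k SD of the mean.
std_normal = NormalDist()
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
```

Here the mean (18) sits well above the median (13) because of the single extreme score, and the `within` dictionary reproduces the familiar 68%, 95%, and 99.7% figures.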

Association, Comparison, and Inferential Error

Correlation coefficients run from -1.00 to +1.00; the sign shows direction and the magnitude shows strength. A negative r means higher values on one variable accompany lower values on the other. The coefficient of determination (r-squared) gives the proportion of variance shared, so r = .50 explains only 25% of the variance. Correlation does not prove causation. Regression uses predictors to estimate a criterion, but a regression model is still limited by design, measurement error, and omitted variables; multicollinearity among predictors destabilizes the coefficients.
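To make the link between r and r-squared concrete, here is a small sketch that computes Pearson r from its definition. The data and the names `hours_sleep` and `error_count` are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation computed from deviations around each mean."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / sqrt(ss_x * ss_y)


# Hypothetical data: more sleep accompanies fewer errors (negative r).
hours_sleep = [4, 5, 6, 7, 8]
error_count = [9, 8, 6, 5, 2]

r = pearson_r(hours_sleep, error_count)
shared_variance = r ** 2  # coefficient of determination

# Squaring any r gives the shared variance, e.g. r = .50 gives .25.
half_r_shared = round(0.50 ** 2, 2)
```

The sign of `r` carries the direction, while `shared_variance` drops the sign and reports only how much variance the two variables have in common.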

Group-comparison choices depend on design. An independent-samples t test compares two separate groups; a paired-samples t test compares related scores (pretest vs. posttest in the same people). ANOVA compares means across three or more groups or factors; a significant omnibus F says not all means are equal but does not identify which pairs differ, which is why post-hoc tests (Tukey, Bonferroni) follow. A factorial ANOVA also tests interactions between factors.
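The omnibus F in a one-way ANOVA is the ratio of between-group to within-group variance. The sketch below computes it by hand for three hypothetical therapy formats; the scores and group names are assumptions made for the example. Turning F into a p value requires the F distribution, which the Python standard library does not provide.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (score - m) ** 2
        for group, m in zip(groups, group_means)
        for score in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical anxiety scores for three therapy formats.
individual = [12, 14, 13, 15, 16]
group_format = [10, 11, 12, 11, 11]
online = [14, 15, 16, 15, 15]

f_ratio, df_between, df_within = one_way_anova_f([individual, group_format, online])
```

A large F only signals that the group means are not all equal; identifying which pairs differ still calls for a post-hoc comparison.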

Inferential statistics live with error:

  • Type I error (alpha) — rejecting a true null hypothesis (a false positive); alpha is conventionally set at .05.
  • Type II error (beta) — failing to reject a false null hypothesis (a false negative).
  • Power (1 - beta) — the probability of detecting an effect that truly exists; a common target is .80.
  • Power increases with larger samples, larger true effects, lower measurement error, a larger alpha, and one-tailed tests (provided the effect lies in the predicted direction).
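The factors in the list above can be explored with a rough power calculation. The function below is a sketch under simplifying assumptions: it uses a normal approximation for comparing two independent group means with equal group sizes, and for two-tailed tests it ignores the negligible chance of rejecting in the wrong direction. The name `approx_power` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05, one_tailed=False):
    """Approximate power for a two-group mean comparison (normal approximation)."""
    std_normal = NormalDist()
    tail_alpha = alpha if one_tailed else alpha / 2
    z_critical = std_normal.inv_cdf(1 - tail_alpha)
    # Expected size of the test statistic when the true standardized effect is d.
    expected_z = effect_size_d * sqrt(n_per_group / 2)
    return 1 - std_normal.cdf(z_critical - expected_z)


baseline = approx_power(0.5, 30)                     # medium effect, small groups
bigger_sample = approx_power(0.5, 64)                # close to the .80 target
bigger_effect = approx_power(0.8, 30)
looser_alpha = approx_power(0.5, 30, alpha=0.10)
directional = approx_power(0.5, 30, one_tailed=True)
```

Each of the last four values exceeds `baseline`, matching the direction of every factor in the list. An exact calculation would use the noncentral t distribution, so treat these figures as approximations.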

Confidence intervals communicate the precision of an estimate. A 95% CI comes from a procedure that would capture the true parameter in about 95% of repeated samples; a narrow interval signals precision, and a 95% CI that excludes the null value (0 for a difference, 1 for an odds ratio) corresponds to a significant result at alpha = .05. CIs are exam-friendly because they shift attention from a bare yes/no significance decision to the plausible range of true values.

When a statistics stem feels dense, translate it: is the researcher summarizing, associating, comparing, predicting, classifying, or estimating precision? Once the task is clear, the correct statistic is the one that respects the variable type and design.
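The confidence-interval logic described in this section can be sketched in a few lines. The example below builds a 95% CI for a mean difference score using a normal critical value (about 1.96). That is a simplifying assumption: with a sample this small, a t critical value would normally be used and would give a somewhat wider interval. The data and the helper `mean_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(sample, confidence=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - z_critical * standard_error, center + z_critical * standard_error


# Hypothetical pretest-minus-posttest symptom reductions for eight clients.
difference_scores = [3, 5, 2, 4, 6, 3, 5, 4]

lower, upper = mean_ci(difference_scores)
excludes_zero = lower > 0 or upper < 0  # null value for a difference is 0
```

Because the whole interval lies above 0, the result would be read as a significant reduction, and the width of the interval shows how precisely the size of that reduction is estimated.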

Distributions, Assumptions, and Nonparametric Alternatives

Many inferential tests assume the data are roughly normal, that group variances are similar (homogeneity of variance), and that observations are independent. The EPPP tests recognition of when those assumptions fail and what to do. Skew describes asymmetry: in a positively (right) skewed distribution the long tail points to the high end and the mean exceeds the median (income is the classic example); in a negatively (left) skewed distribution the mean falls below the median. Kurtosis describes how heavy the tails are relative to a normal distribution, and is often loosely described as peakedness.
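The relationship between skew direction and the mean and median can be verified directly. This sketch uses invented income figures and a simple moment-based skewness formula; the helper name `skewness` is hypothetical.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Moment-based skewness: the average cubed z score."""
    center = mean(values)
    spread = pstdev(values)
    return sum(((v - center) / spread) ** 3 for v in values) / len(values)


# Hypothetical incomes in thousands; one very high earner creates a right tail.
incomes = [30, 32, 35, 38, 40, 45, 200]
right_skew = skewness(incomes)
mean_above_median = mean(incomes) > median(incomes)

# Mirroring the data flips the tail to the low end (negative skew).
mirrored = [-v for v in incomes]
left_skew = skewness(mirrored)
mean_below_median = mean(mirrored) < median(mirrored)
```

With the right tail, the mean (60) is dragged well above the median (38); mirroring the same values reverses both the sign of the skew and the ordering of mean and median.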

When data are badly skewed, ordinal, or drawn from small samples, parametric tests can mislead, and a nonparametric alternative is preferred:

Parametric test | Nonparametric counterpart | When to switch
--- | --- | ---
Independent t test | Mann-Whitney U | Ordinal data or non-normal small samples
Paired t test | Wilcoxon signed-rank | Ordinal or skewed paired data
One-way ANOVA | Kruskal-Wallis | Three+ groups, non-normal
Pearson r | Spearman rho | Ranked or monotonic, non-linear data

Nonparametric tests use ranks rather than raw values, so they are robust to outliers and do not assume normality, but they generally have somewhat lower power when parametric assumptions actually hold. The exam-ready rule: scale and distribution drive the choice. Categorical outcomes point to chi-square or logistic regression; continuous, normal outcomes point to t tests, ANOVA, or linear regression; ordinal or skewed data point to the rank-based alternatives above.
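To show why rank-based tests resist outliers, the sketch below computes the Mann-Whitney U statistic by hand. The helper names and the scores are hypothetical. Only the statistic is computed; a p value would come from U tables or a normal approximation.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    rank_of = {value: sum(p) / len(p) for value, p in positions.items()}
    return [rank_of[value] for value in values]


def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U statistics for two independent groups."""
    ranks = average_ranks(group_a + group_b)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical scores; the second group contains an extreme outlier (95).
control = [3, 5, 6, 8]
treated_with_outlier = [7, 9, 10, 95]
treated_without_outlier = [7, 9, 10, 11]

u_with = mann_whitney_u(control, treated_with_outlier)
u_without = mann_whitney_u(control, treated_without_outlier)
```

Both calls return the same U because the outlier keeps the same rank whether it is 95 or 11, whereas a t test on the raw scores would be affected by the inflated mean and variance.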

Statistical significance is not effect magnitude. A result with p = .049 is barely below the .05 threshold and says only that data at least this extreme would be unlikely if the null hypothesis were true; it does not say the effect is large or important. A p value also depends heavily on sample size, so with thousands of participants a trivial difference can be "significant." This is exactly why the next section pairs significance with effect size and clinical significance. A further trap is misinterpreting a non-significant result as proof of no effect; absence of evidence is not evidence of absence, especially when power is low.
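The dependence of p on sample size is easy to demonstrate. The sketch below holds a trivial standardized effect constant (d = 0.05, a made-up value) and changes only the group size. It uses a normal approximation for a two-group comparison, and the function name `two_tailed_p` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_p(effect_size_d, n_per_group):
    """Approximate two-tailed p value for a two-group mean difference."""
    z_statistic = effect_size_d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z_statistic)))


trivial_effect = 0.05  # hypothetical standardized mean difference

p_small_study = two_tailed_p(trivial_effect, n_per_group=50)
p_huge_study = two_tailed_p(trivial_effect, n_per_group=10_000)
```

The effect size is identical in both calls, yet the small study is nowhere near significance while the huge study falls far below .05. The p value changed; the magnitude of the effect did not.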

When a statistics stem offers an option that overstates a p value ("this proves the treatment works") alongside a measured option ("the difference is statistically significant but the effect is small"), the measured, magnitude-aware option is almost always the credited answer.

Test Your Knowledge

A researcher wants to test whether three different therapy formats produce different mean anxiety scores. Which statistic is most appropriate?


A study is underpowered. Which error is it most at risk of committing?


Two variables correlate at r = .40. How much of the variance in one is shared with the other?
