7.5 Descriptive, Inferential, and Predictive Statistics
Key Takeaways
- Descriptive statistics summarize observed data; inferential statistics estimate what results suggest beyond the sample.
- The appropriate statistic depends on the measurement scale (nominal, ordinal, interval, ratio), the design, and the research question.
- Correlation describes association, regression predicts outcomes, and group-comparison tests (t test, ANOVA) evaluate mean differences, provided their assumptions (approximate normality, similar group variances, independent observations) are met.
- Type I error (alpha), Type II error (beta), power (1-beta), and confidence intervals describe the uncertainty around statistical decisions.
Choose the Statistic That Fits the Question
Statistics give structure to evidence. Descriptive statistics summarize what was observed; inferential statistics estimate what the data suggest about a broader population; predictive statistics estimate outcomes from one or more variables. EPPP items rarely require long calculations, but they reliably require matching a statistic to the design and variables.
Start with measurement level. Nominal variables are unordered categories (diagnosis group). Ordinal variables have rank order, but the distances between ranks are not assumed equal (Likert-type ratings). Interval variables have equal intervals but no true zero (temperature in Celsius). Ratio variables add a meaningful zero (reaction time). The statistic must respect the scale.
| Question type | Common statistic | Key cue |
|---|---|---|
| Summarize central tendency | Mean / median / mode | Use median when skew or outliers distort the mean |
| Summarize variability | SD, variance, range, IQR | SD = spread around the mean; variance = SD squared |
| Associate two continuous variables | Pearson r | Linear relationship, interval/ratio data |
| Associate ranked variables | Spearman rho | Ordinal or non-linear monotonic data |
| Associate two categorical variables | Chi-square | Frequencies/counts in categories |
| Compare two means | t test (independent or paired) | Paired = related scores (pre/post in same people) |
| Compare 3+ means | ANOVA | Omnibus result needs post-hoc tests |
| Predict an outcome | Regression | One or more predictors estimate a criterion |
The mean is pulled by outliers; the median better represents skewed distributions; the mode is the most frequent value and suits nominal data. A z score expresses how far a value lies from the mean in standard-deviation units. In a normal distribution, about 68% of scores fall within +/-1 SD, 95% within +/-2 SD, and 99.7% within +/-3 SD, a fact useful for interpreting percentiles and standardized scores.
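The points above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the `scores` list is made-up illustration data, not from any study.

```python
from statistics import NormalDist, mean, median, mode, pstdev

# Hypothetical scores with one extreme value at the high end.
scores = [10, 12, 12, 13, 14, 15, 60]

avg = mean(scores)          # pulled upward by the outlier (about 19.4)
mid = median(scores)        # resistant to the outlier (13)
most_common = mode(scores)  # most frequent value (12)

# z score: distance from the mean in standard-deviation units.
# pstdev treats the scores as the whole group (population SD).
sd = pstdev(scores)
z_for_60 = (60 - avg) / sd

# Share of a normal distribution lying within +/-k SD of the mean.
std_normal = NormalDist(mu=0, sigma=1)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(f"mean={avg:.1f} median={mid} mode={most_common} z(60)={z_for_60:.2f}")
for k, share in within.items():
    print(f"within +/-{k} SD: {share:.1%}")
```

The single score of 60 moves the mean well above the median, which is the cue for reporting the median with skewed data.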
Association, Comparison, and Inferential Error
Correlation coefficients run from -1.00 to +1.00; the sign shows direction and the magnitude shows strength. A negative r means higher values on one variable accompany lower values on the other. The coefficient of determination (r-squared) gives the proportion of variance shared, so r = .50 explains only 25% of the variance. Correlation does not prove causation. Regression uses predictors to estimate a criterion, but a regression model is still limited by design, measurement error, and omitted variables; multicollinearity among predictors destabilizes the coefficients.
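To make the sign, magnitude, and r-squared ideas concrete, here is a small sketch that computes Pearson r from deviation scores. The function name `pearson_r` and both data lists are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from deviations about each mean."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: more hours of sleep, fewer reported symptoms.
sleep_hours = [4, 5, 6, 7, 8]
symptoms = [9, 8, 8, 5, 4]

r = pearson_r(sleep_hours, symptoms)  # negative: the variables move in opposite directions
r_squared = r ** 2                    # proportion of shared variance

# The example from the text: r = .50 shares only 25% of the variance.
shared_at_half = 0.50 ** 2

print(f"r={r:.2f} r_squared={r_squared:.2f} shared_at_r_.50={shared_at_half:.2f}")
```

Squaring removes the sign, so r = -.50 and r = +.50 share the same 25% of variance.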
Group-comparison choices depend on design. An independent-samples t test compares two separate groups; a paired-samples t test compares related scores (pretest vs. posttest in the same people). ANOVA compares means across three or more groups or factors; a significant omnibus F says not all means are equal but does not identify which pairs differ, which is why post-hoc tests (Tukey, Bonferroni) follow. A factorial ANOVA also tests interactions between factors.
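The logic of the omnibus F can be sketched by hand: variance between group means is compared with variance within groups. The helper `one_way_anova_f` and the three score lists below are hypothetical; in practice a statistics package would also report the p value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [s for g in groups for s in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups: how far each score sits from its own group mean.
    ss_within = sum((s - m) ** 2
                    for g, m in zip(groups, group_means) for s in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical anxiety scores under three therapy formats.
individual = [12, 14, 13, 15]
group_format = [18, 17, 19, 18]
online = [13, 12, 14, 13]

f_stat, df_between, df_within = one_way_anova_f([individual, group_format, online])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F here says only that the three means are not all equal; identifying which formats differ would still require a post-hoc comparison.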
Inferential statistics live with error:
- Type I error (alpha) — rejecting a true null hypothesis (a false positive); alpha is conventionally set at .05.
- Type II error (beta) — failing to reject a false null hypothesis (a false negative).
- Power (1 - beta) — the probability of detecting an effect that truly exists; a common target is .80.
- Power increases with larger samples, larger true effects, lower measurement error, a larger (more lenient) alpha, and one-tailed tests when the effect falls in the predicted direction.
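The factors in the list above can be explored with a rough power calculation. This sketch assumes a two-group comparison and uses a normal (z) approximation rather than the exact t distribution; the function `approx_power` and its parameters are hypothetical names, and the tiny probability of rejecting in the wrong direction is ignored.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size_d, n_per_group, alpha=0.05, two_tailed=True):
    """Approximate power for comparing two group means (z approximation)."""
    z = NormalDist()
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_crit = z.inv_cdf(1 - tail_alpha)
    # Expected test statistic when the true standardized difference is d.
    expected_z = effect_size_d * sqrt(n_per_group / 2)
    # Probability the statistic exceeds the critical value in the predicted direction.
    return 1 - z.cdf(z_crit - expected_z)

baseline = approx_power(0.5, 64)                      # roughly .80
smaller_sample = approx_power(0.5, 20)                # lower power
larger_effect = approx_power(0.8, 64)                 # higher power
lenient_alpha = approx_power(0.5, 64, alpha=0.10)     # higher power
one_tailed = approx_power(0.5, 64, two_tailed=False)  # higher power

print(f"baseline={baseline:.2f} n=20:{smaller_sample:.2f} d=.8:{larger_effect:.2f} "
      f"alpha=.10:{lenient_alpha:.2f} one-tailed:{one_tailed:.2f}")
```

Under these assumptions, a medium effect (d = .50) with 64 participants per group lands near the conventional .80 target.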
Confidence intervals communicate the precision of an estimate. A 95% CI comes from a procedure that would capture the true parameter in 95% of repeated samples; a narrow interval signals precision, and a CI that excludes the null value (0 for a difference, 1 for an odds ratio) corresponds to a significant result. CIs are exam-friendly because they shift attention from a bare yes/no significance decision to the plausible range of true values.
When a statistics stem feels dense, translate it: is the researcher summarizing, associating, comparing, predicting, classifying, or estimating precision? Once the task is clear, the correct statistic is the one that respects the variable type and design.
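A confidence interval for a mean can be sketched as the estimate plus or minus a critical value times the standard error. The example below uses a z critical value for simplicity, which is an assumption; with a sample this small a t critical value would give a somewhat wider interval. The function `mean_ci` and the `changes` data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(values, confidence=0.95):
    """Normal-approximation confidence interval for a mean."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = stdev(values) / sqrt(len(values))
    center = mean(values)
    return center - z_crit * standard_error, center + z_crit * standard_error

# Hypothetical pre-to-post change scores (positive = improvement).
changes = [3, 5, 2, 4, 6, 1, 4, 3, 5, 2]

low, high = mean_ci(changes)
excludes_zero = low > 0 or high < 0   # null value for a difference is 0

low99, high99 = mean_ci(changes, confidence=0.99)

print(f"95% CI: {low:.2f} to {high:.2f}, excludes 0: {excludes_zero}")
print(f"99% CI: {low99:.2f} to {high99:.2f}")
```

Asking for more confidence (99% instead of 95%) widens the interval, trading precision for coverage.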
Distributions, Assumptions, and Nonparametric Alternatives
Many inferential tests assume the data are roughly normal, that group variances are similar (homogeneity of variance), and that observations are independent. The EPPP tests recognition of when those assumptions fail and what to do. Skew describes asymmetry: in a positively (right) skewed distribution the long tail points to the high end and the mean exceeds the median (income is the classic example); in a negatively (left) skewed distribution the mean falls below the median. Kurtosis describes how heavy the tails are relative to a normal distribution and is often taught as peakedness.
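The mean-versus-median cue for skew can be verified directly. This sketch uses a moment-based skewness formula (population form); the `skewness` helper and the income figures are hypothetical illustration values.

```python
from statistics import mean, median

def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by SD cubed."""
    n = len(values)
    m = mean(values)
    second_moment = sum((v - m) ** 2 for v in values) / n
    third_moment = sum((v - m) ** 3 for v in values) / n
    return third_moment / second_moment ** 1.5

# Hypothetical incomes in thousands: one very high earner creates a right tail.
incomes = [28, 30, 32, 35, 38, 40, 45, 250]
right_skew = skewness(incomes)                 # positive
mean_above_median = mean(incomes) > median(incomes)

# Mirroring the data flips the tail to the low end.
mirrored = [-v for v in incomes]
left_skew = skewness(mirrored)                 # negative
mean_below_median = mean(mirrored) < median(mirrored)

print(f"right skew={right_skew:.2f}, mean > median: {mean_above_median}")
print(f"left skew={left_skew:.2f}, mean < median: {mean_below_median}")
```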
When data are badly skewed, ordinal, or drawn from small samples, parametric tests can mislead, and a nonparametric alternative is preferred:
| Parametric test | Nonparametric counterpart | When to switch |
|---|---|---|
| Independent t test | Mann-Whitney U | Ordinal data or non-normal small samples |
| Paired t test | Wilcoxon signed-rank | Ordinal or skewed paired data |
| One-way ANOVA | Kruskal-Wallis | Three+ groups, non-normal |
| Pearson r | Spearman rho | Ranked or monotonic, non-linear data |
Nonparametric tests use ranks rather than raw values, so they are robust to outliers and do not assume normality, but they generally have somewhat lower power when parametric assumptions actually hold. The exam-ready rule: scale and distribution drive the choice. Categorical outcomes point to chi-square or logistic regression; continuous, normal outcomes point to t tests, ANOVA, or linear regression; ordinal or skewed data point to the rank-based alternatives above.
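To see what "using ranks rather than raw values" means, the sketch below computes Spearman rho as a Pearson correlation on ranked data. The helper names (`pearson_r`, `average_ranks`, `spearman_rho`) and the dose-response numbers are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from deviations about each mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: Pearson r applied to the ranks of each variable."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical monotonic but non-linear relationship.
dose = [1, 2, 3, 4, 5, 6]
response = [d ** 3 for d in dose]

linear_r = pearson_r(dose, response)   # below 1: the relationship is curved
rank_rho = spearman_rho(dose, response)  # 1: the ordering is perfectly preserved

print(f"Pearson r={linear_r:.2f}, Spearman rho={rank_rho:.2f}")
```

Because ranks ignore how far apart the raw values are, the extreme response of 216 has no more influence than any other top-ranked score.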
Statistical significance is not effect magnitude. A result with p = .049 is barely below the .05 threshold and says only that results at least this extreme would be unlikely if the null hypothesis were true; it does not say the effect is large or important. A p value also depends heavily on sample size, so with thousands of participants a trivial difference can be "significant." This is exactly why the next section pairs significance with effect size and clinical significance. A further trap is misinterpreting a non-significant result as proof of no effect; absence of evidence is not evidence of absence, especially when power is low.
When a statistics stem offers an option that overstates a p value ("this proves the treatment works") alongside a measured option ("the difference is statistically significant but the effect is small"), the measured, magnitude-aware option is almost always the credited answer.
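The dependence of p on sample size can be illustrated with a rough calculation. This sketch holds a trivial standardized difference fixed and changes only the group size; it uses a z approximation, and the function `two_tailed_p` and the chosen values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p(effect_size_d, n_per_group):
    """Approximate two-tailed p value for a two-group mean difference (z approximation)."""
    z_stat = effect_size_d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

tiny_effect = 0.05  # a trivially small standardized difference

p_small_study = two_tailed_p(tiny_effect, 50)    # far from significant
p_large_study = two_tailed_p(tiny_effect, 5000)  # below .05 despite the same effect

print(f"n=50 per group:   p={p_small_study:.3f}")
print(f"n=5000 per group: p={p_large_study:.3f}")
```

The effect size is identical in both runs; only the sample size changed, which is why a significant p value alone cannot show that an effect matters.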
A researcher wants to test whether three different therapy formats produce different mean anxiety scores. Which statistic is most appropriate?
A study is underpowered. Which error is it most at risk of committing?
Two variables correlate at r = .40. How much of the variance in one is shared with the other?