6.1 Descriptive Statistics and Data Displays
Key Takeaways
- AEPA Domain V expects candidates to interpret data displays, not just compute formulas from isolated lists.
- Mean, median, mode, range, interquartile range, and standard deviation answer different questions about center and spread.
- The display type should match the data type: dot plots and histograms show distributions, box plots show five-number summaries, and scatterplots show paired numerical data.
- Teacher-certification items often test whether you can diagnose a student misconception about outliers, bins, scale, or weighted frequency.
Read the Data Before You Calculate
The official AEPA Mathematics profile places statistics inside Domain V, which also covers probability and discrete mathematics and is listed at 19% of the test score. The statistics competency asks you to organize and display data, analyze representations, compare measures of center and variability, and evaluate bias and sampling. That wording matters: a candidate who can compute a mean but cannot explain why a histogram, box plot, or table supports a conclusion is underprepared for a teacher-certification exam.
Start every statistics item by naming the variable, the unit, and the data type. Categorical data belong in bar charts or two-way tables. Numerical data can appear in dot plots, stem-and-leaf plots, histograms, box plots, frequency tables, or scatterplots. A histogram groups values into intervals, so individual data values may be hidden. A dot plot shows every individual value, so it is better for small sets and for seeing repeated values. A box plot compresses data into minimum, first quartile, median, third quartile, and maximum; it is efficient for comparing groups but does not show gaps or clusters inside each quartile.
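The contrast between a dot plot and a histogram can be seen by counting the same data two ways. The sketch below is only an illustration: the scores are hypothetical, and the bin width of 10 is an assumed choice, not something the exam prescribes.

```python
from collections import Counter

# Hypothetical whole-number test scores (illustration only).
scores = [61, 64, 68, 72, 72, 75, 79, 83, 88, 95]

# Dot-plot view: one count per distinct value, so repeats stay visible.
dot_plot_counts = Counter(scores)

# Histogram view: values are grouped into intervals of an assumed width.
bin_width = 10
histogram_counts = Counter((s // bin_width) * bin_width for s in scores)

for start in sorted(histogram_counts):
    label = f"{start}-{start + bin_width - 1}"
    print(label, "#" * histogram_counts[start])
```

The histogram counts show four scores in the 70-79 interval, but only the dot-plot counts reveal that 72 occurs twice.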
Center and Spread Work Together
Mean is the arithmetic balance point: add the values and divide by the number of values. It uses every magnitude, so an outlier can pull it. Median is the middle ordered value (or the average of the two middle values when the count is even), so it is resistant to outliers and skew. Mode is the most frequent value; it is useful for categorical data and for repeated numerical values, but a set may have no mode or several modes. Range is maximum minus minimum and is quick but fragile. Interquartile range is Q3 minus Q1 and describes the spread of the middle half. Standard deviation measures typical distance from the mean; it is most informative when the mean is a sensible center.
| Situation | More Useful Summary | Reason |
|---|---|---|
| Symmetric numerical data with no extreme values | Mean and standard deviation | Every value contributes and balance is meaningful |
| Skewed data or an outlier | Median and interquartile range | Resistant summaries keep the center from being pulled |
| Comparing two class score distributions | Center, spread, and shape | Same mean can hide very different consistency |
| Categorical survey result | Mode or proportion | Mean and median usually do not apply |
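A short sketch can make the difference between resistant and nonresistant summaries concrete. The salary values are hypothetical, the helper name `summarize` is invented for this illustration, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; textbooks use several quartile conventions, so hand-computed quartiles may differ slightly.

```python
import statistics

def summarize(values):
    """Return common center and spread summaries for a list of numbers."""
    # Quartile conventions vary; method="inclusive" is an assumed choice.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "stdev": statistics.pstdev(values),  # population standard deviation
    }

# Hypothetical salaries in thousands of dollars (illustration only).
typical = [40, 42, 45, 47, 50, 52, 55]
with_outlier = [40, 42, 45, 47, 50, 52, 300]

for name, data in [("typical", typical), ("with outlier", with_outlier)]:
    print(name, summarize(data))
```

In this data the median and interquartile range stay the same when the largest value becomes 300, while the mean, range, and standard deviation all increase.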
Worked Example: Frequency Table
Suppose a quiz score table shows score 6 with frequency 2, score 7 with frequency 5, score 8 with frequency 9, score 9 with frequency 3, and score 10 with frequency 1. The total number of students is 20. The weighted sum of the scores is 6(2) + 7(5) + 8(9) + 9(3) + 10(1) = 156, so the mean is 156/20 = 7.8. The median is the average of the 10th and 11th ordered values; both fall in the score-8 group, so the median is 8.
The mode is also 8 because score 8 has the greatest frequency. A common trap is to average the distinct score labels, (6 + 7 + 8 + 9 + 10)/5 = 8, and ignore frequency. That mistake answers a different question: the average of the possible score values, not the average student score.
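The frequency-table calculation can be checked with a few lines of code. This is a minimal sketch using the table from the worked example; the variable names are chosen for illustration.

```python
import statistics

# Frequency table from the worked example: score -> number of students.
freq = {6: 2, 7: 5, 8: 9, 9: 3, 10: 1}

n = sum(freq.values())
weighted_sum = sum(score * count for score, count in freq.items())
mean = weighted_sum / n

# Expand the table into one entry per student to locate the median.
expanded = sorted(score for score, count in freq.items() for _ in range(count))
median = statistics.median(expanded)

# The mode is the score with the greatest frequency.
mode = max(freq, key=freq.get)

# The trap: averaging the score labels while ignoring the frequencies.
label_average = sum(freq) / len(freq)

print(n, weighted_sum, mean, median, mode, label_average)
```

The weighted mean and the label average differ because the label average counts each score once, regardless of how many students earned it.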
Display Traps AEPA Can Test
Multiple-choice distractors often come from plausible misreadings. In a histogram, the bar height is the frequency of an interval, not a data value. In a box plot, the four sections contain equal numbers of observations, not equal lengths on the scale. A long whisker means values are spread out in that quarter of the data, not that more values are there. In a broken-axis graph, the visual size of a difference does not reliably indicate its numerical size; a truncated scale can make a small difference look large, so read the axis values. On a scatterplot, the visual trend describes association; it does not identify center or spread of a single distribution.
For teacher certification, also think about student reasoning. If a student says, "The group with the larger range is always less consistent," the idea is incomplete because one extreme value can inflate range. A better response asks for interquartile range or standard deviation and checks the full distribution. If a student chooses the mean for a salary distribution with one unusually high value, the misconception is not arithmetic; it is selecting a nonresistant measure for a skewed context.
Quick Procedure for Data Display Items
- Identify whether the data are categorical, univariate numerical, or paired numerical.
- Read labels, units, intervals, and scale before using the plotted values.
- Describe shape using symmetry, skew, clusters, gaps, and outliers.
- Match the summary statistic to the shape and context.
- Interpret the result in the original units.
The last step is easy to skip under time pressure. An answer choice that says "standard deviation equals 4" may be numerically correct, but the stronger interpretation is "a typical score is about 4 points from the mean." AEPA items reward that level of statistical language because mathematics teachers need to explain why a statistic is meaningful, not only how it is produced.
A useful final check is to compare statistics with the visual display. If a dot plot has two separated clusters, a single mean may fall in an empty gap and describe no typical observation. If two box plots have the same median but one has a wider interquartile range, the groups differ in consistency even though their centers match. This habit helps you reject answer choices that are numerically possible but visually inconsistent with the representation.
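Both checks in the paragraph above can be sketched numerically. All data below are hypothetical, the helper `iqr` is written for this illustration, and its quartiles again assume the "inclusive" method of `statistics.quantiles`, which is only one of several conventions.

```python
import statistics

def iqr(values):
    """Interquartile range using one assumed quartile convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Two separated clusters: the mean lands in the empty gap between them.
two_clusters = [1, 2, 2, 3, 9, 10, 10, 11]
mean = statistics.mean(two_clusters)
distance_to_nearest_value = min(abs(v - mean) for v in two_clusters)
print(mean, distance_to_nearest_value)

# Same median, different consistency.
group_a = [70, 72, 74, 75, 76, 78, 80]
group_b = [50, 60, 70, 75, 80, 90, 100]
print(statistics.median(group_a), iqr(group_a))
print(statistics.median(group_b), iqr(group_b))
```

Here the mean of the clustered data is 6, a value no observation is close to, and the two groups share a median of 75 while the second has a much wider interquartile range.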
A data set of household incomes is strongly right-skewed because one household earns far more than the rest. Which pair best summarizes a typical income and the spread of the middle half?
A frequency table lists quiz scores and how many students earned each score. What is the safest way to find the mean score?
A box plot has a much longer right whisker than left whisker. Which interpretation is most defensible?