4.2 Data Presentation and Descriptive Statistics
Key Takeaways
- The mean is sensitive to outliers; the median is the preferred measure of central tendency for skewed data such as length of stay or charges.
- Nominal and ordinal data are categorical; interval and ratio data are numeric (usually treated as continuous), and only ratio data has a true zero, which allows meaningful ratios.
- A histogram displays a frequency distribution of continuous data with adjacent bars; a bar chart compares discrete categories with separated bars.
- A line graph shows trends over time, a pie chart shows parts of a whole, and a scatter plot shows the relationship between two continuous variables.
- The range and standard deviation measure dispersion; standard deviation describes how far values typically fall from the mean.
Measures of Central Tendency
The three measures of central tendency summarize a distribution with a single typical value:
- Mean — the arithmetic average (sum of values / count). It uses every value, which makes it sensitive to outliers.
- Median — the middle value when data are ordered (the average of the two middle values if n is even). It is resistant to outliers, so it is preferred for skewed data.
- Mode — the most frequently occurring value; the only measure usable for nominal data, and a distribution can have no mode or several.
When to use which: for symmetric, continuous data the mean is ideal. For skewed healthcare data — length of stay, age, charges — a few very large values pull the mean upward, so the median is the more honest center. The mode is best for categorical counts (most common discharge disposition).
The relationship among the three signals the shape of a distribution. In a perfectly symmetric distribution with a single peak, mean = median = mode. In a right-skewed (positively skewed) distribution, the typical pattern for cost and stay data, the long right tail pulls the mean to the right of the median, so typically mean > median > mode. In a left-skewed distribution the reverse holds (mean < median < mode). Because the mean is dragged toward the tail, a data set with extreme outliers should report the median.
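To make the skew relationship concrete, here is a minimal sketch using Python's standard `statistics` module. The length-of-stay values are hypothetical and chosen only to illustrate a right-skewed data set with one extreme stay.

```python
from statistics import mean, median, multimode

# Hypothetical lengths of stay in days: most are short, one is extreme.
los_days = [2, 2, 2, 3, 3, 4, 4, 5, 95]

los_mean = mean(los_days)        # uses every value, so the 95-day stay inflates it
los_median = median(los_days)    # middle value of the ordered list
los_modes = multimode(los_days)  # list of the most frequent value(s)

print(f"mean={los_mean:.2f}, median={los_median}, mode={los_modes}")
# prints: mean=13.33, median=3, mode=[2]
```

In this sample the single 95-day stay lifts the mean to about 13 days, although eight of the nine patients stayed 5 days or fewer. The ordering mean > median > mode matches the right-skewed pattern described above.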
Measures of Variation
Central tendency alone hides how spread out the data are, so report a measure of dispersion too.
- Range = highest value − lowest value. Simple, but driven entirely by the two extremes, so one outlier distorts it.
- Variance = the average of the squared deviations from the mean (divide by n for a population, or by n − 1 for a sample).
- Standard deviation (SD) = the square root of the variance; it expresses typical distance from the mean in the original units, which makes it easier to interpret than the variance, whose units are squared.
In a roughly normal (bell-shaped) distribution, about 68% of values fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD (the empirical rule). A larger SD means more variability; a small SD means values cluster tightly around the mean.
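The three dispersion measures can be sketched with the standard `statistics` module. The data are hypothetical, and the example assumes the population formulas (`pvariance` and `pstdev`, which divide by n); the sample versions `variance` and `stdev` divide by n − 1 and give slightly larger results.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical lengths of stay in days (illustrative values only).
los_days = [2, 4, 4, 4, 5, 5, 7, 9]

los_range = max(los_days) - min(los_days)  # highest value minus lowest value
los_variance = pvariance(los_days)         # average squared deviation (divides by n)
los_sd = pstdev(los_days)                  # square root of the variance, in days

# Share of values that fall within 1 SD of the mean.
center = mean(los_days)
within_1_sd = [x for x in los_days if abs(x - center) <= los_sd]
share_within_1_sd = len(within_1_sd) / len(los_days)

print(f"range={los_range}, variance={los_variance:.2f}, SD={los_sd:.2f}")
print(f"within 1 SD: {share_within_1_sd:.0%}")
```

With these values the mean is 5 days, the variance is 4, and the SD is 2 days. Six of the eight stays (75%) fall within 1 SD of the mean; a sample this small is not normally distributed, so the share need not match the 68% of the empirical rule.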
Two more dispersion ideas appear on the exam. The interquartile range (IQR) is the spread of the middle 50% of values (the 75th percentile minus the 25th percentile); like the median it resists outliers and pairs well with a box plot. The coefficient of variation (CV = SD ÷ mean × 100) expresses the SD as a percentage of the mean, allowing you to compare variability between data sets measured on different scales. When a question reports two units with the same mean but different SDs, the unit with the larger SD is the more variable and less predictable.
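Both ideas can be sketched in a few lines. All values below are hypothetical, and the helper `coefficient_of_variation` is an illustrative name, not a library function. Quartile conventions differ between textbooks and software, so other tools may report slightly different quartiles than `statistics.quantiles` does with its default method.

```python
from statistics import mean, pstdev, quantiles

# Hypothetical lengths of stay in days, including one extreme outlier.
los_days = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 90]

q1, q2, q3 = quantiles(los_days, n=4)   # 25th, 50th, and 75th percentiles
iqr = q3 - q1                           # spread of the middle 50% of values
full_range = max(los_days) - min(los_days)

def coefficient_of_variation(values):
    """Return the SD as a percentage of the mean (population SD assumed)."""
    return pstdev(values) / mean(values) * 100

# Two hypothetical data sets on different scales: days versus dollars.
unit_a_los_days = [2, 4, 4, 4, 5, 5, 7, 9]
unit_b_charges = [8000, 10000, 12000]
cv_los = coefficient_of_variation(unit_a_los_days)
cv_charges = coefficient_of_variation(unit_b_charges)

print(f"range={full_range}, IQR={iqr}")
print(f"CV of LOS={cv_los:.1f}%, CV of charges={cv_charges:.1f}%")
```

The 90-day stay stretches the range to 89 days while the IQR stays at 4 days. The CV puts days and dollars on a common percentage footing: here the length-of-stay data (40%) are relatively more variable than the charges (about 16%).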
Levels of Measurement (Data Types)
How data are measured determines which statistics and which charts are valid:
| Scale | Description | Example | Center stat |
|---|---|---|---|
| Nominal | Named categories, no order | Sex, blood type, MS-DRG | Mode |
| Ordinal | Ordered categories, gaps not necessarily equal | Pain 0–10, satisfaction (poor→excellent) | Median |
| Interval | Ordered, equal gaps, no true zero | Temperature in °F | Mean |
| Ratio | Equal gaps with a true zero | Age, LOS, charges, lab values | Mean |
Nominal and ordinal scales are categorical (qualitative); interval and ratio scales are numeric (quantitative) and are usually treated as continuous. Only ratio data has a meaningful zero, so only ratio data supports true ratios (a 10-day stay is twice a 5-day stay; 60°F is not "twice as hot" as 30°F).
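A short sketch shows why the temperature ratio fails. It converts both readings to kelvins, a scale whose zero is absolute zero, and compares the result with the naive ratio of the Fahrenheit numbers. The function name `fahrenheit_to_kelvin` is illustrative.

```python
def fahrenheit_to_kelvin(temp_f: float) -> float:
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (temp_f - 32) * 5 / 9 + 273.15

naive_ratio = 60 / 30  # 2.0, but the zero point of the Fahrenheit scale is arbitrary
kelvin_ratio = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

los_ratio = 10 / 5     # length of stay has a true zero, so this ratio is meaningful

print(f"naive °F ratio: {naive_ratio:.2f}")
print(f"ratio on the kelvin scale: {kelvin_ratio:.2f}")
print(f"LOS ratio: {los_ratio:.2f}")
```

On the kelvin scale, 60°F is only about 6% warmer than 30°F, not 100% warmer. The 10-day stay really is twice the 5-day stay, because zero days means no stay at all.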
Choosing the Right Display
Match the visual to the data type. A first rule: use a table when exact values matter and a graph when the pattern or comparison matters.
| Display | Best for |
|---|---|
| Frequency distribution table | Exact counts/percentages by category |
| Bar chart | Comparing discrete categories (bars separated by gaps) |
| Histogram | A frequency distribution of continuous data (adjacent bars, no gaps) |
| Pie chart | Parts of a single whole (proportions that sum to 100%) |
| Line graph | A trend over time |
| Scatter plot (diagram) | Relationship/correlation between two continuous variables |
The most-tested distinction is bar chart vs histogram: a bar chart's gaps signal separate categories (nominal/ordinal), while a histogram's touching bars signal a continuous variable split into ranges (class intervals). Use a pie chart only when categories are mutually exclusive and exhaustive; with many slices a bar chart reads better.
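The difference shows up in how the data are tallied before anything is drawn. The sketch below uses hypothetical discharge dispositions and patient ages, with an assumed class width of 20 years. Categories are counted as they are, while the continuous variable is first cut into class intervals.

```python
from collections import Counter

# Bar chart input: discrete categories are counted as they are.
dispositions = ["Home", "SNF", "Home", "Home", "Expired", "SNF", "Home"]
category_counts = Counter(dispositions)

# Histogram input: a continuous variable is first grouped into class intervals.
ages = [3, 17, 22, 25, 31, 38, 44, 47, 52, 59, 63, 68, 71, 85]
class_width = 20  # assumed interval width, in years
interval_counts = Counter((age // class_width) * class_width for age in ages)

print("Disposition counts (bar chart):")
for category, count in category_counts.most_common():
    print(f"  {category:<8} {'#' * count}")

print("Age intervals (histogram):")
for lower in sorted(interval_counts):
    upper = lower + class_width - 1
    print(f"  {lower:>2}-{upper:<3} {'#' * interval_counts[lower]}")
```

The category rows could be listed in any order without changing their meaning, which is why bar chart bars stand apart. The age intervals have a fixed numeric order and share boundaries, which is why histogram bars touch.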
A few related displays round out the topic. A frequency polygon is a line connecting the midpoints of the tops of the histogram bars, useful for comparing two distributions on one graph. A Pareto chart is a bar chart ordered from most to least frequent with a cumulative percentage line, supporting the 80/20 focus used in performance improvement. A stem-and-leaf plot shows the distribution while preserving the actual data values.
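The data behind a Pareto chart can be sketched as a sorted table with a running cumulative percentage. The claim-denial reasons and counts below are hypothetical.

```python
# Hypothetical counts of claim denials by reason, in no particular order.
denial_counts = {
    "Eligibility": 7,
    "Coding error": 30,
    "Other": 3,
    "Missing authorization": 45,
    "Duplicate claim": 15,
}

total = sum(denial_counts.values())

# Order categories from most to least frequent, as a Pareto chart requires.
ordered = sorted(denial_counts.items(), key=lambda item: item[1], reverse=True)

pareto_rows = []
cumulative = 0
for reason, count in ordered:
    cumulative += count
    pareto_rows.append((reason, count, 100 * cumulative / total))

for reason, count, cumulative_pct in pareto_rows:
    print(f"{reason:<22} {count:>3} {cumulative_pct:>6.1f}%")
```

In this made-up sample the top two reasons account for 75% of denials, which is the kind of concentration a Pareto chart is meant to expose.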
When deciding table vs graph: choose a table for precise reference values and many variables, and a graph to reveal a trend, comparison, or relationship at a glance. Avoid presenting the same point in both forms within one report, which only duplicates effort.
A report on patient length of stay shows most stays are 2–4 days, but a handful of patients stayed over 90 days. Which measure of central tendency best represents the typical stay?
Which chart is most appropriate for displaying a frequency distribution of a continuous variable such as patient age grouped into intervals?
Patient satisfaction recorded as poor, fair, good, or excellent is an example of which level of measurement?