3.2 Statistics, Distributions, and Risk Measures
Key Takeaways
- Descriptive statistics summarize central tendency, dispersion, shape, and relative position of return data.
- Standard deviation measures total variability; downside measures (semideviation, shortfall, VaR) isolate adverse outcomes.
- Skewness and kurtosis describe shape and matter because asset returns are typically nonnormal with fat tails.
- The normal distribution lets analysts compute probabilities via z-scores, using the 68-95-99.7 rule and the 1.65/1.96/2.58 two-tailed critical values.
Statistics turns raw observations (returns, yields, spreads, valuation multiples) into decision-useful information. A disciplined first pass asks four questions: What is typical? How much does it vary? Is the distribution symmetric? How fat are the tails?
Central tendency and position
Measures of central tendency locate the center. The arithmetic mean is the sum over the count. The median is the middle sorted value (the average of the two middle values when N is even) and is robust to outliers. The mode is the most frequent value. When data are skewed by extremes, the median often describes the typical outcome better. Quantiles divide sorted data: quartiles into four parts, quintiles into five, deciles into ten, and percentiles into a hundred. The interquartile range, Q3 minus Q1, underlies box plots.
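The measures above can be sketched with Python's standard `statistics` module. The return series is hypothetical and chosen so that one extreme value pulls the mean well above the median. The `method="inclusive"` quartile convention is an assumption; other conventions give slightly different cut points.

```python
import statistics

# Hypothetical annual returns in percent (illustrative data, not from the text).
returns = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 40.0]

mean_return = statistics.mean(returns)      # sum over the count
median_return = statistics.median(returns)  # middle sorted value
mode_return = statistics.mode(returns)      # most frequent value

# Quartile cut points. method="inclusive" interpolates between sorted values,
# treating the data as spanning the full range; this is one convention of several.
q1, q2, q3 = statistics.quantiles(returns, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range, Q3 minus Q1

print(f"mean={mean_return:.2f} median={median_return} mode={mode_return}")
print(f"Q1={q1} Q3={q3} IQR={iqr}")
```

With this data the single 40% observation lifts the mean to about 10.14% while the median stays at 5%, which is the outlier effect described above.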
Dispersion
Range is maximum minus minimum but uses only two points. Mean absolute deviation (MAD) averages absolute deviations from the mean. Variance averages squared deviations; standard deviation is its square root and is reported in return units, making it the workhorse risk measure. Critically, sample variance divides by n - 1 (a degrees-of-freedom adjustment because the sample mean was estimated from the same data), while population variance divides by N. Exam stems explicitly state whether the data are a sample or a population, and the denominator choice changes the answer.
The coefficient of variation (CV) is standard deviation / mean. It measures risk per unit of expected return, so a lower CV is preferred among investments with positive expected returns. CV becomes misleading when the mean is zero or negative.
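As a minimal sketch of the CV comparison, the function below divides standard deviation by mean and refuses a non-positive mean, reflecting the caveat above. The function name and the two funds' figures are hypothetical.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return standard deviation / mean; only meaningful for a positive mean."""
    if mean <= 0:
        raise ValueError("CV is misleading when the mean is zero or negative")
    return std_dev / mean


# Hypothetical funds: standard deviation and mean return, both in percent.
cv_fund_a = coefficient_of_variation(std_dev=12.0, mean=8.0)   # 1.5
cv_fund_b = coefficient_of_variation(std_dev=15.0, mean=12.0)  # 1.25

# Fund B has the higher standard deviation but the lower risk per unit of return.
preferred = "B" if cv_fund_b < cv_fund_a else "A"
print(cv_fund_a, cv_fund_b, preferred)
```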
Shape: skewness and kurtosis
Returns are frequently asymmetric. Positive (right) skew has a long right tail and pulls the mean above the median; negative (left) skew has a long left tail (more dangerous for investors) and pulls the mean below the median. For a symmetric, unimodal distribution, mean = median = mode. Kurtosis describes tail thickness; the normal distribution has kurtosis of 3, so excess kurtosis is kurtosis minus 3. A leptokurtic distribution (positive excess kurtosis) has fatter tails and a sharper peak, producing more extreme outcomes than the normal model predicts.
Most equity-return series are negatively skewed and leptokurtic, which is why pure normal probabilities understate crash risk.
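The text does not give formulas for skewness or kurtosis, so the sketch below assumes the simple moment-based (population) definitions: the third central moment over the standard deviation cubed, and the fourth central moment over the variance squared, minus 3. Curriculum formulas often add small-sample adjustments, so treat the numbers as illustrative. The return series is hypothetical.

```python
def shape_statistics(data):
    """Return (skewness, excess kurtosis) using population central moments.

    Assumption: no small-sample adjustment is applied.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance (population)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Hypothetical monthly returns in percent with one crash-like observation.
returns = [1.0, 2.0, 1.5, 0.5, 2.5, 1.0, -12.0, 2.0, 1.5, 1.0]
skew, excess_kurt = shape_statistics(returns)
print(f"skewness={skew:.2f} excess kurtosis={excess_kurt:.2f}")
```

A single large loss is enough to make the skewness negative and the excess kurtosis positive, the combination described above for equity returns.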
The normal distribution and z-scores
The normal distribution is symmetric, fully described by its mean and variance, has skewness of zero and excess kurtosis of zero, and is the basis of mean-variance analysis. A z-score standardizes any observation: z = (x - mean)/standard deviation. Memorize these landmarks:
- About 68% of observations lie within +/- 1 standard deviation, 95% within +/- 2, and 99.7% within +/- 3.
- One-tailed critical z-values: 1.65 for 5%, 2.33 for 1%.
- Two-tailed critical z-values: 1.65 for 10%, 1.96 for 5%, and 2.58 for 1%, so a 90% confidence interval uses 1.65, a 95% interval uses 1.96, and a 99% interval uses 2.58.
A return of 14% drawn from a normal distribution with mean 8% and standard deviation 3% has z = (14 - 8)/3 = 2.0, placing it two standard deviations above the mean.
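The landmarks and the z-score example above can be checked with `statistics.NormalDist` from the Python standard library. This is a verification sketch; the helper name `z_score` is illustrative.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, std_dev: float) -> float:
    """Standardize an observation: (x - mean) / standard deviation."""
    return (x - mean) / std_dev


# The example from the text: 14% return, mean 8%, standard deviation 3%.
z = z_score(14.0, mean=8.0, std_dev=3.0)

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of observations within +/- k standard deviations.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

# Critical values: one-tailed leaves the whole tail probability on one side,
# two-tailed splits it evenly across both sides.
one_tail_5pct = std_normal.inv_cdf(0.95)
one_tail_1pct = std_normal.inv_cdf(0.99)
two_tail_5pct = std_normal.inv_cdf(0.975)
two_tail_1pct = std_normal.inv_cdf(0.995)

# Probability of a return above 14% under the stated normal model.
prob_above_14 = 1.0 - NormalDist(mu=8.0, sigma=3.0).cdf(14.0)

print(z, within, round(prob_above_14, 4))
```

The computed shares are roughly 68.3%, 95.4%, and 99.7%, and the critical values round to 1.645, 2.33, 1.96, and 2.58, so the memorized figures are rounded versions of the exact ones.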
| Concept | Formula or meaning | Candidate use |
|---|---|---|
| Arithmetic mean | sum x / n | Average return |
| Median | Middle sorted value | Outlier-robust center |
| Sample variance | sum(x - xbar)^2 / (n-1) | Total variability |
| Standard deviation | sqrt(variance) | Risk in return units |
| Coefficient of variation | s / mean | Risk per unit of return |
| Skewness | Direction of long tail | Asymmetry of outcomes |
| Excess kurtosis | Kurtosis minus 3 | Fat-tail / extreme-loss risk |
| z-score | (x - mean)/s | Standardized position |
Downside risk measures
Standard deviation penalizes upside and downside equally, but investors fear losses. Semivariance averages squared deviations using only returns below the mean; target semideviation measures dispersion below a stated target. Shortfall risk is the probability that return falls below a threshold. Value at risk (VaR) estimates the minimum loss expected with a stated probability over a stated horizon (for example, a 5% one-day VaR of 1 million means losses should exceed 1 million on only about 5% of days).
Match the measure to the decision: a pension with a required return cares about shortfall probability, while an option book with rare large losses needs skewness and kurtosis, not just standard deviation.
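The three downside measures can be sketched as follows. Several choices here are assumptions rather than anything the text specifies: target semideviation uses an n - 1 divisor and counts returns at or below the target, while shortfall probability and VaR both assume normally distributed returns (a parametric approach). All function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def target_semideviation(returns, target):
    """Dispersion of returns at or below the target (assumed n - 1 divisor)."""
    n = len(returns)
    downside = sum((r - target) ** 2 for r in returns if r <= target)
    return math.sqrt(downside / (n - 1))


def shortfall_probability(mean, std_dev, threshold):
    """P(return < threshold), assuming normally distributed returns."""
    return NormalDist(mu=mean, sigma=std_dev).cdf(threshold)


def parametric_var(mean, std_dev, tail_prob, portfolio_value):
    """Loss exceeded with probability tail_prob, assuming normal returns."""
    cutoff_return = NormalDist(mu=mean, sigma=std_dev).inv_cdf(tail_prob)
    return -cutoff_return * portfolio_value


# Four annual returns in percent, measured against a hypothetical 10% target.
semi_dev = target_semideviation([4.0, 8.0, 12.0, 16.0], target=10.0)

# Hypothetical pension: mean 8%, standard deviation 3%, required return 5%.
p_shortfall = shortfall_probability(mean=8.0, std_dev=3.0, threshold=5.0)

# Hypothetical book: zero mean daily return, 1% daily standard deviation.
var_5pct = parametric_var(mean=0.0, std_dev=0.01, tail_prob=0.05,
                          portfolio_value=100_000_000)

print(f"{semi_dev:.2f}% {p_shortfall:.4f} {var_5pct:,.0f}")
```

Because the VaR and shortfall figures rest on the normal assumption, they inherit its weakness: with negatively skewed, fat-tailed returns, the true tail losses are likely larger than these estimates.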
Worked dispersion example
Consider four annual returns: 4%, 8%, 12%, and 16%. The arithmetic mean is (4 + 8 + 12 + 16)/4 = 10%. The deviations from the mean are -6, -2, +2, and +6, whose squares sum to 36 + 4 + 4 + 36 = 80. As a sample, divide by n - 1 = 3 to get a variance of 80/3 = 26.67, so the sample standard deviation is sqrt(26.67) = 5.16%. As a population, divide by N = 4 to get a variance of 20 and a standard deviation of 4.47%. The coefficient of variation for the sample case is 5.16/10 = 0.52, meaning roughly half a unit of return risk per unit of expected return.
This single example shows why reading sample versus population in the stem is not optional: the same data yield two different standard deviations.
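The arithmetic in the worked example can be reproduced with the standard `statistics` module, which exposes separate sample (`variance`, `stdev`) and population (`pvariance`, `pstdev`) functions. The range and MAD lines are added for completeness and follow the definitions in the Dispersion section.

```python
import statistics

returns = [4.0, 8.0, 12.0, 16.0]  # annual returns in percent, from the example

mean_return = statistics.mean(returns)

# Range and mean absolute deviation (MAD).
value_range = max(returns) - min(returns)
mad = sum(abs(r - mean_return) for r in returns) / len(returns)

# Sample statistics divide the squared deviations by n - 1.
sample_var = statistics.variance(returns)
sample_sd = statistics.stdev(returns)

# Population statistics divide by N.
pop_var = statistics.pvariance(returns)
pop_sd = statistics.pstdev(returns)

# Coefficient of variation for the sample case.
cv = sample_sd / mean_return

print(f"mean={mean_return} range={value_range} MAD={mad}")
print(f"sample: var={sample_var:.2f} sd={sample_sd:.2f}")
print(f"population: var={pop_var:.2f} sd={pop_sd:.2f}")
print(f"CV={cv:.2f}")
```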
Exam tactics
On the exam:
- If the mean exceeds the median, think positive skew.
- If the tails are fat (positive excess kurtosis), normal probabilities understate extreme events such as crashes.
- Between two funds with equal means, the one with the lower standard deviation has lower total volatility.
- When a question gives a target return and asks for the chance of underperforming, compute a z-score relative to that target and read the normal table.
- When a question asks which of two assets is riskier per unit of return, compare coefficients of variation rather than raw standard deviations, because scale differs across the two return series.
A return distribution has a mean greater than its median. The distribution is best described as:
A sample of five annual returns has a sample mean of 6%. When calculating the sample variance, the sum of squared deviations is divided by:
Returns are normally distributed with a mean of 8% and standard deviation of 3%. The probability of a return below 2% is closest to: