3.5 Sampling, Estimation, and Hypothesis Testing

Key Takeaways

  • A statistic describes a sample while a parameter describes the population; statistics carry sampling error.
  • The central limit theorem makes the sample-mean distribution approximately normal for large samples, justifying z and t tools.
  • A confidence interval is a point estimate plus or minus a reliability factor times the standard error.
  • Hypothesis tests weigh evidence against a null using a test statistic, a rejection region, or a p-value, balancing Type I and Type II error.
Last updated: June 2026

Analysts rarely observe a full population; they work with samples of returns, fund performance, defaults, or survey responses. Inference uses sample evidence to make statements about a population while acknowledging sampling error.

Parameters, statistics, and sampling

A parameter is a numerical feature of a population, such as a strategy's true mean return. A statistic is a feature of a sample, such as the average return over 60 months. Statistics vary from sample to sample, so every estimate should be paired with an uncertainty measure. Simple random sampling gives each population member an equal selection chance; stratified random sampling draws within defined groups (for example, by sector or rating) to improve representation and often lower the standard error. Time-series financial data demand care because observations can be serially correlated.
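As a rough illustration of proportional stratified sampling, the Python sketch below groups a population by a label (for example, a sector or rating bucket) and then draws a simple random sample within each group, sized to that group's share of the population. The function name `stratified_sample`, its arguments, and the bond population are hypothetical choices for this example. Proportional allocation is one common design, not the only one.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, total_size, seed=None):
    """Proportionally allocated stratified random sample (hypothetical helper).

    stratum_of maps each item to its group label. Each stratum contributes about
    total_size * (stratum share of population) items, drawn by simple random
    sampling (without replacement) inside that stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can make the total differ slightly from total_size.
        k = round(total_size * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 A-rated, 30 BBB-rated, and 10 BB-rated bonds.
bonds = ([("A", i) for i in range(60)]
         + [("BBB", i) for i in range(30)]
         + [("BB", i) for i in range(10)])
sample = stratified_sample(bonds, stratum_of=lambda bond: bond[0], total_size=10, seed=1)
print(sorted(sample))
```

A sample of 10 from this population always contains 6 A, 3 BBB, and 1 BB bond, so each rating bucket appears in its population share, whereas a simple random sample of 10 could easily miss the small BB group entirely.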

Three biases recur on the exam: sampling bias, where the sample is not representative of the population; survivorship bias, where failed or closed funds drop out of databases and the surviving funds overstate average performance; and look-ahead bias, which uses information that was not yet available at the decision date.
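A toy calculation shows how survivorship bias inflates an average. The ten fund returns below are made-up numbers, and which funds "closed" is an assumption chosen for illustration; the only point is that dropping the worst performers raises the computed mean.

```python
from statistics import mean

# Hypothetical annual returns (%) for ten funds; illustrative numbers only.
all_funds = {
    "F1": 8.0, "F2": 6.5, "F3": 5.0, "F4": 4.5, "F5": 3.0,
    "F6": 2.0, "F7": -1.0, "F8": -4.0, "F9": -7.0, "F10": -12.0,
}
# Assumption for the example: the four weakest funds closed and left the database.
closed = {"F7", "F8", "F9", "F10"}
survivors = {name: r for name, r in all_funds.items() if name not in closed}

true_mean = mean(all_funds.values())       # average over every fund that existed
survivor_mean = mean(survivors.values())   # average a survivors-only database would report
bias = survivor_mean - true_mean           # upward bias caused by the missing funds

print(f"All funds: {true_mean:.2f}%  Survivors only: {survivor_mean:.2f}%  Bias: {bias:+.2f} pp")
```

Here the all-fund mean is 0.50%, while a survivors-only database would report 4.83%, an upward bias of about 4.33 percentage points.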

Standard error and the central limit theorem

The sampling distribution of the sample mean is the distribution of the mean across repeated samples. Its standard deviation is the standard error: s/sqrt(n) when the population standard deviation is unknown and the sample standard deviation s is used (or sigma/sqrt(n) when sigma is known). Larger samples shrink the standard error in proportion to 1/sqrt(n), so quadrupling the sample size halves it. The central limit theorem (CLT) states that for a sufficiently large sample (commonly n >= 30), the sampling distribution of the mean is approximately normal even when the population is nonnormal, provided observations are independent and identically distributed with finite variance.
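A small simulation makes both claims concrete. The sketch below draws repeated samples from an exponential population (mean 1, standard deviation 1, strongly right-skewed; chosen only because it is clearly nonnormal) and compares the spread of the sample means with sigma/sqrt(n). The helper name `simulate_sample_means`, the seed, and the number of repetitions are arbitrary choices for illustration.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Return num_samples sample means, each computed from n i.i.d. draws.

    Assumed population: exponential with rate 1 (mean 1, standard deviation 1),
    which is strongly right-skewed and therefore clearly nonnormal.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

SIGMA = 1.0  # population standard deviation of the exponential(1) distribution

for n in (4, 36, 144):
    means = simulate_sample_means(n, num_samples=5000)
    empirical_se = statistics.stdev(means)  # observed spread of the sample means
    theoretical_se = SIGMA / math.sqrt(n)   # sigma / sqrt(n)
    print(f"n={n:4d}  empirical SE={empirical_se:.4f}  sigma/sqrt(n)={theoretical_se:.4f}")
```

Simulation noise aside, the empirical standard errors should land close to 0.5, 0.1667, and 0.0833, and a histogram of the n = 144 means would look close to bell-shaped even though individual draws are heavily skewed.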

The CLT is why normal and t tools dominate inference.

Confidence intervals and choosing a statistic

A confidence interval has three parts: point estimate +/- (reliability factor x standard error). For a normal sampling distribution, the reliability factors are 1.65 (90%), 1.96 (95%), and 2.58 (99%). Intervals widen with higher confidence, greater variability, or smaller samples. Use the z-statistic when the population variance is known; use the t-statistic (with n - 1 degrees of freedom) when the population variance is unknown and you rely on the sample standard deviation. With a large sample and unknown variance, either is acceptable: z is a reasonable approximation, while t is slightly more conservative. The t distribution has fatter tails and converges to the normal as degrees of freedom rise.
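The interval formula can be written as two small Python helpers. This sketch uses only the standard library: `statistics.NormalDist().inv_cdf` supplies the z reliability factor, while a t reliability factor has to be passed in from a t table (or computed with a library such as `scipy.stats.t.ppf`, which is not used here). The function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_reliability_factor(confidence):
    """Two-sided z reliability factor, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_interval(point_estimate, sd, n, reliability_factor):
    """Point estimate +/- reliability factor x standard error, with SE = sd / sqrt(n).

    Pass a z factor (from z_reliability_factor) or a t factor looked up for n - 1 df.
    """
    standard_error = sd / math.sqrt(n)
    half_width = reliability_factor * standard_error
    return point_estimate - half_width, point_estimate + half_width

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%}: z = {z_reliability_factor(conf):.3f}")
```

The loop prints 1.645, 1.960, and 2.576; the 1.65 and 2.58 quoted above are these values rounded. Because the reliability factor is an argument, the same helper shows directly how a larger factor (higher confidence or a t value) widens the interval.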

Hypothesis testing

Testing begins with a null hypothesis H0 (the statement tested directly, always containing the equality) and an alternative Ha. A two-tailed test looks for a difference in either direction; a one-tailed test looks for a directional effect. A test statistic compares the sample result with the hypothesized value in standard-error units: for a mean with unknown variance, t = (xbar - mu0)/(s/sqrt(n)). Reject H0 if the statistic falls in the rejection region or if the p-value, the smallest significance level at which H0 can be rejected, is less than alpha.
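The test statistic and both decision rules can be sketched in Python. The function names and inputs are hypothetical. The p-value helper uses the standard normal, so it is exact for a z-test and only an approximation for a t-test; a t-based p-value would need the t distribution, for example from a statistics library.

```python
import math
from statistics import NormalDist

def mean_test_statistic(sample_mean, hypothesized_mean, sd, n):
    """(xbar - mu0) / (s / sqrt(n)): distance from mu0 in standard-error units."""
    return (sample_mean - hypothesized_mean) / (sd / math.sqrt(n))

def two_tailed_p_value_normal(stat):
    """Two-tailed p-value from the standard normal (approximate when the test is really t)."""
    return 2 * (1 - NormalDist().cdf(abs(stat)))

def reject_null(stat, critical_value, alternative="two-sided"):
    """Rejection-region rule; critical_value is the positive table value for the chosen alpha."""
    if alternative == "two-sided":
        return abs(stat) > critical_value
    if alternative == "greater":   # Ha: mu > mu0, reject in the upper tail
        return stat > critical_value
    if alternative == "less":      # Ha: mu < mu0, reject in the lower tail
        return stat < -critical_value
    raise ValueError(f"unknown alternative: {alternative}")

# Hypothetical inputs: xbar = 0.45%, mu0 = 0, s = 2.4%, n = 64, so SE = 0.3%.
stat = mean_test_statistic(0.45, 0.0, 2.4, 64)
print(f"stat = {stat:.2f}, p = {two_tailed_p_value_normal(stat):.4f}, "
      f"reject at 5%? {reject_null(stat, 1.96)}")
```

With those hypothetical inputs the statistic is 1.50 and the normal p-value is about 0.134, so H0 is not rejected at 5%; the rejection-region rule agrees because 1.50 lies inside +/-1.96. The two rules always agree, since the p-value falls below alpha exactly when the statistic lands in the rejection region.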

Inference tool  | Core use                    | Level I application
Standard error  | s/sqrt(n)                   | Precision of the sample mean
z-test          | Known variance or large n   | Test of a mean or proportion
t-test          | Unknown variance, sample s  | Test of a mean
Chi-square test | Counts or a single variance | Independence; variance test
F-test          | Ratio of two variances      | Compare variances
p-value         | Smallest alpha to reject    | Strength of the evidence

Errors, power, and tests of independence

A Type I error rejects a true null; its probability is the significance level alpha. A Type II error fails to reject a false null; its probability is beta. Power is 1 - beta, the probability of correctly rejecting a false null. Holding sample size fixed, lowering alpha to reduce Type I risk raises Type II risk; only a larger sample reduces both. Tests of independence apply a chi-square statistic to a contingency table, comparing each observed count with its expected count under independence, (row total x column total) / grand total, with (r - 1)(c - 1) degrees of freedom for r rows and c columns; large gaps between observed and expected counts support rejection.
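Two short Python sketches illustrate this paragraph under stated assumptions. `z_test_power` computes the power of a two-tailed z-test of a mean when, for the sake of the exercise, the true mean is treated as known, so you can watch lower alpha reduce power and a larger n raise it. `chi_square_independence` builds expected counts from row, column, and grand totals and returns the statistic with (r - 1)(c - 1) degrees of freedom. Both function names and all inputs are hypothetical, and the chi-square statistic still has to be compared with a critical value from a chi-square table.

```python
import math
from statistics import NormalDist

def z_test_power(true_mean, hypothesized_mean, sigma, n, alpha):
    """Probability that a two-tailed z-test rejects H0: mu = hypothesized_mean
    when the population mean is actually true_mean (power = 1 - beta)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = (true_mean - hypothesized_mean) / (sigma / math.sqrt(n))  # true effect in SE units
    # Reject when Z > z_crit or Z < -z_crit, where Z ~ N(shift, 1) under the true mean.
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

# Hypothetical inputs: true monthly mean 1.0%, sigma 3.0%, n = 36, testing H0: mu = 0.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  power={z_test_power(1.0, 0.0, 3.0, 36, alpha):.3f}")

# Hypothetical 2 x 2 table of counts (rows: sectors, columns: rating buckets).
stat, dof = chi_square_independence([[10, 20], [20, 10]])
print(f"chi-square = {stat:.2f} with {dof} df")
```

The power falls as alpha shrinks, which is the Type I versus Type II trade-off in numbers. For the hypothetical table every expected count is 15, the statistic is 6.67 with 1 degree of freedom, and that exceeds 3.841, the 5% critical value for 1 df, so independence would be rejected.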

Exam stems usually reveal the right test: a mean with unknown variance signals t; a categorical table signals chi-square; and a question about statistical significance asks you to compare the p-value with alpha or the statistic with the critical value.

Worked confidence-interval example

A fund's 36 monthly returns average 1.0% with a sample standard deviation of 3.0%. The standard error is 3.0%/sqrt(36) = 0.5%. Because the sample is large (n >= 30), the z reliability factor of 1.96 is an acceptable approximation for a 95% interval: 1.0% +/- 1.96(0.5%) = 1.0% +/- 0.98%, or 0.02% to 1.98%. Because the interval excludes zero, you would reject, at the 5% level, the null hypothesis that the true mean monthly return is zero. To run that test directly, compute z = (1.0% - 0)/0.5% = 2.0, which exceeds the two-tailed critical value of 1.96, so you reject the null and conclude the mean differs from zero.

Strictly, the population standard deviation is unknown here (only the sample standard deviation is given), so the t distribution with n - 1 = 35 degrees of freedom is the more conservative choice. Its two-tailed 5% critical value is about 2.03, slightly wider than 1.96: the interval becomes 1.0% +/- 1.015%, or about -0.02% to 2.02%, which just includes zero, and t = 2.0 falls short of 2.03, so the null would not be rejected. Borderline results like this one can flip with the choice between z and t.
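The arithmetic in the example can be checked in a few lines of Python. The two-tailed 5% t critical value for 35 degrees of freedom (about 2.030) is hard-coded from a t table because the standard library has no t quantile function; treat that constant as an input you would look up.

```python
import math
from statistics import NormalDist

n, xbar, s = 36, 1.0, 3.0             # sample size, mean (%), sample std dev (%)
mu0 = 0.0                             # hypothesized mean monthly return (%)
se = s / math.sqrt(n)                 # standard error = 0.5%

z_crit = NormalDist().inv_cdf(0.975)  # two-tailed 5% z value, about 1.96
T_CRIT_35 = 2.030                     # two-tailed 5% t value, 35 df, from a t table (assumed input)

z_interval = (xbar - z_crit * se, xbar + z_crit * se)
t_interval = (xbar - T_CRIT_35 * se, xbar + T_CRIT_35 * se)
test_stat = (xbar - mu0) / se         # same value whether read as z or t

print(f"SE = {se:.2f}%")
print(f"95% z interval: {z_interval[0]:.3f}% to {z_interval[1]:.3f}%")
print(f"95% t interval: {t_interval[0]:.3f}% to {t_interval[1]:.3f}%")
print(f"statistic = {test_stat:.2f}; beyond z critical? {abs(test_stat) > z_crit}; "
      f"beyond t critical? {abs(test_stat) > T_CRIT_35}")
```

The z interval comes out at about 0.020% to 1.980% and the t interval at about -0.015% to 2.015%; the statistic of 2.0 clears the z critical value but not the t critical value.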

Newer inference tools

The 2026 curriculum also references resampling methods. The bootstrap repeatedly draws samples of the same size, with replacement, from the observed data to approximate the sampling distribution of a statistic when an analytic standard error is hard to derive; the jackknife instead leaves out one observation at a time and recomputes the statistic. These data-driven techniques complement the central limit theorem when distributional assumptions are shaky.
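Both resampling ideas fit in a few lines of standard-library Python. This is a sketch, not a production implementation: `bootstrap_se` resamples the data with replacement (same size as the original) and takes the standard deviation of the recomputed statistic, and `jackknife_se` applies the usual leave-one-out formula, sqrt((n - 1)/n x sum of squared deviations of the leave-one-out estimates from their mean). The function names, resample count, seed, and simulated data are hypothetical choices.

```python
import math
import random
import statistics

def bootstrap_se(data, statistic, num_resamples=2000, seed=0):
    """Bootstrap standard error: recompute the statistic on resamples drawn
    with replacement (each the same size as data) and take their standard deviation."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(num_resamples)]
    return statistics.stdev(replicates)

def jackknife_se(data, statistic):
    """Jackknife standard error: recompute the statistic leaving out one observation
    at a time, then scale the spread of those estimates by (n - 1) / n."""
    n = len(data)
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = statistics.fmean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((v - center) ** 2 for v in leave_one_out))

# Hypothetical data: 60 simulated monthly returns (%), not real fund data.
gen = random.Random(42)
returns = [gen.gauss(1.0, 3.0) for _ in range(60)]

analytic = statistics.stdev(returns) / math.sqrt(len(returns))
print(f"s/sqrt(n)          = {analytic:.4f}")
print(f"jackknife (mean)   = {jackknife_se(returns, statistics.fmean):.4f}")
print(f"bootstrap (mean)   = {bootstrap_se(returns, statistics.fmean):.4f}")
print(f"bootstrap (median) = {bootstrap_se(returns, statistics.median):.4f}")
```

For the sample mean, the jackknife result equals s/sqrt(n) exactly and the bootstrap result lands close to it, which is a useful sanity check. The same functions accept other statistics, such as `statistics.median`, whose standard error is harder to derive analytically.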

Keep the exam decision rule simple: identify the parameter being tested, choose z or t by what is known about the variance and the sample size, set up one- or two-tailed based on the alternative, and reject only when the evidence clears the chosen significance level.

Test Your Knowledge

A sample has a standard deviation of 18% and 36 observations. The standard error of the sample mean is:

Rejecting a null hypothesis that is actually true is best described as a:

A researcher tests whether sector classification and credit-rating category are independent using a contingency table of counts. The most appropriate test is a: