Statistical Estimation and Hypothesis Testing
Key Takeaways
- Point estimates (sample mean x̄, sample standard deviation s) approximate population parameters (μ, σ).
- Confidence intervals provide a range likely to contain the population parameter at a given confidence level.
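- A (1 − α) confidence interval for a population mean is x̄ ± z(α/2) · σ/√n when σ is known, or x̄ ± t(α/2) · s/√n with df = n − 1 when σ is unknown; for 95% confidence, α = 0.05.
- Hypothesis testing compares a test statistic to a critical value (or a p-value to α) to decide whether to reject the null hypothesis H₀ or fail to reject it.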
- Type I error (α) = rejecting H₀ when it is true; Type II error (β) = failing to reject H₀ when it is false.
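- The t-distribution replaces z whenever σ is unknown and must be estimated by s. The difference matters most for small samples (a common rule of thumb is n < 30), because t approaches z as n grows.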
Descriptive Statistics Review
Measures of Central Tendency
| Measure | Formula | Use |
|---|---|---|
| Mean (x̄) | Σxᵢ/n | Average value; sensitive to outliers |
| Median | Middle value when sorted | Robust to outliers |
| Mode | Most frequent value | Categorical data |
Measures of Dispersion
| Measure | Formula | Notes |
|---|---|---|
| Range | max - min | Simplest spread measure |
| Variance (s²) | Σ(xᵢ - x̄)²/(n-1) | Average squared deviation (sample) |
| Standard Deviation (s) | √(s²) | Same units as data |
| Coefficient of Variation | (s/x̄) × 100% | Relative variability |
Note: For sample statistics, divide by (n-1), not n. This is called Bessel's correction and produces an unbiased estimate of population variance.
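The measures in the two tables above map directly onto Python's standard `statistics` module. The following is a minimal sketch: the helper name `describe` and the sample data are hypothetical. `statistics.variance` and `statistics.stdev` divide by n − 1 (Bessel's correction), while `statistics.pvariance` divides by n.

```python
import statistics

def describe(data):
    """Hypothetical helper returning the descriptive statistics tabulated above."""
    mean = statistics.mean(data)
    s2 = statistics.variance(data)  # sample variance: divides by n - 1
    s = statistics.stdev(data)      # sample standard deviation: sqrt(s2)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "variance": s2,
        "std_dev": s,
        "cv_percent": s / mean * 100,  # coefficient of variation
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
stats = describe(sample)
print(stats)

# Dividing by n instead of n - 1 gives a smaller (biased) variance estimate
print(statistics.pvariance(sample), "vs", stats["variance"])
```

For this sample the sum of squared deviations is 32, so s² = 32/7 ≈ 4.571. Dividing by n instead gives 32/8 = 4.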
Point Estimation
A point estimate is a single value used to estimate a population parameter:
| Population Parameter | Point Estimate |
|---|---|
| Population mean μ | Sample mean x̄ |
| Population variance σ² | Sample variance s² |
| Population proportion p | Sample proportion p̂ = x/n |
Confidence Intervals
A confidence interval provides a range of plausible values for a population parameter.
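For Population Mean (σ known):
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
For Population Mean (σ unknown, use t):
$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
where the t critical value has df = n − 1 degrees of freedom.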
Common z-values:
| Confidence Level | z(α/2) |
|---|---|
| 90% | 1.645 |
| 95% | 1.960 |
| 99% | 2.576 |
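The tabulated z-values are standard normal quantiles at 1 − α/2, so you can reproduce them rather than memorize them. This sketch uses `statistics.NormalDist` from the Python standard library. The function name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z(alpha/2) for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
```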
Example: A sample of 25 concrete cylinders has x̄ = 4,500 psi and s = 300 psi. Find the 95% confidence interval for the population mean.
Because σ is unknown, use t with df = n − 1 = 24, for which t(0.025) = 2.064: CI = 4,500 ± 2.064 × (300/√25) = 4,500 ± 123.8 = (4,376.2, 4,623.8) psi
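The interval arithmetic above can be checked with a small helper. This is a minimal sketch: the function name `mean_ci` is hypothetical. The t critical value is passed in because Python's standard library has no t quantile function. The value 2.064 is the table value used in the example. If SciPy is installed, `scipy.stats.t.ppf(0.975, 24)` returns the same critical value.

```python
import math

def mean_ci(xbar, spread, n, critical):
    """Two-sided interval xbar +/- critical * spread / sqrt(n).

    Pass sigma with a z critical value when sigma is known, or s with a
    t critical value (df = n - 1) when sigma is unknown.
    """
    margin = critical * spread / math.sqrt(n)
    return xbar - margin, xbar + margin

# Concrete cylinder example: t(0.025) with df = 24 taken from a t table
low, high = mean_ci(4500, 300, 25, 2.064)
print(f"({low:.1f}, {high:.1f}) psi")
```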
Hypothesis Testing
Steps:
- State hypotheses: H₀ (null) and H₁ (alternative)
- Choose significance level α (commonly 0.05)
- Calculate test statistic
- Compare to critical value or compute p-value
- Make decision: Reject H₀ if |test statistic| > critical value (two-tailed test), or equivalently if p-value < α; otherwise fail to reject H₀
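Test Statistic for Mean:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \quad (\sigma \text{ known}), \qquad t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},\ df = n - 1 \quad (\sigma \text{ unknown})$$
where μ₀ is the population mean specified by H₀.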
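The hypothesis-testing steps can be sketched for the known-σ case, where the standard library's `NormalDist` supplies both the critical value and the p-value. This is a minimal illustration: the function name `one_sample_z_test` and the numbers are hypothetical. It tests H₀: μ = μ₀ against the two-sided H₁: μ ≠ μ₀. The unknown-σ (t) case follows the same steps but needs a t-distribution, for example from SciPy.

```python
import math
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 versus H1: mu != mu0 (sigma known)."""
    std = NormalDist()
    z = (xbar - mu0) / (sigma / math.sqrt(n))  # test statistic
    z_crit = std.inv_cdf(1 - alpha / 2)        # critical value z(alpha/2)
    p_value = 2 * (1 - std.cdf(abs(z)))        # two-tailed p-value
    reject = abs(z) > z_crit                   # same decision as p_value < alpha
    return z, p_value, reject

# Hypothetical data: n = 25, xbar = 54.5, sigma = 10, testing H0: mu = 50
z, p, reject = one_sample_z_test(54.5, 50, 10, 25)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Here z = 4.5 / (10/5) = 2.25, which exceeds 1.960, so H₀ is rejected at α = 0.05.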
Types of Errors
| Decision | H₀ is True | H₀ is False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Correct Decision (Power = 1-β) |
| Fail to Reject H₀ | Correct Decision | Type II Error (β) |
- Type I error (α): False positive — rejecting a true null hypothesis
- Type II error (β): False negative — failing to reject a false null hypothesis
- Power (1-β): Probability of correctly rejecting a false null hypothesis
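The relationship between α, β and power can be made concrete for an upper-tailed z test (H₁: μ > μ₀, σ known). If the true mean is μ₁, the test fails to reject with probability β = Φ(z_α − (μ₁ − μ₀)/(σ/√n)). Power is 1 − β. The sketch below is hypothetical: the function name `power_upper_z` and all inputs are illustrative.

```python
import math
from statistics import NormalDist

def power_upper_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of an upper-tailed z test of H0: mu = mu0 when the true mean is mu1."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha)              # reject H0 when z > z_alpha
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # standardized true effect
    beta = std.cdf(z_alpha - shift)               # P(fail to reject H0 | mu = mu1)
    return 1 - beta

# Hypothetical: mu0 = 50, true mean 53, sigma = 10, n = 25
print(round(power_upper_z(50, 53, 10, 25), 3))
```

When μ₁ = μ₀, H₀ is true and the "power" reduces to the Type I error rate α. Power grows as the true effect or the sample size increases.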
Regression and Correlation
Linear Regression
The least-squares line: ŷ = b₀ + b₁x
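The least-squares coefficients use the standard formulas b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄. A minimal sketch with hypothetical data and a hypothetical helper name `least_squares`:

```python
def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f}x")
```

On Python 3.10 and later, `statistics.linear_regression(x, y)` returns the same slope and intercept.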
Coefficient of Determination (R²)
- R² ranges from 0 to 1
- R² = 1 means perfect fit (all variation explained by the model)
- R² = 0 means the model explains none of the variation
- R² = 0.85 means 85% of the variation in y is explained by the linear model in x
Correlation Coefficient (r)
- r ranges from -1 to +1
- r = +1: perfect positive linear relationship
- r = -1: perfect negative linear relationship
- r = 0: no linear relationship
- In simple linear regression, r = ±√(R²), where the sign of r matches the sign of the slope b₁
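The sketch below computes R² as 1 − SSE/SST and Pearson's r directly from the data. This lets you confirm that r² = R² in simple linear regression and that r takes the sign of the slope. The helper name `r_squared_and_r` and the data are hypothetical.

```python
import math

def r_squared_and_r(x, y):
    """Return (R^2, r) for a simple least-squares fit of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)  # total variation (SST)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    r2 = 1 - sse / syy                # fraction of variation in y explained
    r = sxy / math.sqrt(sxx * syy)    # Pearson correlation; sign follows the slope
    return r2, r

x = [1, 2, 3, 4, 5]  # hypothetical data with a downward trend
y = [9.8, 8.1, 6.0, 4.2, 1.9]
r2, r = r_squared_and_r(x, y)
print(f"R^2 = {r2:.4f}, r = {r:.4f}")
```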
Practice Problems
- A sample of 36 measurements has a mean of 82, and the population standard deviation is known to be 12. What is the 95% confidence interval for the population mean?
- In hypothesis testing, which decision constitutes a Type I error?
- If R² = 0.92 for a linear regression model, what does this mean?
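To check your answer to the confidence-interval question above, you can evaluate x̄ ± z(α/2) · σ/√n directly. Because σ is given, z applies rather than t. This is a minimal sketch using only the standard library, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, xbar, sigma = 36, 82, 12        # values from the confidence-interval question
z = NormalDist().inv_cdf(0.975)    # z(alpha/2) for 95% confidence, about 1.960
margin = z * sigma / math.sqrt(n)  # 1.960 * 12 / 6
low, high = xbar - margin, xbar + margin
print(f"({low:.2f}, {high:.2f})")
```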