9.5 Probability, Statistics, and the Normal Distribution
Key Takeaways
- Probability and statistics help distinguish expected measurement scatter from suspicious observations.
- Mean, standard deviation, variance, and confidence intervals are practical tools for repeated measurements.
- Normal-distribution reasoning is useful for random error, but it should not be used to excuse systematic error or blunders.
- Sampling and independence assumptions matter when interpreting repeated observations or quality-control data.
Statistics for Measurement Decisions
The FS exam includes probability and statistics because surveyors must make decisions from imperfect measurements. Repeated observations of a distance, angle, or elevation difference will not usually match exactly. Statistics provides the language for describing that scatter and deciding whether the data behave as expected. The goal is not to turn every survey problem into a research project; it is to recognize what a statistic means in a measurement context.
The mean is a measure of central tendency. The standard deviation describes typical spread around that mean, while variance is the square of the standard deviation. A small standard deviation indicates tighter repeatability under similar conditions. It does not prove that the result is accurate, because a systematic error can shift all observations in the same direction while preserving a small spread.
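The short Python sketch below shows how these three statistics relate for one set of repeated distance readings. The readings are hypothetical values chosen for illustration. The sketch assumes the sample (n - 1) form of the standard deviation, which is what Python's `statistics.stdev` and `statistics.variance` compute.

```python
import statistics

# Hypothetical repeated readings of one distance, in meters (illustrative only).
readings_m = [152.347, 152.351, 152.349, 152.352, 152.346]

def summarize(observations):
    """Return (mean, sample standard deviation, sample variance)."""
    mean = statistics.mean(observations)
    # stdev() and variance() both use the n - 1 (sample) denominator.
    std = statistics.stdev(observations)
    var = statistics.variance(observations)
    return mean, std, var

mean, std, var = summarize(readings_m)
print(f"mean     = {mean:.4f} m")
print(f"std dev  = {std * 1000:.1f} mm")
print(f"variance = {var * 1e6:.2f} mm^2")
```

The standard deviation of about 2.5 mm describes repeatability only. The same spread would result if every reading carried an identical constant offset.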
Statistical Concepts to Know
| Concept | Meaning | Surveying caution |
|---|---|---|
| Mean | Average of observations | Can be biased by systematic error |
| Median | Middle value when ordered | Useful when outliers are present |
| Standard deviation | Typical scatter | Reflects precision, not automatically accuracy |
| Variance | Squared standard deviation | Used in weighting and propagation |
| Confidence interval | Range tied to probability assumptions | Depends on model and sample information |
| Outlier | Observation inconsistent with the data pattern | Investigate before deleting |
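The following sketch illustrates the median and outlier rows of the table. One hypothetical reading contains a 0.1 m blunder. That single reading pulls the mean by about 2 cm, while the median hardly moves. The screening tolerance `TOLERANCE_M` is a hypothetical value chosen for this example, not a standard. The sketch only flags readings for investigation and does not delete them.

```python
import statistics

# Hypothetical readings in meters; the last one contains a 0.1 m blunder.
readings_m = [152.347, 152.351, 152.349, 152.352, 152.449]

mean = statistics.mean(readings_m)
median = statistics.median(readings_m)

# Hypothetical screening tolerance: flags readings for investigation, not deletion.
TOLERANCE_M = 0.010
flagged = [r for r in readings_m if abs(r - median) > TOLERANCE_M]

print(f"mean   = {mean:.4f} m")
print(f"median = {median:.4f} m")
print("investigate:", flagged)
```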
The normal distribution is often used to model random error because many small independent influences can combine into bell-shaped scatter. Under common normal assumptions, observations near the mean are more likely than observations far from it. This helps explain why a large residual is worth investigating. However, blunders and systematic errors, such as a misread rod, a bad setup, a wrong prism constant, or an incorrect unit conversion, are not normal random events.
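To make "more likely near the mean" concrete, the sketch below computes the probability that a normally distributed random error falls within ±k standard deviations of the mean. It uses the identity P(|Z| ≤ k) = erf(k/√2) for a standard normal Z. The helper name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(|Z| <= k) for a standard normal Z: the chance that a purely random
    error falls within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1.0, 1.645, 1.96, 2.0, 3.0):
    print(f"within +/-{k} sigma: {prob_within(k):.4f}")
```

Under the model, a residual beyond 3σ should occur only about 0.3% of the time (1 - 0.9973). That rarity is why such a residual prompts a search for a blunder or systematic error rather than acceptance as ordinary scatter.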
Probability questions may involve complements, independent events, or simple counting. In an FS setting, probability can also describe quality control. For example, if independent checks have known failure probabilities, the chance of at least one failure is one minus the probability of no failures. Independence matters. Two measurements made with the same miscalibrated equipment share the same error, so they are not independent evidence of accuracy.
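Here is a minimal sketch of the complement rule for quality-control checks. The failure probabilities are hypothetical. The product formula is valid only if the checks are independent. For checks that share equipment or procedures, this formula does not apply.

```python
import math

def prob_at_least_one_failure(failure_probs):
    """Chance that at least one of several INDEPENDENT checks fails:
    1 - P(no failures) = 1 - product of (1 - p_i).
    Not valid for correlated checks (e.g. same miscalibrated instrument)."""
    p_none = math.prod(1.0 - p for p in failure_probs)
    return 1.0 - p_none

# Hypothetical: three independent checks, each with a 5% failure chance.
print(f"{prob_at_least_one_failure([0.05, 0.05, 0.05]):.4f}")
```

With three 5% checks, the result is 1 - 0.95³ ≈ 0.1426. This is noticeably larger than any single check's 5%.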
Regression and correlation may appear where data trends are involved. A regression line can estimate a relationship, while correlation describes association. Correlation does not prove causation, and a fitted line should not be trusted far outside the observed data range without justification. In surveying, regression can support calibration, deformation monitoring, or quality review, but the physical interpretation still matters.
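The next sketch fits a straight line by ordinary least squares and computes the Pearson correlation coefficient. The data are hypothetical deformation-monitoring readings (days since baseline versus settlement in millimeters). The `predict` helper is hypothetical and returns an extrapolation flag, which reflects the caution about trusting a line outside the observed range.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                   # slope
    a = y_bar - b * x_bar           # intercept
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r

def predict(x, a, b, xs):
    """Return the fitted value and True if x lies outside the observed range."""
    extrapolated = not (min(xs) <= x <= max(xs))
    return a + b * x, extrapolated

# Hypothetical monitoring data: days since baseline vs. settlement (mm).
days = [0, 10, 20, 30, 40]
settlement_mm = [0.0, 1.9, 4.1, 6.0, 8.0]

a, b, r = fit_line(days, settlement_mm)
print(f"settlement = {a:.3f} + {b:.4f} * day  (r = {r:.4f})")
print(predict(200, a, b, days))  # flagged: far beyond the observed 0-40 days
```

A value of r near 1 shows a strong linear association over 0 to 40 days. It says nothing about why the ground is settling, and it does not guarantee that the trend continues to day 200.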
When answering statistics questions, separate precision from accuracy. Precision is repeatability; accuracy is closeness to the true or accepted value. A tight cluster far from truth is precise but inaccurate. A wide cluster centered on truth may be unbiased but imprecise. This distinction is central to measurement science and shows up in many FS-style distractors.
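The sketch below separates the two ideas numerically for two hypothetical sets of observations of a baseline with an assumed accepted length. Bias (mean minus accepted value) speaks to accuracy, and standard deviation speaks to precision. The first set mimics a constant offset of the kind an uncorrected prism constant would produce. All values and names here are illustrative assumptions.

```python
import statistics

ACCEPTED_M = 100.000  # hypothetical accepted baseline length, meters

# Hypothetical observation sets (meters).
tight_but_offset = [100.030, 100.031, 100.029, 100.030, 100.030]
wide_but_centered = [99.990, 100.012, 99.995, 100.008, 99.995]

def describe(obs, accepted):
    """Return (bias, sample standard deviation).
    Bias reflects accuracy; standard deviation reflects precision."""
    bias = statistics.mean(obs) - accepted
    spread = statistics.stdev(obs)
    return bias, spread

for name, obs in [("tight_but_offset", tight_but_offset),
                  ("wide_but_centered", wide_but_centered)]:
    bias, spread = describe(obs, ACCEPTED_M)
    print(f"{name}: bias = {bias * 1000:+.1f} mm, std = {spread * 1000:.1f} mm")
```

The first set is precise but inaccurate, with about +30 mm of bias and under 1 mm of scatter. The second set is unbiased but imprecise.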
Use statistics to support judgment. Compute the requested value, then ask what it means for the field work. A standard deviation, confidence interval, or residual is not just a number; it is evidence about whether the measurements and model are behaving as expected.
Practice Questions
- A repeated distance measurement has very small scatter, but all readings are affected by an uncorrected prism constant. Which statement is best?
- Why is a large residual in an adjusted network important?
- Which statistic is the square of the standard deviation?