2.2 Probability & Statistics
Key Takeaways
- Central tendency: mean (arithmetic average), median (middle value), mode (most frequent); dispersion: variance σ² and standard deviation σ = √variance.
- Addition rule P(A∪B)=P(A)+P(B)−P(A∩B); multiplication for independent events P(A∩B)=P(A)P(B); conditional P(A|B)=P(A∩B)/P(B).
- The standard normal z-score z=(x−μ)/σ converts any value to standard-normal units so you can read probabilities from the Handbook's z-table.
- A confidence interval for the mean is x̄ ± z(σ/√n) (or t when σ is unknown and n is small); linear regression fits y = a + bx by least squares.
Describing Data: Central Tendency & Dispersion
Probability & Statistics accounts for 4–6 questions on the FE Civil exam and supports quality-control, surveying-error, and risk problems. Start with the descriptors:
- Mean (arithmetic average): x̄ = (Σxᵢ)/n — sensitive to outliers.
- Median: the middle value when data are ordered — robust to outliers.
- Mode: the most frequently occurring value.
Dispersion measures how spread out the data are. The sample variance is s² = Σ(xᵢ − x̄)²/(n−1) and the sample standard deviation is s = √s². (The population variance σ² divides by N rather than n−1 and measures deviations from the population mean μ.) Two traps: dividing by n instead of n−1 for a sample, and forgetting that the standard deviation has the same units as the data while the variance has squared units. The coefficient of variation CV = σ/μ (estimated from a sample as s/x̄) expresses relative scatter.
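As a quick check on these definitions, the sketch below uses Python's standard `statistics` module on a small made-up data set (the values are illustrative assumptions, not Handbook data). It contrasts the n−1 (sample) and N (population) divisors.

```python
import statistics

# Hypothetical sample of five measurements (illustrative values only).
data = [2, 4, 4, 5, 10]

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # middle value of the ordered data
mode = statistics.mode(data)        # most frequent value

sample_var = statistics.variance(data)   # divides by n - 1
sample_std = statistics.stdev(data)      # square root of the sample variance
pop_var = statistics.pvariance(data)     # divides by N

cv = sample_std / mean   # coefficient of variation, estimated from the sample

print(mean, median, mode)
print(sample_var, sample_std, pop_var, cv)
```

For this data the sum of squared deviations is 36, so the sample variance is 36/4 = 9 while the population variance is 36/5 = 7.2. The single outlier (10) pulls the mean to 5 but leaves the median at 4.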
Two more descriptors occasionally appear. The range is simply max − min, the crudest spread measure. The weighted mean x̄_w = (Σwᵢxᵢ)/(Σwᵢ) is used when observations carry unequal weights — common in surveying, where a measurement's weight is inversely proportional to its variance. When a problem gives you grouped or frequency data, the mean is Σ(fᵢxᵢ)/Σfᵢ, not a simple average of the class values.
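The weighted and grouped means share one formula, which the following minimal sketch illustrates. The function name `weighted_mean` and all numbers are hypothetical choices for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical repeated measurements of one distance (m) with their variances.
distances = [100.02, 100.05, 99.98]
variances = [0.0004, 0.0016, 0.0004]
weights = [1 / v for v in variances]   # weight inversely proportional to variance
best_estimate = weighted_mean(distances, weights)

# Grouped data: class values with frequencies behave like values with weights.
class_values = [10, 20, 30]
frequencies = [2, 5, 3]
grouped_mean = weighted_mean(class_values, frequencies)   # (20 + 100 + 90) / 10

print(best_estimate, grouped_mean)
```

The grouped mean here is 21, whereas a simple average of the three class values would give 20, which is the mistake the text warns about.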
Probability Rules
Probabilities run 0 to 1. The Handbook lists the core rules:
| Rule | Formula | Use |
|---|---|---|
| Addition (general) | P(A∪B)=P(A)+P(B)−P(A∩B) | 'A or B' |
| Addition (mutually exclusive) | P(A∪B)=P(A)+P(B) | disjoint events |
| Multiplication (independent) | P(A∩B)=P(A)·P(B) | 'A and B' |
| Conditional | P(A|B)=P(A∩B)/P(B) | 'A given B' |
| Complement | P(A')=1−P(A) | 'not A' |
The most common errors are subtracting an overlap term P(A∩B) when the events are mutually exclusive (the true overlap is 0), and failing to subtract it when the events can occur together.
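The rules in the table can be checked by brute-force counting on a small sample space. The sketch below assumes one roll of a fair six-sided die and two events chosen purely for illustration; exact fractions avoid rounding.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an illustrative choice).
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: roll is even
B = {4, 5, 6}   # event B: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

p_union = prob(A) + prob(B) - prob(A & B)      # general addition rule
p_a_given_b = prob(A & B) / prob(B)            # conditional probability
p_not_a = 1 - prob(A)                          # complement

# Cross-check the addition rule by counting the union directly.
assert p_union == prob(A | B)

# Independence test: does P(A and B) equal P(A) * P(B)?
independent = prob(A & B) == prob(A) * prob(B)

print(p_union, p_a_given_b, p_not_a, independent)   # 2/3 2/3 1/2 False
```

Here A and B overlap on {4, 6}, so skipping the subtraction would give 1/2 + 1/2 = 1, overstating the true 2/3. The events are also not independent, since P(A∩B) = 1/3 while P(A)·P(B) = 1/4.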
Independence vs. mutual exclusivity is a classic trap: two events are independent when one's occurrence does not change the other's probability (P(A|B) = P(A)); they are mutually exclusive when they cannot both happen (P(A∩B) = 0). These are different concepts; mutually exclusive events with non-zero probabilities are necessarily dependent.
Counting underpins many problems: permutations (order matters) P(n,r) = n!/(n−r)!, and combinations (order does not) C(n,r) = n!/[r!(n−r)!].
Bayes' theorem, P(A|B) = P(B|A)P(A)/P(B), reverses a conditional and is listed in the Handbook for diagnostic-type questions.
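A short sketch of the counting formulas and Bayes' theorem follows, using `math.perm` and `math.comb` from the Python standard library (Python 3.8+). The inspection scenario and its probabilities are hypothetical, and the denominator P(B) is built with the law of total probability, which is an added step not spelled out above.

```python
import math

# Counting: choose r = 3 items from n = 5.
n, r = 5, 3
permutations = math.perm(n, r)    # n!/(n-r)!      = 60, order matters
combinations = math.comb(n, r)    # n!/[r!(n-r)!]  = 10, order ignored

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical inspection scenario (all numbers are illustrative assumptions):
# A = "weld is defective", B = "test flags the weld".
p_a = 0.10               # prior probability of a defect
p_b_given_a = 0.90       # test flags a defective weld
p_b_given_not_a = 0.05   # test flags a sound weld (false alarm)

# Total probability of a flag, needed as the denominator P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = bayes(p_b_given_a, p_a, p_b)

print(permutations, combinations, round(p_a_given_b, 3))
```

With these assumed numbers, a flagged weld is defective about two times in three (0.09/0.135), noticeably lower than the 90% detection rate, which is the kind of reversal Bayes' theorem is meant to expose.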
Distributions: Normal, Binomial & t
The normal (Gaussian) distribution is symmetric and bell-shaped; about 68% of values fall within ±1σ, 95% within ±2σ, 99.7% within ±3σ. Any value is standardized with the z-score z = (x − μ)/σ, then probabilities are read from the Handbook's unit-normal table.
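The 68/95/99.7 figures can be reproduced without a table. This sketch assumes Python 3.8+ for `statistics.NormalDist`; the helper names `z_score` and `area_within` are illustrative.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: number of standard deviations from the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist()   # mean 0, standard deviation 1

def area_within(k):
    """Probability that a normal value lies within +/- k sigma of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The printed areas are close to 0.683, 0.954, and 0.997, so the "95% within ±2σ" rule is a rounded value; the exact 95% interval uses z = 1.96.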
The binomial distribution gives P(x successes in n independent trials, each with success probability p) = C(n,x)·pˣ·(1−p)ⁿ⁻ˣ, with mean np and variance np(1−p); it is used for go/no-go inspection. The Student's t distribution replaces the normal when the population σ is unknown and the sample is small (n < ~30); it has heavier tails than the normal, so intervals based on it are wider, and its shape depends on the degrees of freedom (df = n − 1).
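The binomial formula translates directly into a few lines of code. The lot size and defect probability below are hypothetical values for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical inspection lot: n = 10 items, each defective with p = 0.10.
n, p = 10, 0.10
p_none = binomial_pmf(0, n, p)                    # 0.9**10, about 0.349
p_at_most_one = p_none + binomial_pmf(1, n, p)    # about 0.736
mean = n * p                                      # np
variance = n * p * (1 - p)                        # np(1 - p)

# Sanity check: the probabilities over x = 0..n sum to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print(round(p_none, 4), round(p_at_most_one, 4), mean, variance)
```

"At most" and "at least" questions are sums of individual terms; "at least one" is usually quickest as 1 − P(0).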
Two more distributions also appear. The Poisson distribution, P(x) = λˣ·e^(−λ)/x!, is discrete and models the number of events in a fixed interval (e.g., vehicle arrivals per minute at an intersection, or flaws per kilometer of weld), where λ is the mean number of events per interval; its variance equals its mean. The uniform distribution spreads probability evenly over an interval. The key modeling decision is binomial vs. Poisson: binomial counts successes in a fixed number of trials, while Poisson counts occurrences over a continuum of time or space with no fixed trial count.
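A minimal Poisson sketch follows; the arrival rate of 3 vehicles per minute is an assumed value for illustration, and the sum used to confirm that mean and variance both equal λ is truncated at 60 terms.

```python
import math

def poisson_pmf(x, lam):
    """P(exactly x events) when the mean number of events per interval is lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

# Hypothetical arrival rate: lam = 3 vehicles per minute.
lam = 3.0
p_zero = poisson_pmf(0, lam)                                  # e**-3, about 0.0498
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))    # about 0.423

# Numerical check that mean and variance both equal lam (truncated sum).
terms = [(k, poisson_pmf(k, lam)) for k in range(60)]
mean = sum(k * pk for k, pk in terms)
variance = sum((k - mean) ** 2 * pk for k, pk in terms)

print(round(p_zero, 4), round(p_at_most_two, 4), round(mean, 6), round(variance, 6))
```

Note that no trial count n appears anywhere in the function; only the rate λ is needed, which is the practical signal that a problem is Poisson rather than binomial.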
Worked Example — Normal Distribution z-Score
Concrete cylinder strengths are normally distributed with mean μ = 4000 psi and standard deviation σ = 300 psi. What fraction fall below the 3500-psi spec?
Step 1 — standardize: z = (x − μ)/σ = (3500 − 4000)/300 = −500/300 = −1.67.
Step 2 — read the table: the area to the left of z = −1.67 is about 0.0475.
Result: roughly 4.75% of cylinders fall below 3500 psi. If you instead wanted the fraction between 3500 and 4500 psi, compute z at both ends (−1.67 and +1.67) and subtract both tail areas from 1: 1 − 2(0.0475) ≈ 0.905, or about 90.5%. Always sketch the bell curve and shade the region you need; getting the sign of z or the tail wrong is the #1 source of errors.
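The worked example can be verified numerically with `statistics.NormalDist` (Python 3.8+). This is a check on the hand calculation, not a substitute for the table lookup expected on the exam.

```python
from statistics import NormalDist

mu, sigma = 4000.0, 300.0          # psi, from the worked example
strength = NormalDist(mu, sigma)

z_exact = (3500 - mu) / sigma      # -1.666..., rounded to -1.67 for the table
p_table = NormalDist().cdf(-1.67)  # area left of the rounded z, about 0.0475
p_exact = strength.cdf(3500)       # area left of the unrounded z, about 0.0478

# Fraction between 3500 and 4500 psi: difference of two cumulative areas.
p_between = strength.cdf(4500) - strength.cdf(3500)

print(round(z_exact, 2), round(p_table, 4), round(p_exact, 4), round(p_between, 4))
```

The unrounded result differs slightly from the table lookup because the table uses z rounded to two decimals; either value rounds to about 4.8% below spec and roughly 90% between 3500 and 4500 psi.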
Confidence Intervals, Regression & Correlation
A confidence interval brackets the true mean at a stated confidence level. With σ known: x̄ ± z·(σ/√n); for 95% confidence z = 1.96. When σ is unknown and n is small, use t in place of z: x̄ ± t·(s/√n). Raising the confidence level (90%→99%) widens the interval; a larger n narrows it through the √n in the denominator.
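The σ-known interval is sketched below, with the critical z taken from `NormalDist().inv_cdf`. The function name and sample numbers are hypothetical. The Python standard library has no t quantile function, so for the small-sample case the critical t would still come from the Handbook's t-table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided interval xbar +/- z * sigma / sqrt(n), with sigma known."""
    # Critical z leaves (1 - confidence)/2 in each tail; 0.95 gives about 1.96.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample: mean 50.0, known sigma 4.0, n = 16 observations.
low95, high95 = z_confidence_interval(50.0, 4.0, 16)
low99, high99 = z_confidence_interval(50.0, 4.0, 16, confidence=0.99)
low95_big_n, high95_big_n = z_confidence_interval(50.0, 4.0, 64)

print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
print(round(low95_big_n, 2), round(high95_big_n, 2))
```

With these assumed inputs the 95% half-width is about 1.96, the 99% half-width grows to about 2.58, and quadrupling n from 16 to 64 halves the 95% half-width to about 0.98.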
Linear regression fits the least-squares line ŷ = a + bx, choosing slope b and intercept a to minimize Σ(yᵢ − ŷᵢ)². The correlation coefficient r (−1 ≤ r ≤ +1) measures the strength of the linear relationship: r = +1 is a perfect positive fit, r = 0 indicates no linear relationship, and r = −1 is a perfect negative fit. The coefficient of determination r² is the fraction of the variance in y explained by the fitted line. Distinguish correlation (association) from causation, a frequent conceptual trap.
The least-squares slope and intercept have closed forms in the Handbook: b = [nΣxy − (Σx)(Σy)]/[nΣx² − (Σx)²], and a = ȳ − b·x̄, so the regression line always passes through the point (x̄, ȳ). On the exam you may be asked to predict ŷ at a given x once a and b are supplied, or to interpret r: for instance, r = 0.95 indicates a strong positive linear relationship, and the corresponding r² ≈ 0.90 means about 90% of the variation in y is explained by the fitted line. Remember that a high r² does not validate extrapolation beyond the data range, another common conceptual pitfall.
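The closed-form slope and intercept can be computed from the five running sums, as in this minimal sketch. The data are made up for illustration, and the sum-based expression for r is the standard Pearson formula, added here as an assumption since the text above does not write it out.

```python
import math

def least_squares_fit(xs, ys):
    """Return intercept a, slope b, and correlation r for y-hat = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)

    b = (n * sxy - sx * sy) / (n * sxx - sx**2)   # slope
    a = sy / n - b * sx / n                       # intercept: y-bar - b * x-bar
    # Pearson correlation coefficient in the same sum-based form.
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))
    return a, b, r

# Hypothetical data (illustrative values only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r = least_squares_fit(xs, ys)
y_hat_at_mean_x = a + b * 3    # x-bar = 3, so this should equal y-bar = 4

print(round(a, 3), round(b, 3), round(r, 3), round(r**2, 3))
```

For this data the sums give b = 30/50 = 0.6 and a = 4 − 0.6·3 = 2.2, with r ≈ 0.775 and r² = 0.6. The prediction at x̄ returns ȳ, confirming that the fitted line passes through (x̄, ȳ).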
Beam deflections (mm) measured at 5 points are 12, 15, 11, 14, 13. What is the sample standard deviation (use n−1)?
Soil samples have compressive strength μ = 200 kPa, σ = 25 kPa, normally distributed. What is the z-score for a 250-kPa sample?
A defect occurs independently on 10% of welds. What is the probability that two specific welds are BOTH defective?
Which distribution best models the number of vehicles arriving at an intersection during a fixed one-minute interval, given a known average arrival rate?