2.2 Probability & Statistics

Key Takeaways

  • Central tendency: mean (arithmetic average), median (middle value), mode (most frequent); dispersion: variance σ² and standard deviation σ = √variance.
  • Addition rule P(A∪B)=P(A)+P(B)−P(A∩B); multiplication for independent events P(A∩B)=P(A)P(B); conditional P(A|B)=P(A∩B)/P(B).
  • The standard normal z-score z=(x−μ)/σ converts any value to standard-normal units so you can read probabilities from the Handbook's z-table.
  • A confidence interval for the mean is x̄ ± z(σ/√n) (or t when σ is unknown and n is small); linear regression fits y = a + bx by least squares.

Last updated: June 2026

Describing Data: Central Tendency & Dispersion

Probability & Statistics accounts for 4–6 questions on the FE Civil exam and supports quality-control, surveying-error, and risk problems. Start with the descriptors:

  • Mean (arithmetic average): x̄ = (Σxᵢ)/n — sensitive to outliers.
  • Median: the middle value when data are ordered — robust to outliers.
  • Mode: the most frequently occurring value.

Dispersion measures how spread out the data are. The sample variance is s² = Σ(xᵢ − x̄)²/(n−1) and the sample standard deviation is s = √(s²). (The population variance σ² divides by N rather than n−1 and uses μ in place of x̄.) Two traps: dividing by n instead of n−1 for a sample, and forgetting that the standard deviation has the same units as the data while the variance has squared units. The coefficient of variation CV = σ/μ (estimated by s/x̄ for a sample) expresses scatter relative to the mean.
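The n−1 versus N distinction is easy to check numerically. The sketch below uses Python's standard statistics module on a made-up data set (the five readings are hypothetical, chosen only so the arithmetic is easy to follow).

```python
import statistics

# Hypothetical readings (mm), chosen only for illustration.
data = [10.0, 12.0, 14.0, 16.0, 18.0]

mean = statistics.mean(data)          # 14.0
s2 = statistics.variance(data)        # sample variance: sum of squares 40 / (n - 1) = 10.0
s = statistics.stdev(data)            # sample standard deviation: sqrt(10), about 3.16
sigma2 = statistics.pvariance(data)   # population variance: 40 / N = 8.0
cv = s / mean                         # coefficient of variation, dimensionless

print(mean, s2, s, sigma2, cv)
```

Dividing by N always gives the smaller of the two variances, which is why using it on a sample understates the scatter.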

Two more descriptors occasionally appear. The range is simply max − min, the crudest spread measure. The weighted mean x̄_w = (Σwᵢxᵢ)/(Σwᵢ) is used when observations carry unequal weights — common in surveying, where a measurement's weight is inversely proportional to its variance. When a problem gives you grouped or frequency data, the mean is Σ(fᵢxᵢ)/Σfᵢ, not a simple average of the class values.
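As a rough illustration, the weighted mean and the grouped-data mean are the same calculation with different weights. The helper name weighted_mean and all numbers below are hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical surveying case: three distance measurements (m) with
# assumed standard deviations; weight is taken as 1 / variance.
measurements = [100.02, 100.05, 99.98]
std_devs = [0.01, 0.02, 0.02]
weights = [1 / sd**2 for sd in std_devs]
x_w = weighted_mean(measurements, weights)         # about 100.0183

# Hypothetical grouped data: class values with frequencies.
class_values = [10, 20, 30]
frequencies = [2, 5, 3]
grouped_mean = weighted_mean(class_values, frequencies)   # 210 / 10 = 21.0
naive_mean = sum(class_values) / len(class_values)        # 20.0, ignores frequencies

print(x_w, grouped_mean, naive_mean)
```

The most precise measurement dominates the weighted result, and the grouped mean (21.0) differs from the simple average of the class values (20.0).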

Probability Rules

Probabilities run 0 to 1. The Handbook lists the core rules:

Rule                            Formula                      Use
Addition (general)              P(A∪B)=P(A)+P(B)−P(A∩B)      'A or B'
Addition (mutually exclusive)   P(A∪B)=P(A)+P(B)             disjoint events
Multiplication (independent)    P(A∩B)=P(A)·P(B)             'A and B'
Conditional                     P(A|B)=P(A∩B)/P(B)           'A given B'
Complement                      P(A')=1−P(A)                 'not A'

The most common error is subtracting the overlap P(A∩B) when events are already mutually exclusive (overlap = 0), or failing to subtract it when they can occur together.
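The rules above can be checked by direct counting on a small sample space. This sketch assumes a single roll of a fair six-sided die, with A = "even" and B = "greater than 3"; the events are an illustrative choice, not taken from the Handbook.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                  # fair six-sided die
A = {x for x in outcomes if x % 2 == 0}      # even: {2, 4, 6}
B = {x for x in outcomes if x > 3}           # greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_union_rule = prob(A) + prob(B) - prob(A & B)   # addition rule: 1/2 + 1/2 - 1/3
p_union_count = prob(A | B)                      # direct count of {2, 4, 5, 6}
p_a_given_b = prob(A & B) / prob(B)              # conditional: (1/3) / (1/2)
p_not_a = 1 - prob(A)                            # complement

print(p_union_rule, p_union_count, p_a_given_b, p_not_a)
```

Skipping the overlap term here would give 1/2 + 1/2 = 1, which is clearly wrong because rolling a 1 or a 3 satisfies neither event.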

Independence vs. mutual exclusivity is a classic trap: two events are independent when one's occurrence does not change the other's probability (P(A|B) = P(A)); they are mutually exclusive when they cannot both happen (P(A∩B) = 0). These are different concepts: mutually exclusive events with non-zero probabilities are necessarily dependent, because knowing that one occurred drops the other's probability to zero.

Counting underpins many problems: permutations (order matters) P(n,r) = n!/(n−r)!, and combinations (order does not) C(n,r) = n!/[r!(n−r)!].

Bayes' theorem, P(A|B) = P(B|A)P(A)/P(B), reverses a conditional and is listed in the Handbook for diagnostic-type questions.
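A short sketch of the counting formulas and Bayes' theorem follows. It uses math.perm and math.comb from the Python standard library (3.8 or later). The inspection numbers are hypothetical, and P(B) is built with the law of total probability, a step the text above does not spell out.

```python
from math import comb, perm

# Choosing 2 items from 5.
ordered = perm(5, 2)      # 5!/(5-2)! = 20, order matters
unordered = comb(5, 2)    # 5!/(2! 3!) = 10, order ignored

def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical inspection scenario (all rates are assumptions):
p_defect = 0.02               # P(A): a weld is defective
p_flag_given_defect = 0.95    # P(B|A): test flags a defective weld
p_flag_given_ok = 0.10        # false-alarm rate on sound welds

# Law of total probability for P(B), the overall flag rate.
p_flag = p_flag_given_defect * p_defect + p_flag_given_ok * (1 - p_defect)

p_defect_given_flag = bayes(p_flag_given_defect, p_defect, p_flag)  # about 0.162

print(ordered, unordered, p_defect_given_flag)
```

Even with a 95% detection rate, only about 16% of flagged welds are defective in this made-up case, because sound welds far outnumber defective ones.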

Distributions: Normal, Binomial & t

The normal (Gaussian) distribution is symmetric and bell-shaped; about 68% of values fall within ±1σ, 95% within ±2σ, 99.7% within ±3σ. Any value is standardized with the z-score z = (x − μ)/σ, then probabilities are read from the Handbook's unit-normal table.
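The 68/95/99.7 figures can be reproduced without a table. The sketch below uses statistics.NormalDist from the Python standard library (3.8 or later); the helper name within is an illustrative choice.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

def within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

coverage = {k: within(k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within ±{k}σ: {p:.4f}")   # roughly 0.6827, 0.9545, 0.9973
```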

The binomial distribution gives P(x successes in n trials) = C(n,x)·pˣ·(1−p)ⁿ⁻ˣ, with mean np and variance np(1−p) — used for go/no-go inspection. The Student's t distribution replaces the normal when the population σ is unknown and the sample is small (n < ~30); it is wider and depends on degrees of freedom (df = n − 1).
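The binomial formula translates directly into code. In this minimal sketch the function name binomial_pmf and the inspection numbers (10 items, 10% defect rate) are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n trials) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.10                    # hypothetical: 10 items, 10% defect rate
p_two = binomial_pmf(2, n, p)      # exactly 2 defectives, about 0.194
mean = n * p                       # np = 1.0
variance = n * p * (1 - p)         # np(1 - p) = 0.9
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))   # should sum to 1

print(p_two, mean, variance, total)
```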

Two more distributions also appear. The Poisson distribution, P(x) = (λˣe⁻λ)/x!, models the number of events in a fixed interval (e.g., vehicle arrivals per minute at an intersection, or flaws per kilometer of weld), where λ is the mean number of events in that interval; its variance equals its mean. The uniform distribution spreads probability evenly over an interval. The key modeling decision is binomial vs. Poisson: binomial counts successes in a fixed number of trials, while Poisson counts occurrences over a continuum of time or space with no fixed trial count.
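A minimal sketch of the Poisson formula, assuming a hypothetical mean of λ = 3 arrivals per interval. The last lines check numerically that the mean and variance both come out equal to λ, summing over enough terms that the truncated tail is negligible.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = lam^x e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 3.0                                   # hypothetical mean arrivals per interval
p_two = poisson_pmf(2, lam)                 # exactly 2 arrivals, about 0.224
p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))   # 0, 1 or 2 arrivals

# Numerical check that mean = variance = lam (60 terms is ample for lam = 3).
xs = range(60)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)

print(p_two, p_at_most_two, mean, variance)
```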

Worked Example — Normal Distribution z-Score

Concrete cylinder strengths are normally distributed with mean μ = 4000 psi and standard deviation σ = 300 psi. What fraction fall below the 3500-psi spec?

Step 1 — standardize: z = (x − μ)/σ = (3500 − 4000)/300 = −500/300 = −1.67.

Step 2 — read the table: the area to the left of z = −1.67 is about 0.0475.

Result: roughly 4.75% of cylinders fall below 3500 psi. If you instead wanted the fraction between 3500 and 4500 psi, compute z at both ends (−1.67 and +1.67) and subtract both tail areas from 1: 1 − 2(0.0475) ≈ 0.905. Always sketch the bell curve and shade the region you need; getting the sign of z or the tail wrong is the #1 source of errors.
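The worked example can be cross-checked in a few lines with statistics.NormalDist. Note that the table lookup rounds z to −1.67, so the unrounded result differs slightly in the third decimal place.

```python
from statistics import NormalDist

mu, sigma = 4000.0, 300.0           # psi, from the worked example
strength = NormalDist(mu, sigma)

z = (3500.0 - mu) / sigma           # about -1.667 before rounding
p_below = strength.cdf(3500.0)      # unrounded z: about 0.048
p_table = NormalDist().cdf(-1.67)   # z rounded as in the table: about 0.0475
p_between = strength.cdf(4500.0) - strength.cdf(3500.0)   # about 0.904

print(f"z = {z:.3f}, below = {p_below:.4f}, table = {p_table:.4f}, between = {p_between:.4f}")
```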

Confidence Intervals, Regression & Correlation

A confidence interval is a range around the sample mean that, at the stated confidence level, is expected to contain the true mean. With σ known: x̄ ± z·(σ/√n); for 95% confidence z = 1.96. When σ is unknown and n is small, use t in place of z: x̄ ± t·(s/√n), with t read at n − 1 degrees of freedom. Raising the confidence level (90%→99%) widens the interval; larger n narrows it via the √n in the denominator.
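The known-σ interval can be sketched as below. The function name z_interval and the sample values (x̄ = 50, σ = 4, n = 16) are hypothetical; the z value comes from NormalDist.inv_cdf rather than a table. The t-based case is not shown because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided interval xbar ± z * sigma / sqrt(n), with sigma known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

lo95, hi95 = z_interval(50.0, 4.0, 16)               # about (48.04, 51.96)
lo99, hi99 = z_interval(50.0, 4.0, 16, 0.99)         # higher confidence, wider
lo95_big, hi95_big = z_interval(50.0, 4.0, 64)       # 4x the sample, half the width

print((lo95, hi95), (lo99, hi99), (lo95_big, hi95_big))
```

Quadrupling n only halves the width, which is the √n effect at work.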

Linear regression fits the least-squares line ŷ = a + bx, choosing slope b and intercept a to minimize Σ(yᵢ − ŷᵢ)². The correlation coefficient r (−1 ≤ r ≤ +1) measures linear-fit strength: r = +1 perfect positive, 0 none, −1 perfect negative. The coefficient of determination r² is the fraction of variance explained. Distinguish correlation (association) from causation — a frequent conceptual trap.

The least-squares slope and intercept have closed forms in the Handbook: b = [nΣxy − (Σx)(Σy)]/[nΣx² − (Σx)²], and a = ȳ − b·x̄, so the regression line always passes through the point (x̄, ȳ). On the exam you may be asked to predict ŷ at a given x once a and b are supplied, or to interpret r. For instance, r = 0.95 indicates a strong positive linear relationship, and the corresponding r² ≈ 0.90 means about 90% of the variation in y is explained by the fitted line. Remember that a high r² does not validate extrapolation beyond the data range, another common conceptual pitfall.
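The closed forms for b and a can be applied directly to a small data set. The five (x, y) pairs below are hypothetical, and the expression used for r is the standard Pearson formula written with the same sums, which is an addition not quoted in the text above.

```python
from math import sqrt

# Hypothetical data pairs, chosen so the sums are easy to verify by hand.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sx, sy = sum(x), sum(y)                       # 15, 20
sxy = sum(xi * yi for xi, yi in zip(x, y))    # 66
sxx = sum(xi**2 for xi in x)                  # 55
syy = sum(yi**2 for yi in y)                  # 86

b = (n * sxy - sx * sy) / (n * sxx - sx**2)   # slope: 30 / 50 = 0.6
a = sy / n - b * sx / n                       # intercept: 4 - 0.6 * 3 = 2.2

# Pearson correlation coefficient (standard formula, see lead-in).
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))

y_at_mean = a + b * (sx / n)                  # line passes through (x̄, ȳ), so this is ȳ
y_hat_6 = a + b * 6                           # a prediction just outside the data range

print(b, a, r, r**2, y_at_mean, y_hat_6)
```

Here r is about 0.77 and r² is 0.6, so the line explains 60% of the variation in y. The prediction at x = 6 is an extrapolation and should be treated with the caution noted above.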

Test Your Knowledge

Beam deflections (mm) measured at 5 points are 12, 15, 11, 14, 13. What is the sample standard deviation (use n−1)?


Soil samples have compressive strength μ = 200 kPa, σ = 25 kPa, normally distributed. What is the z-score for a 250-kPa sample?


A defect occurs independently on 10% of welds. What is the probability that two specific welds are BOTH defective?


Which distribution best models the number of vehicles arriving at an intersection during a fixed one-minute interval, given a known average arrival rate?
