9.5 Statistics Quick-Reference and Practice Drills
Key Takeaways
- In a normal distribution, about 68% of scores fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
- Mean, median, and mode coincide in a symmetric, unimodal distribution; skew pulls the mean toward the tail.
- Correlation never proves causation; a third variable or reverse causation may explain an association.
- Reliability is consistency of measurement; validity is whether an instrument measures what it claims—an instrument can be reliable without being valid.
The final cluster of Research items rewards quick, accurate recall of descriptive statistics, the normal curve, and the reliability/validity distinction. Build a one-page sheet from the tables below and drill it until each fact is automatic.
The normal distribution
Many CPCE items reference the normal (bell) curve, which is symmetric with mean = median = mode at the center. The empirical (68-95-99.7) rule is high-yield:
| Range around the mean | Percent of scores | Typical use |
|---|---|---|
| ±1 standard deviation | ~68% | Common range of typical scores |
| ±2 standard deviations | ~95% | Boundary for many cut scores |
| ±3 standard deviations | ~99.7% | Nearly all observations |
A z-score expresses how many standard deviations a raw score lies from the mean: z = (X − M) / SD, where X is the raw score, M is the mean, and SD is the standard deviation. A z of +1.0 sits at about the 84th percentile; a z of 0 is exactly the mean and the 50th percentile.
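The z-score formula and the empirical rule can be checked numerically. The following is a minimal sketch, assuming a perfectly normal distribution; the function names and the example test (mean 50, SD 10, raw score 60) are hypothetical illustrations, not exam content.

```python
import math


def normal_cdf(z: float) -> float:
    """Proportion of a standard normal distribution at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_score(x: float, mean: float, sd: float) -> float:
    """z = (X - M) / SD."""
    return (x - mean) / sd


def share_within(k: float) -> float:
    """Proportion of scores within +/- k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)


if __name__ == "__main__":
    # Hypothetical raw score of 60 on a test with mean 50 and SD 10.
    z = z_score(60, mean=50, sd=10)
    print(f"z = {z:.2f}, percentile is about {100 * normal_cdf(z):.0f}")

    # Reproduce the 68-95-99.7 bands.
    for k in (1, 2, 3):
        print(f"within +/-{k} SD: {100 * share_within(k):.1f}%")
```

The exact values (roughly 68.3%, 95.4%, and 99.7%) show why the rule is stated with "about": the exam expects the rounded figures.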
Central tendency and skew
- Mean — arithmetic average; sensitive to outliers.
- Median — middle value; resistant to outliers; best for skewed data or ordinal scales.
- Mode — most frequent value; the only option for nominal data.
In a positively (right) skewed distribution, the long tail of high scores pulls the mean above the median. In a negatively (left) skewed distribution, the mean falls below the median. Remember: the mean chases the tail. So when a stem reports a few extreme high incomes among clients, the median better represents the typical client.
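A small numeric sketch can make "the mean chases the tail" concrete. The income figures and presenting-concern labels below are invented for illustration only.

```python
import statistics

# Hypothetical annual incomes (in thousands of dollars) for nine clients.
# The last two values play the role of the "few extreme high incomes".
incomes = [28, 30, 31, 33, 35, 36, 38, 250, 400]

mean_income = statistics.mean(incomes)      # pulled upward by the two outliers
median_income = statistics.median(incomes)  # middle value, unaffected by them

# Mode is the only measure of central tendency that works for nominal data.
# These category labels are hypothetical.
presenting_concerns = ["anxiety", "depression", "anxiety", "grief", "anxiety"]
most_common_concern = statistics.mode(presenting_concerns)

if __name__ == "__main__":
    print(f"mean   = {mean_income:.1f}")
    print(f"median = {median_income}")
    print(f"mode   = {most_common_concern}")
```

Here the mean lands far above what any typical client earns, while the median stays among the seven ordinary incomes, which is the pattern a positively skewed stem is testing.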
Reliability vs. validity
| Concept | Question it answers | Examples |
|---|---|---|
| Reliability | Is the measure consistent? | Test-retest, internal consistency (Cronbach's alpha), inter-rater |
| Validity | Does it measure what it claims? | Content, criterion (concurrent/predictive), construct |
The key relationship: an instrument can be reliable without being valid (a miscalibrated scale that is consistently five pounds off), but it cannot be valid without being reliable. Reliability is necessary but not sufficient for validity.
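The miscalibrated-scale example can be sketched with numbers: a small spread across repeated readings stands in for reliability, and a large systematic error stands in for poor validity. The true weight and the readings are hypothetical, and treating spread and bias this way is a simplification for illustration rather than a formal psychometric procedure.

```python
import statistics

TRUE_WEIGHT = 150.0  # hypothetical true weight in pounds

# Hypothetical repeated readings from a scale that runs about five pounds heavy.
readings = [155.0, 155.1, 154.9, 155.0, 155.0]

# Small spread across readings: the scale is consistent (reliable).
spread = statistics.stdev(readings)

# Large systematic error: the scale does not report true weight (not valid).
bias = statistics.mean(readings) - TRUE_WEIGHT

if __name__ == "__main__":
    print(f"spread (SD of readings) = {spread:.2f} lb")
    print(f"bias (mean reading - true weight) = {bias:.1f} lb")
```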
Correlation cautions
Correlation coefficients range from −1.0 to +1.0; the sign shows direction and the absolute value shows strength. A coefficient near 0 means little linear association. The cardinal rule: correlation does not equal causation, because a third (confounding) variable or reverse causation may drive the link. Squaring r gives the coefficient of determination (r²), the proportion of variance shared—an r of .60 means about 36% of variance is shared.
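As a rough illustration of how r and r² relate, the sketch below computes Pearson's r by hand for a tiny invented data set and then squares it. The function name `pearson_r` and the paired values are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


# Hypothetical paired scores for five clients.
sessions_attended = [1, 2, 3, 4, 5]
wellbeing_rating = [2, 1, 4, 3, 5]

r = pearson_r(sessions_attended, wellbeing_rating)
r_squared = r ** 2  # coefficient of determination: proportion of shared variance

if __name__ == "__main__":
    print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
```

Even a strong r in data like these says nothing about which variable, if either, is the cause.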
Drill protocol
Run mixed sets, not topic-blocked sets, so that you must first identify what kind of item you are facing:
- Design ID drill — read a vignette and name the design and what it can claim (cause, association, description).
- Statistic-match drill — given a question and scale of measurement, name the correct test.
- Validity-threat drill — spot the threat to internal or external validity in a one-line scenario.
- Curve drill — convert between z-scores, percentiles, and the 68-95-99.7 bands.
- Ethics drill — flag the missing consent, IRB, or honesty step in a research vignette.
Standard scores you should recognize
Beyond z-scores, the CPCE references several standard score systems built on the normal curve, because counselors interpret test results constantly. T-scores have a mean of 50 and SD of 10 (used on many personality inventories such as the MMPI). Standard scores on cognitive tests often use a mean of 100 and SD of 15. Stanines divide the distribution into nine bands (mean 5, SD ~2). Percentile ranks report the percentage of the norm group scoring at or below a value and are not equal-interval — the gap between the 50th and 55th percentile is far smaller in raw points than the gap between the 90th and 95th.
A common item gives a client's z-score or T-score and asks for the approximate percentile; anchoring on z = 0 (50th), z = +1 (84th), and z = −1 (16th) handles most of these quickly.
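The conversions among these score systems all pass through z. The sketch below assumes a normal norm group and uses Python's `statistics.NormalDist`; the helper names are hypothetical, and the stanine helper uses the common approximation of rounding 5 + 2z and clamping to the 1 to 9 range.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def percentile_from_z(z: float) -> float:
    """Percent of the norm group scoring at or below z."""
    return 100.0 * STANDARD_NORMAL.cdf(z)


def z_from_t(t: float) -> float:
    """T-scores have mean 50 and SD 10."""
    return (t - 50.0) / 10.0


def t_from_z(z: float) -> float:
    return 50.0 + 10.0 * z


def standard_score_from_z(z: float) -> float:
    """Standard scores on many cognitive tests use mean 100 and SD 15."""
    return 100.0 + 15.0 * z


def stanine_from_z(z: float) -> int:
    """Approximate stanine: round 5 + 2z, then clamp to 1..9."""
    raw = math.floor(5.0 + 2.0 * z + 0.5)
    return max(1, min(9, raw))


def raw_gap(p_low: float, p_high: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Raw-score distance between two percentile ranks (given as proportions)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.inv_cdf(p_high) - dist.inv_cdf(p_low)


if __name__ == "__main__":
    z = z_from_t(60)  # a T-score of 60 is one SD above the mean
    print(f"T = 60 -> z = {z:+.1f}, percentile is about {percentile_from_z(z):.0f}")
    print(f"z = -1 -> standard score {standard_score_from_z(-1.0):.0f}, "
          f"stanine {stanine_from_z(-1.0)}")
    # Percentile ranks are not equal-interval: compare two 5-point gaps.
    print(f"50th to 55th: {raw_gap(0.50, 0.55):.1f} points")
    print(f"90th to 95th: {raw_gap(0.90, 0.95):.1f} points")
```

The last two lines show why percentile ranks should not be averaged or subtracted as if they were raw scores.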
Types of reliability and validity, drilled
It pays to distinguish the sub-types because items name them specifically. For reliability: test-retest (stability over time), internal consistency (items measure the same construct, indexed by Cronbach's alpha), alternate-forms (two equivalent versions agree), and inter-rater (two scorers agree). For validity: content (items cover the domain), criterion-related — split into concurrent (correlates with a current measure) and predictive (forecasts a future outcome) — and construct (measures the abstract trait, supported by convergent and discriminant evidence).
A vignette describing whether an admissions test forecasts later GPA is testing predictive validity; one asking whether a depression scale's items hang together is testing internal consistency reliability.
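To see what "items hang together" means numerically, the sketch below computes Cronbach's alpha from its standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The exam is unlikely to require this calculation; the function name and the response data are hypothetical and serve only to illustrate the concept.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                 # number of items
    items = list(zip(*rows))         # transpose: one tuple of scores per item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five clients answering four items on a 1-5 scale.
responses = [
    [3, 4, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
    [2, 2, 2, 3],
    [5, 4, 5, 5],
]

alpha = cronbach_alpha(responses)

if __name__ == "__main__":
    print(f"Cronbach's alpha = {alpha:.2f}")
```

Clients who score high on one item tend to score high on the others in this invented data, so alpha comes out high; items that varied independently would push it toward zero.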
Readiness markers
You are ready when you can, after a one-day break, look at an unlabeled stem and immediately route it: is this asking about design, statistic choice, validity, the normal curve, evidence-based practice (EBP), evaluation, or ethics? If you can name the concept and the single decision it controls, the domain is exam-ready. If you can only recognize vocabulary but freeze on application, return to the drills until routing is automatic. A practical benchmark: on a 20-item mixed set drawn from all five sections, aim for at least 80% correct, with a one-sentence rationale for each answer and a one-sentence reason each distractor fails.
If your accuracy holds but your rationales are vague, you are recognizing patterns rather than understanding them—rebuild the weak concept from its definition before moving on.
On a normally distributed assessment with a mean of 100 and a standard deviation of 15, approximately what percentage of scores fall between 85 and 115?
A bathroom scale consistently reads five pounds heavier than a person's true weight every time. This instrument is: