6.2 Sampling, Bias, Correlation, and Inference
Key Takeaways
- A random sample protects against systematic selection bias; a large biased sample can still support a poor conclusion.
- Correlation measures association between paired variables, but causation requires stronger design, control, or experimental evidence.
- Inference questions ask whether a claim about a population is justified by the way the data were collected.
- Teacher-certification reasoning often centers on explaining why a study design is flawed, not on performing advanced hypothesis tests.
Sampling Is Part of the Mathematics
AEPA statistics questions can feel qualitative, but the reasoning is mathematical. A statistic is useful only if the data collection method supports the claim being made. The Domain V profile explicitly includes bias and sampling techniques, so expect items that ask which survey is most reliable, which conclusion is justified, or which student statement misuses correlation. You do not need a full college statistics course to answer these, but you do need precise language.
A population is the entire group of interest. A sample is the subset actually observed. A parameter describes the population, such as the true mean time Arizona high school students spend on homework. A statistic describes the sample, such as the mean homework time from 120 surveyed students. Inference is the process of using a statistic to estimate or reason about a parameter. The weak link is often not the formula; it is whether the sample represents the population.
Sampling Methods and Bias
| Method | How It Works | Strength | Common Risk |
|---|---|---|---|
| Simple random sample | Every member, and every possible group of the chosen size, has an equal chance of selection | Easiest to justify mathematically | Requires a complete list of the population |
| Stratified sample | Randomly sample within important subgroups | Protects representation across groups | Strata must be chosen honestly |
| Cluster sample | Randomly choose intact groups and survey the members of each | Efficient for geography or classrooms | The chosen clusters may not resemble the rest of the population |
| Systematic sample | Choose every kth member after a random start | Simple to implement | Periodic patterns can bias results |
| Convenience sample | Use whoever is easy to reach | Fast | Usually weak for broad claims |
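
The four random methods in the table can be sketched in a few lines each. This is a minimal illustration, not a prescribed procedure: the roster, the grade-level strata, the classroom clusters, and all function names are hypothetical.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n members so that every group of n is equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k positions, then take every kth member."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size inside each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical roster of 200 students, split into strata and clusters.
frame = [f"student_{i:03d}" for i in range(200)]
strata = {
    "grade_9": frame[:50],
    "grade_10": frame[50:100],
    "grade_11": frame[100:150],
    "grade_12": frame[150:],
}
classrooms = {f"room_{j}": frame[j * 25:(j + 1) * 25] for j in range(8)}

srs = simple_random_sample(frame, 20, rng)
every_tenth = systematic_sample(frame, 10, rng)
by_grade = stratified_sample(strata, 5, rng)
two_rooms = cluster_sample(classrooms, 2, rng)

print(len(srs), len(every_tenth), sum(len(v) for v in by_grade.values()), len(two_rooms))
```

A convenience sample has no counterpart here because it involves no random step, which is why it is hard to justify for broad claims.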
Bias is systematic error, and adding more observations does not fix it. If a candidate surveys only students attending an optional after-school math club, a larger sample gives a more precise picture of that club, not of all students. Voluntary response bias occurs when people choose whether to respond, often because they feel strongly. Undercoverage occurs when part of the population has no chance of being selected. Nonresponse bias occurs when selected individuals do not participate and differ from those who do. Question wording bias occurs when the prompt nudges respondents toward an answer.
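A small simulation can make the math club point concrete. Everything below is an assumption made for illustration: the school size, the club size, and the "math confidence" scores are invented, with club members deliberately centered higher than other students.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

def clipped(value):
    """Keep a simulated score on the 0-100 scale."""
    return min(100.0, max(0.0, value))

# Hypothetical school: 200 club members and 1,800 other students.
club = [clipped(rng.gauss(80, 8)) for _ in range(200)]
others = [clipped(rng.gauss(55, 15)) for _ in range(1800)]
school = club + others
true_mean = statistics.mean(school)

results = {}
for n in (20, 100, 200):
    club_only = statistics.mean(rng.sample(club, n))      # biased selection
    random_all = statistics.mean(rng.sample(school, n))   # simple random sample
    results[n] = (club_only, random_all)
    print(f"n={n:3d}  club-only mean={club_only:5.1f}  random-sample mean={random_all:5.1f}")

print(f"whole-school mean={true_mean:5.1f}")
```

Under these assumptions, the club-only mean stays far above the whole-school mean at every sample size, while the random-sample mean settles near it. The larger biased sample is more stable, but it is stable around the wrong target.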
Correlation, Causation, and Lurking Variables
Correlation describes the direction and strength of a linear association between two quantitative variables. A positive association means larger values of one variable tend to occur with larger values of the other. A negative association means larger values of one tend to occur with smaller values of the other. A correlation near zero means there is little linear association, though a curved relationship might still exist.
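The point about curved relationships can be checked with a short computation. The sketch below uses a hand-written Pearson correlation helper (`pearson_r` is a name chosen here) and three small made-up data sets: one rising line, one falling line, and one parabola.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists of equal length."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(-5, 6))                  # -5, -4, ..., 5
rising = [2 * x + 1 for x in xs]         # perfect positive linear pattern
falling = [-3 * x + 4 for x in xs]       # perfect negative linear pattern
curved = [x ** 2 for x in xs]            # perfect pattern, but not linear

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)

print(f"rising:  r = {r_rising:.2f}")
print(f"falling: r = {r_falling:.2f}")
print(f"curved:  r = {r_curved:.2f}")
```

The parabola is completely determined by x, yet its correlation with x is zero because the left and right halves cancel. A value of r near zero rules out a linear pattern, not every pattern.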
Correlation does not prove causation. Suppose a scatterplot shows that schools with more advanced math electives also have higher average test scores. It might be tempting to say the electives caused the scores. A more careful explanation notes possible lurking variables: school size, student course history, funding, scheduling, prior achievement, or selection into electives. A randomized experiment would be stronger evidence, but many education contexts rely on observational data, so the conclusion should be limited to association.
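A simulation can show how a lurking variable produces a strong correlation with no causal link. In this sketch, which rests entirely on invented numbers, prior achievement drives both an "electives index" and average test scores, and the score formula contains no electives term at all.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists of equal length."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)  # fixed seed so the sketch is reproducible

prior, electives, scores = [], [], []
for _ in range(5000):                      # 5,000 hypothetical schools
    p = rng.gauss(0, 1)                    # lurking variable: prior achievement
    prior.append(p)
    electives.append(3 + 1.5 * p + rng.gauss(0, 1))   # depends on prior achievement
    scores.append(70 + 8 * p + rng.gauss(0, 4))       # depends on prior achievement only

r_all = pearson_r(electives, scores)

# Compare only schools with nearly identical prior achievement.
band = [i for i, p in enumerate(prior) if abs(p) < 0.1]
r_band = pearson_r([electives[i] for i in band], [scores[i] for i in band])

print(f"all schools:                 r = {r_all:.2f}")
print(f"similar prior achievement:   r = {r_band:.2f}  ({len(band)} schools)")
```

Across all simulated schools the association is strong, but it nearly vanishes among schools with similar prior achievement. Real data will not be this clean, and holding one variable fixed does not rule out others, so the safe conclusion from observational data remains association.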
Worked Example: What Conclusion Is Justified?
A district wants to estimate the percentage of families who support a new statistics course. It emails a survey to all families and receives replies from 8% of them. Of those replies, 74% support the course. The tempting conclusion is "74% of families support the course." The defensible conclusion is narrower: among families who responded, 74% support it. Because the response rate is low and voluntary, the results may overrepresent families with strong opinions. The issue is nonresponse and voluntary response bias, not arithmetic.
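The arithmetic behind the narrower conclusion can be made explicit by asking how far the true percentage could be from 74%. The sketch below assumes every family received the email and that respondents answered honestly; the function name is chosen here for illustration.

```python
def support_bounds(response_rate, support_among_respondents):
    """Return the lowest and highest possible support across all families."""
    known_support = response_rate * support_among_respondents
    lowest = known_support                          # every nonrespondent opposes
    highest = known_support + (1 - response_rate)   # every nonrespondent supports
    return lowest, highest

low, high = support_bounds(0.08, 0.74)
print(f"Possible support among all families: {low:.2%} to {high:.2%}")
# prints: Possible support among all families: 5.92% to 97.92%
```

Only 0.08 × 0.74 = 5.92% of all families are known supporters. The other 92% did not reply, so the survey by itself is consistent with almost any level of support.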
Now compare a stratified design. If the district randomly samples families within each high school attendance area in proportion to enrollment, follows up with nonrespondents, and reports a margin of error, the conclusion is much stronger. Even without computing a formal confidence interval, you can identify why representation improved: important subgroups had planned inclusion, and the selection process was random inside each subgroup.
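Sampling "in proportion to enrollment" is a short calculation. The attendance areas and enrollment counts below are hypothetical, and the rounding rule (largest remainder) is one reasonable choice among several.

```python
def proportional_allocation(enrollment, total_sample):
    """Split total_sample across strata in proportion to enrollment."""
    total = sum(enrollment.values())
    exact = {area: total_sample * count / total for area, count in enrollment.items()}
    sizes = {area: int(value) for area, value in exact.items()}  # round down first
    leftover = total_sample - sum(sizes.values())
    # Give any leftover slots to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda area: exact[area] - sizes[area], reverse=True)
    for area in by_remainder[:leftover]:
        sizes[area] += 1
    return sizes

# Hypothetical enrollment by high school attendance area.
enrollment = {"North": 1800, "Central": 1200, "South": 900, "East": 600}
sizes = proportional_allocation(enrollment, 300)

for area, n in sizes.items():
    share = enrollment[area] / sum(enrollment.values())
    print(f"{area:8s} enrollment share {share:.1%} -> sample {n} families")
```

With these numbers the areas receive 120, 80, 60, and 40 of the 300 slots, matching their 40%, 26.7%, 20%, and 13.3% enrollment shares. Families within each area would then be chosen at random.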
Inference Language for Teacher Candidates
Teacher-certification items often ask how a teacher should respond to a student claim. If a student says, "The sample mean is 68, so the population mean is exactly 68," the misconception is treating an estimate as exact. A better response is, "The sample mean estimates the population mean, and random sampling helps make the estimate trustworthy, but different random samples can produce different means." If a student says, "The graph slopes upward, so one variable causes the other," the misconception is confusing association with cause.
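The claim that different random samples produce different means is easy to demonstrate. This sketch builds a hypothetical population of 10,000 scores centered near 68, then draws 500 separate random samples of 40 students each; all of those numbers are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is reproducible

# Hypothetical population of scores.
population = [rng.gauss(68, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw many independent random samples and record each sample mean.
sample_means = [statistics.mean(rng.sample(population, 40)) for _ in range(500)]

print(f"population mean:        {population_mean:.2f}")
print(f"smallest sample mean:   {min(sample_means):.2f}")
print(f"largest sample mean:    {max(sample_means):.2f}")
print(f"average of sample means: {statistics.mean(sample_means):.2f}")
```

Individual sample means land above and below the population mean, which is why a single sample mean should be called an estimate. Their average sits close to the population mean, which is what random sampling contributes to trustworthiness.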
Practical Checklist
- Define the target population before judging the sample.
- Ask whether selection was random, representative, and sufficiently inclusive.
- Identify bias by naming the mechanism, not just saying "the sample is bad."
- For paired numerical data, describe direction, form, strength, and outliers.
- Limit conclusions to what the design can support.
This checklist keeps AEPA answers grounded. The test is not asking for vague skepticism. It is asking whether you can connect a mathematical claim to the evidence that produced it, which is exactly the reasoning a secondary mathematics teacher must model when students interpret studies, graphs, and survey results.
A school surveys only students who attend an optional math tutoring session and uses the results to describe all students in the school. What is the main flaw?
A scatterplot shows a strong positive association between hours spent practicing and contest scores. Which conclusion is most appropriate from the scatterplot alone?
Which survey design best supports an inference about all teachers in a district?