2.3 Data Analysis and Graphing

Key Takeaways

  • Mean is the arithmetic average; median is the middle value; mode is the most frequent value; range is the spread (max minus min); standard deviation quantifies how tightly data cluster around the mean.
  • A p-value below 0.05 means that a result at least as extreme as the one observed would occur less than 5% of the time if the null hypothesis were true; by convention, results below this threshold are called 'statistically significant.'
  • A t-test compares two group means for continuous data; a chi-square goodness-of-fit test compares observed counts to expected counts in categorical data.
  • Correlation (Pearson r between -1 and +1) measures the strength and direction of a linear relationship but does NOT establish causation.
  • Bar graphs compare categorical groups, line graphs show change over a continuous variable (time, temperature, concentration), and scatter plots reveal the relationship between two continuous variables.
Last updated: May 2026

Why This Section Matters

The Praxis exam does not require you to crunch large data sets, but it does demand that you interpret standard descriptive and inferential statistics and select the right graph type for a data set. Questions appear both in the Nature of Science subarea and embedded in content questions across genetics, ecology, and physiology.

Descriptive Statistics

For a small data set, the measures of central tendency are:

  • Mean = sum of values / number of values. Most useful for symmetric, continuous data.
  • Median = middle value when data are ordered (or average of the two middle values). Robust to outliers.
  • Mode = most frequent value. Useful for categorical data (e.g., most common eye color).

The measures of spread are:

  • Range = maximum minus minimum.
  • Variance (s^2) = sum of squared deviations from the mean, divided by n - 1 for a sample (roughly the average squared deviation).
  • Standard deviation (s) = square root of variance. Reports spread in the original units.

Worked Example

Seven leaf lengths in cm: 6, 7, 7, 8, 9, 10, 14.

  • Mean = 61 / 7 = 8.71 cm
  • Median = 4th value = 8 cm
  • Mode = 7 cm (the only repeated value)
  • Range = 14 - 6 = 8 cm
  • Standard deviation = approximately 2.69 cm

Notice that the outlier (14 cm) pulls the mean higher than the median. When a distribution is skewed, the median is a more representative "typical value."
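The worked example can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the standard deviation is the sample version (dividing by n - 1), which is what reproduces the 2.69 cm figure.

```python
import statistics

# Leaf lengths (cm) from the worked example.
leaf_lengths = [6, 7, 7, 8, 9, 10, 14]

mean = statistics.mean(leaf_lengths)        # 61 / 7
median = statistics.median(leaf_lengths)    # 4th of 7 ordered values
mode = statistics.mode(leaf_lengths)        # most frequent value
value_range = max(leaf_lengths) - min(leaf_lengths)
# statistics.stdev uses the sample formula (divides by n - 1).
sd = statistics.stdev(leaf_lengths)

print(f"mean={mean:.2f} median={median} mode={mode} range={value_range} sd={sd:.2f}")
# prints: mean=8.71 median=8 mode=7 range=8 sd=2.69
```

Dividing by n instead of n - 1 (`statistics.pstdev`) would give about 2.49 cm, so the choice of formula matters for small samples.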

Error Bars

Error bars on a graph represent variability — either standard deviation, standard error, or 95% confidence interval. The label matters:

  • Bars of +/- 1 standard deviation show how spread out individual data points are.
  • Bars of +/- 1 standard error of the mean (SEM) show how precisely the mean is estimated; SEM = s / sqrt(n) and shrinks with larger sample size.
  • Overlapping SEM bars suggest that a difference between two groups is unlikely to be significant, whereas overlapping SD bars say little either way. In every case a formal test (t-test) is needed to confirm.
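To see how the two kinds of error bar differ in size, the following sketch computes both for the seven leaf lengths from the worked example. It assumes the sample standard deviation (n - 1) and uses only the Python standard library; variable names are illustrative.

```python
import math
import statistics

leaf_lengths = [6, 7, 7, 8, 9, 10, 14]  # cm, same data as the worked example

n = len(leaf_lengths)
sd = statistics.stdev(leaf_lengths)   # sample standard deviation (n - 1)
sem = sd / math.sqrt(n)               # SEM = s / sqrt(n)

print(f"SD = {sd:.2f} cm, SEM = {sem:.2f} cm")
# prints: SD = 2.69 cm, SEM = 1.02 cm
```

The SEM bar is much shorter than the SD bar, which is why a graph should always state which one it shows.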

Inferential Statistics for the Praxis

The p-value

A p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis (no effect) is true.

  • p < 0.05 -> conventionally statistically significant; we reject the null hypothesis.
  • p >= 0.05 -> not statistically significant; we fail to reject the null. (Note: failing to reject is not the same as proving no effect.)

A p-value is not the probability that the treatment works, nor the size of the effect. Praxis stems frequently include a wrong answer that says exactly that.

The t-test

A two-sample t-test compares the means of two groups on a continuous measurement (heights, enzyme rates, heart rates). The output is a t-statistic and a p-value. Use a t-test when:

  • Data are continuous.
  • You have two groups (e.g., control vs. treatment).
  • Data are approximately normally distributed.

For more than two groups, use ANOVA.
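As an illustration of what a two-sample t-test computes, here is a minimal sketch of the equal-variance (pooled) t-statistic using only the standard library. The function name `pooled_t_statistic` and the heart-rate data are hypothetical, invented for this example; statistical software would also report the p-value, which this sketch does not calculate.

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for an equal-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, df

# Hypothetical heart rates (beats per minute), invented for illustration.
treatment = [78, 80, 76, 82, 79]
control = [70, 72, 68, 74, 71]

t, df = pooled_t_statistic(treatment, control)
print(f"t = {t:.2f}, df = {df}")
# prints: t = 5.66, df = 8
```

For 8 degrees of freedom the two-tailed critical value at p = 0.05 is 2.306, so a t of 5.66 would count as statistically significant for these made-up data.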

The chi-square goodness-of-fit test

A chi-square (X^2) goodness-of-fit test compares observed counts to counts expected under a hypothesis — perfect for testing Mendelian ratios.

X^2 = Sum [ (O - E)^2 / E ]

where O is observed count and E is expected count.

Example: A monohybrid cross predicts a 3:1 ratio of yellow : green peas. Out of 80 offspring you observe 58 yellow and 22 green. Expected: 60 yellow, 20 green.

X^2 = (58 - 60)^2 / 60 + (22 - 20)^2 / 20 = 4/60 + 4/20 = 0.067 + 0.200 = 0.27.

With two phenotype categories there is 2 - 1 = 1 degree of freedom, and the critical X^2 value at p = 0.05 is 3.84. Since 0.27 < 3.84, we fail to reject the 3:1 hypothesis: the observed data are consistent with Mendel's prediction.
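The same arithmetic can be sketched in a few lines of Python. The helper name `chi_square` is an illustrative choice, and the critical value 3.84 is simply the one quoted in the text for 1 degree of freedom, not something the code derives.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [58, 22]                          # yellow, green (from the example)
total = sum(observed)                        # 80 offspring
expected = [total * 3 / 4, total * 1 / 4]    # 3:1 ratio -> 60, 20

chi2 = chi_square(observed, expected)
CRITICAL_1DF_05 = 3.84                       # critical value quoted in the text

print(f"X^2 = {chi2:.2f}, reject 3:1? {chi2 > CRITICAL_1DF_05}")
# prints: X^2 = 0.27, reject 3:1? False
```

Deriving the expected counts from the total, rather than typing them in, guards against the common mistake of expected counts that do not add up to the observed total.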

Correlation vs. Causation

Correlation measures whether two continuous variables move together.

  • Pearson r ranges from -1 to +1.
  • r near +1 = strong positive correlation; r near -1 = strong negative correlation; r near 0 = no linear relationship.
  • The square of r, r^2 (coefficient of determination), tells you the proportion of variance in one variable explained by the other.
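For readers who want to see where r comes from, the sketch below computes Pearson r directly from its definition (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the five data pairs are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical paired measurements, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
# prints: r = 0.77, r^2 = 0.60
```

Here r^2 = 0.60 would be read as "about 60% of the variance in y is accounted for by its linear relationship with x," which still says nothing about whether x causes y.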

Causation means changes in one variable directly produce changes in the other. Correlation alone never proves causation, because of:

  1. Reverse causation - the effect actually causes the supposed cause.
  2. Confounding variables - a third factor causes both.
  3. Chance - a coincidence in the sample.

The most reliable way to establish causation is a controlled experiment with random assignment.

Choosing the Right Graph

| Graph | Use When | Example |
|---|---|---|
| Bar graph | Comparing values across discrete categories | Mean leaf length by tree species |
| Histogram | Showing the distribution of a single continuous variable | Frequency of leaf lengths across all trees |
| Line graph | Showing change in one variable over a continuous independent variable (often time) | Population size over 10 years; enzyme activity vs. temperature |
| Scatter plot | Showing the relationship between two continuous variables | Height vs. weight; CO2 vs. photosynthesis rate |
| Pie chart | Showing parts of a whole for a single category | Composition of a community by phylum |
| Box-and-whisker plot | Comparing distributions including median and IQR across groups | Test scores by classroom |

Interpreting Trends

  • A plateau on a rate curve indicates that another factor has become limiting (Liebig's Law of the Minimum).
  • A sigmoidal (S-shaped) curve is typical of logistic population growth.
  • A bell-shaped (normal) distribution results from many small independent factors and underlies most parametric tests.
  • An exponential curve describes unconstrained growth or first-order decay (radioactive isotopes used in dating).

Praxis Trap to Avoid

When a question reports a correlation in an observational study (e.g., "students who eat breakfast score higher on tests"), the answer that calls it proof of causation is wrong. The answer that flags a possible confound (socioeconomic status, parent involvement) is the scientifically correct one.

Test Your Knowledge

In a study of leaf widths from two oak species, Species A has a mean width of 6.5 cm (SD = 0.3 cm, n = 30) and Species B has a mean width of 6.8 cm (SD = 0.4 cm, n = 30). A two-sample t-test returns p = 0.002. Which conclusion is best supported?

Test Your Knowledge

A class crosses heterozygous purple-flowered pea plants (Pp x Pp) and counts 168 purple and 72 white offspring among 240 F2 plants. They want to test whether the data fit Mendel's predicted 3:1 ratio. Which statistical test is the appropriate choice and what are the expected counts?
