2.3 Pretest Items and Exam Development

Key Takeaways

  • Part 1 includes 50 unscored pretest items mixed among the 225 total (175 scored).
  • Part 2 includes 40 unscored pretest items mixed among the 170 total (130 scored).
  • Pretest items are statistically evaluated for future forms and are indistinguishable from scored items.
  • Because pretest items are not flagged, answer every visible item seriously and pace by total visible items.
Last updated: June 2026

Pretest items are invisible during the exam

Both EPPP parts embed pretest items — questions that do not count toward your score but are being statistically evaluated for possible use on future forms. ASPPB tracks each pretest item's difficulty (p-value) and discrimination across many candidates; items that perform well graduate into future scored pools, and weak items are revised or retired. Pretest items are scattered among the visible items and are never flagged, so the only sound behavior is to answer every item with full effort.

Memorize each count as a complete breakdown (total, scored, pretest), not just the total. Stating only "225" or "170" invites bad pacing and bad score interpretation.

| Exam part | Total visible items | Scored | Pretest | Time | Candidate strategy |
|---|---|---|---|---|---|
| Part 1-Knowledge | 225 | 175 | 50 | 4 h 15 min | Steady pace; sustain stamina across breadth |
| Part 2-Skills | 170 | 130 | 40 | 4 h 10 min | Read scenarios carefully; reserve judgment time |

Pretest items feel no different from scored items: a familiar item may be pretest, and a brutal item may be fully scored. A candidate who tries to guess which items count wastes cognitive energy and risks careless errors on items that actually matter. The smartest assumption is that every item counts, because in practice you can never tell.

Understanding why pretest items exist removes anxiety about them. ASPPB cannot put a brand-new question directly into the scored pool, because its statistical behavior is unknown. By embedding it as a pretest item answered by thousands of candidates, ASPPB measures two things: the item's difficulty (the proportion who answer correctly, its p-value) and its discrimination (whether high-ability candidates outperform low-ability candidates on it, often via a point-biserial correlation).
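The two statistics named above can be illustrated with a minimal sketch. The response data below are invented for illustration, the function names are hypothetical, and nothing here is claimed to be ASPPB's actual procedure. In practice the total score is often computed without the item under study, which this sketch does not do.

```python
from statistics import mean, pstdev

def item_difficulty(item_scores):
    """Proportion of candidates answering correctly (the item's p-value)."""
    return mean(item_scores)

def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item score and each candidate's total score."""
    p = mean(item_scores)
    q = 1 - p
    sd = pstdev(total_scores)  # population SD of total scores
    if p in (0, 1) or sd == 0:
        raise ValueError("undefined when all candidates score the same")
    mean_correct = mean(t for s, t in zip(item_scores, total_scores) if s == 1)
    mean_incorrect = mean(t for s, t in zip(item_scores, total_scores) if s == 0)
    return (mean_correct - mean_incorrect) / sd * (p * q) ** 0.5

# Hypothetical data: eight candidates, one pretest item (1 = correct).
item = [1, 1, 1, 0, 1, 0, 0, 1]
totals = [160, 150, 145, 120, 140, 110, 100, 155]

difficulty = item_difficulty(item)
discrimination = point_biserial(item, totals)
print(f"difficulty p = {difficulty:.3f}, point-biserial = {discrimination:.2f}")
```

In this toy data set the candidates who answered correctly also have the higher totals, so the point-biserial is strongly positive, which is the pattern a well-behaved item is expected to show.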

An item that is too easy, too hard, or ambiguous, or that strong candidates miss while weak candidates answer correctly, gets revised or discarded before it ever counts. This is the same field-testing logic used by other large-scale exams such as the SAT, NCLEX, and bar exams, and it helps keep operational EPPP forms psychometrically sound from year to year.
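As a rough sketch of that screening logic, the function below flags an item from its difficulty and discrimination. The cutoffs (`min_p`, `max_p`, `min_disc`) are hypothetical values chosen only for illustration; the source does not state the criteria ASPPB applies.

```python
def screen_item(p_value, discrimination, min_p=0.20, max_p=0.95, min_disc=0.15):
    """Return a list of reasons an item might be revised or retired.

    All thresholds are illustrative assumptions, not published criteria.
    """
    reasons = []
    if p_value > max_p:
        reasons.append("too easy")
    if p_value < min_p:
        reasons.append("too hard")
    if discrimination < 0:
        # Strong candidates miss it while weak candidates get it right.
        reasons.append("negative discrimination")
    elif discrimination < min_disc:
        reasons.append("weak discrimination")
    return reasons

print(screen_item(0.62, 0.35))   # no flags under these assumed cutoffs
print(screen_item(0.98, 0.05))   # flagged on both statistics
print(screen_item(0.50, -0.10))  # flagged for negative discrimination
```

An empty list means the item passes every check in this sketch; any non-empty list corresponds to the "revise or discard" outcome described above.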

A common myth deserves correcting: pretest items are not concentrated at the end of the exam, and they are not the strangely worded ones. They are interspersed pseudo-randomly throughout the form precisely so that candidate behavior on them mirrors behavior on scored items — which is what makes the statistics valid. Trying to detect them by oddness is both futile and counterproductive.

A second myth is that pretest items are "easier" or "harder" as a class. In fact their difficulty is unknown by design, because that is the very thing the field test measures. An item that strikes you as bizarre may well be a perfectly valid scored item that simply tests an unfamiliar corner of the content outline.

Pretest design also explains why post-exam memory is unreliable. Candidates disproportionately remember the hardest, strangest, or most emotionally charged items — and some of those may have been pretest items that contributed nothing to the score. Reconstructing "how I did" from remembered items is therefore misleading. Trust ASPPB's official scaled-score report, not a mental highlight reel of tough questions.

For pacing, pretest items still consume time: the timer does not pause when you reach one. You must work through all 225 visible items on Part 1 and all 170 on Part 2, regardless of which ones count. Pace by total visible items, not by the scored subset. A candidate who budgets time for only 175 items on Part 1 will move too slowly and run out of time before reaching the end.
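The pacing arithmetic can be worked through with a short sketch. It uses the time limits and item counts listed in this section; the function and variable names are illustrative only.

```python
def seconds_per_item(hours, minutes, items):
    """Average seconds available per item for a given time limit."""
    return (hours * 60 + minutes) * 60 / items

# Time limits and item counts as listed in this section.
part1_pace = seconds_per_item(4, 15, 225)  # all visible Part 1 items
part2_pace = seconds_per_item(4, 10, 170)  # all visible Part 2 items

# Mistaken plan: spreading the Part 1 time limit over only the 175 scored items.
mistaken_pace = seconds_per_item(4, 15, 175)

# Time needed to finish all 225 items at the mistaken pace, minus the time allowed.
part1_limit_s = (4 * 60 + 15) * 60
overrun_min = (225 * mistaken_pace - part1_limit_s) / 60

print(f"Part 1: {part1_pace:.0f} s per item")
print(f"Part 2: {part2_pace:.0f} s per item")
print(f"Pacing for 175 items: {mistaken_pace:.0f} s per item, "
      f"overrun of about {overrun_min:.0f} min")
```

Pacing by all visible items gives about 68 seconds per item on Part 1 and about 88 on Part 2. Pacing Part 1 as if it had only 175 items stretches each item to roughly 87 seconds, which would require more than an extra hour to finish all 225.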

There is also a fairness rationale worth understanding. Because pretest items do not count, a candidate is never penalized for an experimental question that later proves flawed, ambiguous, or miskeyed: the item is evaluated and fixed before it can affect a real score. In effect, the 50 Part 1 and 40 Part 2 pretest items are a quality-control buffer that keeps bad questions from reaching a scored form.

Some candidates find it reassuring that roughly 22% of Part 1 items and 24% of Part 2 items carry zero scoring risk, which means a few baffling questions on test day genuinely may not matter; but since you cannot identify them, the correct response is calm, full effort on each.
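Those percentages follow directly from the counts in this section, as the sketch below shows. The dictionary layout and function name are illustrative choices.

```python
# Item counts as stated in this section.
forms = {
    "Part 1": {"total": 225, "scored": 175, "pretest": 50},
    "Part 2": {"total": 170, "scored": 130, "pretest": 40},
}

def pretest_share(form):
    """Percentage of visible items that are unscored pretest items."""
    if form["scored"] + form["pretest"] != form["total"]:
        raise ValueError("scored and pretest counts must sum to the total")
    return 100 * form["pretest"] / form["total"]

shares = {name: pretest_share(form) for name, form in forms.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1f}% pretest")
```

The shares come out to about 22.2% for Part 1 and 23.5% for Part 2, which also means that any single item, however odd it looks, is more likely to be scored than not.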

For studying, pretest items argue against narrow item-chasing. Because the program continuously develops new forms, you cannot "memorize the test." Preparation should target durable competence — decision rules, ethics, assessment logic, intervention planning, research interpretation, cultural responsiveness, and professional judgment — which transfers across both scored and pretest items.

A useful mental script during the exam is: "I cannot know whether this item is scored, so I will answer it as well as I can and move on." This prevents two opposite errors. Overinvestment happens when one hard item devours minutes you needed elsewhere. Underinvestment happens when a candidate dismisses an unusual item as "probably pretest" and answers it carelessly. Both are avoidable.

Finally, pretest items are not a trick to fear. They are standard psychometric practice used on most large-scale licensing exams. Your job is mechanical and simple: know the complete counts, pace by total visible items, answer everything, and interpret your result through ASPPB's scaled score rather than through remembered fragments of difficult questions.

Test Your Knowledge

Which breakdown for Part 1-Knowledge is correct?

Test Your Knowledge

Why is post-exam memory a poor basis for judging EPPP performance?

Test Your Knowledge

How should pretest items shape exam-day pacing?
