5.5 Test Selection, Administration, Scoring, and Interpretation

Key Takeaways

  • Test selection should follow the referral question, the evidence base, examinee characteristics, and the risk of an incorrect decision.
  • Standardized administration protects score meaning; accommodations require documentation and attention to whether the construct changes.
  • Interpretation integrates validity indicators, behavioral observations, history, records, and collateral data.
  • High-stakes assessment requires cautious language about uncertainty, limitations, and alternative explanations.

Last updated: June 2026

Choosing and Using Tests Responsibly

Test selection begins after the referral question is clear. Ask what decision must be made, what construct must be measured, how much confidence is needed, and what harm could follow an incorrect conclusion. The instrument must fit the question, the examinee, the setting, and the intended use.

The EPPP expects familiarity with the major test categories: individually administered cognitive tests (for example, the Wechsler scales such as the WAIS and WISC, whose index scores use the standard-score metric with a mean of 100 and an SD of 15), broad-band personality inventories (for example, the MMPI family with its validity scales), self-report symptom inventories (such as the BDI or BAI), structured behavior-rating scales, projective measures, neuropsychological batteries, and adaptive-behavior measures.
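The standard-score metric mentioned above (mean 100, SD 15) can be related to z-scores and percentile ranks with a small sketch. The function names here are illustrative, and the percentile conversion assumes a normal distribution; a real test's norm tables, not this formula, determine reported percentiles.

```python
from statistics import NormalDist

# Standard-score metric named in the text: mean 100, SD 15.
MEAN = 100.0
SD = 15.0


def z_score(standard_score: float) -> float:
    """Distance from the mean in SD units."""
    return (standard_score - MEAN) / SD


def percentile_rank(standard_score: float) -> float:
    """Approximate percentile rank, ASSUMING a normal distribution.

    Published norm tables take precedence over this approximation.
    """
    return 100.0 * NormalDist(MEAN, SD).cdf(standard_score)


if __name__ == "__main__":
    for score in (70, 85, 100, 115, 130):
        print(score, round(z_score(score), 2), round(percentile_rank(score), 1))
```

A score of 115 sits one SD above the mean, and a score of 70 sits two SDs below it, which is why the same 15-point difference means very different things in percentile terms near the middle and near the tails.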

No test is valid for every purpose. An intelligence test supports evaluation of cognitive ability but does not by itself diagnose a specific learning disorder, predict violence, determine parenting capacity, or explain cultural adaptation. A symptom inventory estimates severity but does not replace the clinical interview, functional assessment, or differential diagnosis.

Competent selection weighs language, reading level, sensory or motor disability, age, education, cultural background, medical status, fatigue, and access needs. If a test lacks appropriate norms or translation evidence, the psychologist does not pretend the score carries the same meaning. The options are to choose a better instrument, use a qualified interpreter, consult, add qualitative data, or explicitly limit conclusions.

Standardized administration protects interpretation. Instructions, timing, materials, prompts, scoring rules, and the testing environment should follow the manual unless a justified accommodation or modification applies. The EPPP distinguishes the two: an accommodation (extra breaks, large print, an accessible room, assistive technology) aims to remove a barrier without changing the construct, whereas a modification changes what is measured (for example, reading aloud a test of reading) and therefore alters score meaning. Any deviation must be documented and interpreted cautiously.

| Step | Psychologist's question | Documentation focus |
|---|---|---|
| Selection | Does this instrument answer the referral question? | Purpose, evidence base, population fit |
| Preparation | Are language, disability, and access needs addressed? | Accommodations, interpreter use, consent |
| Administration | Were standard procedures followed? | Deviations, environment, behavior observed |
| Scoring | Were scores derived correctly? | Manual rules, software, quality checks |
| Interpretation | What do the data support and not support? | Validity, confidence, limits, integration |

Validity indicators deserve special attention. Many inventories include scales for inconsistent responding, unusual symptom endorsement, defensiveness, or exaggerated distress. These are not moral judgments; they are data about response style, comprehension, fatigue, emotional state, motivation, or context, and they should be read alongside behavioral observations and collateral information. Dedicated symptom validity and performance validity tests help detect noncredible presentation, especially in forensic or disability contexts where there is incentive to distort.

Behavior during testing changes interpretation. Slow pace, impulsive responding, frequent clarification requests, pain behavior, low frustration tolerance, motor difficulty, language confusion, or sleepiness may explain scores or generate new hypotheses. Observations should be specific: 'asked for repetition of instructions on six of ten subtests' is stronger evidence than 'seemed unmotivated.'

Scoring errors are preventable but consequential. Computerized scoring does not remove responsibility. The psychologist remains accountable for verifying identifying information, the correct norm set, age calculations, missing items, protocol validity, unusual score patterns, and whether automated narrative statements actually fit the case.
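Age calculation is one of the checks listed above that is easy to get wrong by hand, because an error can select the wrong norm table. The sketch below uses a hand-calculation convention in which a borrowed month counts as 30 days. That convention is an assumption for illustration, and the function name is hypothetical; the manual for the specific test governs how age is computed.

```python
from datetime import date


def chronological_age(birth: date, test: date) -> tuple[int, int, int]:
    """Return age at testing as (years, months, days).

    ASSUMPTION: when days must be borrowed, one month is treated as
    30 days. Check the test manual before relying on this convention.
    """
    if test < birth:
        raise ValueError("test date precedes birth date")

    years = test.year - birth.year
    months = test.month - birth.month
    days = test.day - birth.day

    if days < 0:      # borrow one month as 30 days
        months -= 1
        days += 30
    if months < 0:    # borrow one year as 12 months
        years -= 1
        months += 12
    return years, months, days


if __name__ == "__main__":
    # Hypothetical dates, used only to show the borrowing steps.
    print(chronological_age(date(2015, 11, 20), date(2024, 3, 5)))
```

Recomputing the age independently of the scoring software is a cheap quality check, since a transposed birth date produces plausible-looking but wrong scores.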

Interpretation is integrative. A strong report does not list scores in isolation; it explains how scores fit the interview, mental status, history, records, collateral reports, functional impairment, and differential diagnosis. When sources conflict, the report discusses the discrepancy rather than hiding it.

Use this interpretation sequence:

  1. Begin with the referral question and the quality of the data.
  2. State major findings in plain clinical language.
  3. Link scores to observed behavior and history.
  4. Identify converging and conflicting evidence.
  5. Discuss limitations, alternative explanations, and confidence.
  6. Translate findings into recommendations matched to the referral purpose.

The EPPP usually favors a modest, evidence-based conclusion over an impressive but unsupported one. The best answer protects the examinee from test misuse while still giving the referral source useful, defensible information.

Test Categories, Scoring Models, and Examiner Competence

Knowing how different test families behave guards against category errors. Norm-referenced tests compare a person to a standardization sample; this is the model behind intelligence tests and personality inventories. Criterion-referenced tests compare performance to a fixed standard, such as mastery of a skill, and are common in educational and competency settings. Ipsative scoring compares a person's scores to that person's own profile rather than to a group, which is useful for identifying relative strengths and weaknesses but not for ranking against peers.

Picking a norm-referenced interpretation when the question calls for criterion-referenced mastery, or vice versa, is a classic distractor.
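The three scoring models can be contrasted in a short sketch. All numbers, scale names, and function names are hypothetical, and the ipsative calculation shown (deviation from the person's own mean) is only one simple way to express a within-person comparison.

```python
from statistics import mean


def norm_referenced(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Position relative to a standardization sample, as a z-score."""
    return (raw - norm_mean) / norm_sd


def criterion_referenced(raw: float, cutoff: float) -> bool:
    """Mastery decision against a fixed standard; no peer comparison."""
    return raw >= cutoff


def ipsative(profile: dict[str, float]) -> dict[str, float]:
    """Each score relative to the person's own average across scales."""
    own_mean = mean(profile.values())
    return {scale: score - own_mean for scale, score in profile.items()}


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(norm_referenced(60, norm_mean=50, norm_sd=10))
    print(criterion_referenced(42, cutoff=40))
    print(ipsative({"verbal": 8, "spatial": 10, "memory": 12}))
```

Note that the ipsative output says nothing about how the person compares with peers: a profile of uniformly low scores and one of uniformly high scores both produce deviations of zero.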

Projective and objective measures sit at different evidence levels, and the EPPP expects candidates to weigh that. Objective inventories with structured scoring and published validity scales generally carry stronger psychometric support than projective techniques, whose reliability and incremental validity are more contested. This does not forbid projective use; it means conclusions should rest on the better-validated data and be cross-checked.

Similarly, behavior-rating scales gain value from multiple informants across settings, because a child who looks symptomatic at school but not at home tells a different diagnostic story than one impaired everywhere.

Examiner competence is a recurring ethical-assessment crossover. Under the APA Ethics Code, psychologists administer, score, and interpret tests within the bounds of their training and the test's documented purpose; they do not use obsolete versions or norms when current ones exist, and they retain responsibility for computer-generated interpretations. When a vignette describes a clinician using a test outside its validated population, relying on an outdated edition, or accepting an automated narrative without review, the flaw is competence and standardization, and the correct answer addresses that flaw rather than the surface result.

Finally, anchor interpretation to the purpose of the evaluation. A screening instrument is built to maximize sensitivity and tolerate false positives, so it is appropriate for casting a wide net but not for confirming a diagnosis. A diagnostic battery aims for specificity and depth. Matching the instrument's design to the decision at hand, and stating the resulting confidence honestly, is the integrative skill the EPPP is measuring across this section.
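The reason a sensitive screener should not confirm a diagnosis can be shown with a little arithmetic. The sketch below computes sensitivity and specificity from counts, then uses Bayes' rule to estimate how often a positive screen is a true case at a given base rate. The function names and the numbers in the example are hypothetical, chosen only for illustration.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of actual cases the instrument flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of non-cases the instrument correctly clears."""
    return true_neg / (true_neg + false_pos)


def positive_predictive_value(sens: float, spec: float, base_rate: float) -> float:
    """Probability that a positive result is a true case (Bayes' rule)."""
    flagged_cases = sens * base_rate
    flagged_non_cases = (1.0 - spec) * (1.0 - base_rate)
    return flagged_cases / (flagged_cases + flagged_non_cases)


if __name__ == "__main__":
    # Hypothetical screener: 90% sensitive, 80% specific, 10% base rate.
    ppv = positive_predictive_value(sens=0.90, spec=0.80, base_rate=0.10)
    print(round(ppv, 2))
```

With these illustrative values, only about one in three positive screens reflects a true case, which is why a positive screen calls for follow-up evaluation rather than a diagnostic conclusion.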

Test Your Knowledge

A psychologist wants to use a test that has no appropriate norms for the examinee's language background. What is the best response?

Test Your Knowledge

Which difference between an accommodation and a modification is most important for score interpretation?

Test Your Knowledge

What should a psychologist do when standardized test scores and collateral records conflict?
