8.1 Assessment and Testing Overview

Key Takeaways

Assessment (the CACREP "Appraisal" area) is one of 8 equally weighted CPCE domains: 20 items, 17 scored, so 12.5% of the 136 scored questions.
Reliability = consistency of scores; validity = whether the test measures what it claims and supports the intended interpretation.
A test can be reliable without being valid, but it cannot be valid without first being reliable.
Standardized scores (z, T, standard score, percentile, stanine) all locate a person relative to a norm group; learn the conversions cold.

Last updated: June 2026

This chapter covers the CACREP common-core area officially titled Appraisal of Individuals and Groups (often labeled "Assessment and Testing"). The Counselor Preparation Comprehensive Examination (CPCE) has 160 multiple-choice items split into 8 content areas of 20 items each. Within each area, 17 items are scored and 3 are unscored pretest items, so the scored exam is 136 questions and Appraisal contributes 17 of them (12.5%). There is no single national pass score; each program sets its own cut score, commonly near one standard deviation below the national mean.
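The item counts above reduce to a few lines of arithmetic. The sketch below only restates those figures; `program_cut_score` is a hypothetical helper, and the mean and standard deviation passed to it are made-up numbers for illustration, not published CPCE statistics.

```python
# Exam structure as described in the paragraph above.
CONTENT_AREAS = 8
ITEMS_PER_AREA = 20
UNSCORED_PER_AREA = 3

scored_per_area = ITEMS_PER_AREA - UNSCORED_PER_AREA  # 17 scored items per area
total_items = CONTENT_AREAS * ITEMS_PER_AREA          # 160 items on the form
total_scored = CONTENT_AREAS * scored_per_area        # 136 scored items
appraisal_share = scored_per_area / total_scored      # 0.125, i.e. 12.5%


def program_cut_score(national_mean, national_sd, sd_below=1.0):
    """Hypothetical helper: a cut placed `sd_below` SDs under the national mean."""
    return national_mean - sd_below * national_sd


# Illustrative (invented) national statistics.
example_cut = program_cut_score(national_mean=85.0, national_sd=15.0)
```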

Testing time is about 3 hours 45 minutes, and the typical fee is $150.

What this domain actually tests

Appraisal is not vague "professional judgment." It is concrete measurement knowledge: defining and computing score types, distinguishing reliability from validity, knowing what each named instrument measures, and applying ethical standards for test use. Expect questions that give a number (a z-score, a percentile, a reliability coefficient) and ask what it means, or that name an instrument and ask what category it belongs to.
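Because many items hand you one score type and ask for its meaning, it may help to see the common conversions side by side. This is a minimal sketch that assumes a normally distributed norm group; the function names are illustrative, the standard-score scale is assumed to have mean 100 and SD 15, and the stanine rule is the usual half-SD approximation.

```python
import math


def z_score(raw, mean, sd):
    """Distance of a raw score from the norm-group mean, in SD units."""
    return (raw - mean) / sd


def t_score(z):
    """T scores have mean 50 and SD 10."""
    return 50 + 10 * z


def standard_score(z, mean=100, sd=15):
    """Deviation-style standard score (assumed mean 100, SD 15)."""
    return mean + sd * z


def percentile_rank(z):
    """Percent of a normal norm group scoring below z."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))


def stanine(z):
    """Stanines have mean 5 and SD 2, limited to the range 1 to 9."""
    return max(1, min(9, math.floor(2 * z + 5 + 0.5)))


z = z_score(raw=65, mean=50, sd=10)  # 1.5 SDs above the mean
```

A person one SD above the mean (z = 1.0) lands at T = 60, standard score 115, roughly the 84th percentile, and stanine 7.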

The two pillars: reliability and validity

Concept	Question it answers	Key indicator
Reliability	Are scores consistent and repeatable?	Reliability coefficient (0 to 1); .80+ is acceptable, .90+ for high-stakes
Validity	Does the test measure the right thing and support the decision?	Evidence from content, criterion, and construct sources

A bathroom scale that reads 5 lb too high every time is perfectly reliable (consistent) but not valid (wrong value). This is the single most tested relationship in the domain: reliability is necessary but not sufficient for validity. A test cannot be valid unless it is reliable, yet a reliable test can still measure the wrong thing.
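The bathroom-scale example can be put in numbers. In this small sketch the true weight and the number of weigh-ins are invented; the point is only that repeated readings agree perfectly with each other while all missing the true value by the same amount.

```python
true_weight = 150.0  # hypothetical true weight in pounds
OFFSET = 5.0         # the scale always reads 5 lb too high

# Five repeated weigh-ins on the miscalibrated scale.
readings = [true_weight + OFFSET for _ in range(5)]

# Consistency: the readings never differ from one another.
spread = max(readings) - min(readings)

# Accuracy: every reading misses the true value by the same amount.
bias = sum(readings) / len(readings) - true_weight
```

A spread of zero is the "reliable" part; a bias of 5 lb is the "not valid" part.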

Norm-referenced versus criterion-referenced

Another distinction the CPCE tests directly is how scores are interpreted. A norm-referenced test compares a person to a norm group (a sample meant to represent the population): the WAIS, SAT, and most standardized batteries work this way, and the score answers "how does this person rank against peers?" A criterion-referenced test compares performance to a fixed standard or cutoff without reference to others, answering "did the person meet the benchmark?" A licensure exam with a pass score is criterion-referenced; a percentile-based aptitude test is norm-referenced.

Watch for stems that describe a mastery cutoff (criterion) versus a ranking against others (norm).
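The two interpretations can be applied to the same raw score, which is a quick way to see the difference. The sketch below uses an invented ten-person norm group and invented cutoffs; both function names are illustrative.

```python
def percentile_rank_in_group(score, norm_scores):
    """Norm-referenced: percent of the norm group at or below this score
    (ties counted as half)."""
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100 * (below + 0.5 * equal) / len(norm_scores)


def meets_criterion(score, cutoff):
    """Criterion-referenced: compare to a fixed standard, ignoring everyone else."""
    return score >= cutoff


norm_group = [55, 60, 62, 68, 70, 71, 75, 80, 84, 90]  # made-up scores

rank = percentile_rank_in_group(75, norm_group)  # standing relative to peers
passed = meets_criterion(75, cutoff=80)          # benchmark met or not
```

Here a score of 75 sits above the middle of the group yet still falls short of a cutoff of 80, so the norm-referenced and criterion-referenced answers can disagree.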

Standardization and the norming sample

A test is standardized when administration, scoring, and interpretation follow uniform procedures, and when a representative standardization (norming) sample establishes the score distribution. The quality of any norm-referenced interpretation depends on whether the norming sample matches the test-taker on relevant variables such as age, region, and demographics. When the sample is unrepresentative or outdated, even a reliable score can produce misleading interpretations, which is why the recency and relevance of norms is a recurring exam concern.

Types of reliability

Test-retest — same test, same people, two occasions; estimates stability over time. Threatened by practice effects and real change between sittings.
Internal consistency — how well items hang together within one administration; measured by coefficient alpha (Cronbach's alpha) or, for split halves, the Spearman-Brown corrected correlation.
Alternate (parallel) forms — two equivalent test versions; controls for memory of specific items.
Inter-rater — agreement between scorers, critical for projective tests and observation scales; reported as a correlation or Cohen's kappa.
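Two of the statistics named in the list above can be sketched from their textbook formulas. This is an illustrative implementation, not a replacement for a statistics package: `cronbach_alpha` expects one list of item scores per respondent, and `cohens_kappa` expects two equal-length lists of category labels. The example data are invented.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha from a respondents-by-items table of scores."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Two items that rank three respondents identically give alpha = 1.
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])

# Raters agree on 3 of 4 cases; chance agreement is 0.5, so kappa = 0.5.
kappa = cohens_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"])
```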

Types of validity evidence

Content validity — do the items adequately sample the whole domain? Established by expert review, not statistics.
Criterion-related validity — does the test correlate with an outcome? Concurrent (criterion measured now) versus predictive (criterion measured later, e.g., SAT predicting freshman GPA).
Construct validity — does the test measure the abstract trait it claims? Supported by convergent evidence (correlates with related measures) and discriminant evidence (does not correlate with unrelated ones).
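Convergent and discriminant evidence both come down to comparing correlations. The sketch below computes Pearson's r by hand on invented scores for six clients: a new scale, an established measure of the same construct, and a measure of something unrelated. All variable names and values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


new_scale = [10, 14, 18, 22, 26, 30]    # invented scores on the new scale
established = [12, 15, 17, 24, 25, 31]  # established measure, same construct
unrelated = [5, 9, 4, 9, 5, 7]          # measure of an unrelated trait

convergent_r = pearson_r(new_scale, established)  # should be high
discriminant_r = pearson_r(new_scale, unrelated)  # should be near zero
```

A high convergent correlation paired with a near-zero discriminant correlation is the pattern that supports construct validity.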

Standard error of measurement

The standard error of measurement (SEM) estimates how much an observed score would vary on retesting, and it shrinks as reliability rises. Because no score is error-free, a counselor reports a confidence band around the observed score rather than a single point: plus or minus one SEM covers roughly 68% of the scores expected on retesting, and plus or minus about two SEMs covers roughly 95%. Recognizing that the SEM links reliability to score interpretation is a frequent exam target.
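The link between reliability and the SEM is the standard formula SEM = SD × √(1 − reliability). A minimal sketch follows, using an illustrative test with SD 15 and reliability .91; the function names and the example score of 100 are assumptions for the demonstration.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def confidence_band(observed, sd, reliability, z=1.0):
    """Band of +/- z SEMs around an observed score (z=1.0 is about 68%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


sem = standard_error_of_measurement(sd=15, reliability=0.91)  # about 4.5
low, high = confidence_band(observed=100, sd=15, reliability=0.91)
```

With these numbers a score of 100 is reported as roughly 95.5 to 104.5. Raising the reliability toward 1.0 drives the SEM toward zero, which narrows the band.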

Classical test theory in one line

Under classical test theory, every observed score = true score + error. Reliability is the proportion of observed-score variance that is true-score variance; error is everything unsystematic. This is why improving reliability (more items, clearer wording, trained raters) shrinks the SEM and tightens the confidence band. Random error lowers reliability and is unpredictable; systematic error (bias) does not lower reliability but does threaten validity because it shifts scores consistently in one direction.
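A tiny worked dataset can make the random versus systematic distinction concrete. The four scores below are invented and deliberately constructed so that the error averages to zero and is uncorrelated with the true scores, which is what classical test theory assumes.

```python
from statistics import mean, pvariance

true_scores = [40, 40, 60, 60]
random_error = [-5, 5, -5, 5]  # averages to zero, uncorrelated with true scores
observed = [t + e for t, e in zip(true_scores, random_error)]


def reliability(true, obs):
    """Proportion of observed-score variance that is true-score variance."""
    return pvariance(true) / pvariance(obs)


# Random error adds variance, so reliability drops below 1.
rel_random = reliability(true_scores, observed)  # 100 / 125

# Systematic error: shift every observed score by the same 5 points.
biased = [x + 5 for x in observed]
rel_biased = reliability(true_scores, biased)  # variance is unchanged
shift = mean(biased) - mean(true_scores)       # but every score is off by 5
```

Reliability is 0.80 in both cases. The constant shift leaves the variance ratio untouched while moving every score away from its true value, which is why bias is a validity problem rather than a reliability problem.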

Factors that raise or lower reliability

Test length — adding well-written items generally raises internal consistency (the logic behind the Spearman-Brown formula).
Item quality and clarity — ambiguous items add random error.
Sample heterogeneity — a wider range of true ability inflates reliability estimates; a restricted range deflates them.
Scoring objectivity — objective scoring (machine-scored multiple choice) is more reliable than subjective scoring, which depends on inter-rater agreement.

Keep these levers straight: if a stem says a test was lengthened with comparable items or its items were clarified, expect reliability to improve (validity still needs its own evidence); if it describes a narrow, homogeneous sample, expect a deflated reliability coefficient.
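The test-length effect can be estimated with the Spearman-Brown prophecy formula, which predicts reliability when a test is lengthened or shortened with comparable items. This is a sketch of the standard formula; the starting reliability of .70 is an arbitrary example.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by `length_factor`.

    Assumes the added (or removed) items are comparable to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


doubled = spearman_brown(0.70, 2)   # doubling the test: about 0.82
halved = spearman_brown(0.70, 0.5)  # cutting it in half: about 0.54
```

Using a length factor of 2 on a half-test correlation gives the corrected split-half estimate for the full-length test.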

Test Your Knowledge

A new anxiety scale produces nearly identical scores when the same clients retake it a week later, but its scores show no relationship to any established anxiety measure or to clinical diagnosis. The scale is best described as:

Valid but not reliable

Neither reliable nor valid

Both reliable and valid

Reliable but not valid

Test Your Knowledge

Which type of validity is established primarily through expert judgment that the test items adequately sample the entire content domain, rather than through a correlation coefficient?

Predictive validity

Concurrent validity

Content validity

Convergent validity
