4.4 Training Evaluation and Transfer

Key Takeaways

  • Kirkpatrick's four levels — Reaction, Learning, Behavior, Results — structure how HR proves training worked.
  • Phillips's Level 5 adds ROI: comparing the monetary value of results to fully loaded program cost.
  • Transfer of training requires employees to apply learning on the job, supported by managers, tools, feedback, and expectations.
  • When satisfaction is high but performance is unchanged, HR examines learning, transfer, and work barriers instead of simply repeating the same class.
Last updated: June 2026

Measuring Whether Training Worked

Training evaluation asks whether the learning effort achieved its purpose. A course can be popular and still fail to change performance; a course can earn mixed reactions and still build a hard but necessary skill. HR chooses evaluation methods based on the original objective and the risk or cost of the program.

The dominant framework is the Kirkpatrick four-level model, often paired with Jack Phillips's Level 5 (ROI):

| Level | Question Answered | Typical Evidence | Limitation |
|---|---|---|---|
| 1 - Reaction | Did participants find it relevant and engaging? | Smile-sheet surveys, comments | Satisfaction is not skill |
| 2 - Learning | Did they gain the KSAs? | Pre/post tests, demonstrations, simulations | May not prove job transfer |
| 3 - Behavior | Are they using it on the job? | Observation, manager review, work samples | Requires follow-up weeks later |
| 4 - Results | Did business metrics improve? | Error rates, cycle time, compliance, safety, service | Other factors can affect outcomes |
| 5 - ROI (Phillips) | Did monetary benefit exceed cost? | (Net program benefits ÷ costs) × 100 | Hardest to isolate and quantify |

ROI percentage = (Net program benefits ÷ fully loaded program costs) × 100. A program returning $150,000 in benefits against $50,000 in costs yields a net benefit of $100,000 and an ROI of 200%. The exam may simply expect you to recognize Level 5 as the monetary comparison and to know that most organizations evaluate only Levels 1 and 2 because Levels 3-5 take more effort.
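The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not an official calculator: the function name `roi_percent` and its parameter names are illustrative, and it assumes the benefits figure passed in is the total (gross) monetary benefit, so that net benefit is computed by subtracting fully loaded cost.

```python
def roi_percent(total_benefits: float, fully_loaded_costs: float) -> float:
    """Level 5 ROI: net program benefits divided by fully loaded costs, times 100."""
    if fully_loaded_costs <= 0:
        raise ValueError("fully_loaded_costs must be positive")
    # Net benefit is what remains after the program has paid for itself.
    net_benefits = total_benefits - fully_loaded_costs
    return net_benefits / fully_loaded_costs * 100


# The example from the text: $150,000 in benefits against $50,000 in costs.
print(roi_percent(150_000, 50_000))  # 200.0
```

A program whose benefits only equal its costs returns 0%, not 100%, because the formula uses net benefits rather than total benefits.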

The best evaluation plan is built during ADDIE Design, not after delivery. If the objective says employees will apply a leave-intake checklist, evaluation should review actual intake records (Level 3). If the objective promises fewer payroll errors, evaluation tracks the error metric (Level 4).

Transfer of Training

Transfer of training means employees actually use what they learned when they return to work. Positive transfer improves performance; negative transfer occurs when training interferes with correct on-the-job behavior. Transfer depends on far more than the class: employees need time, tools, system access, supervisor expectations, feedback, and a work environment that supports the new behavior. The single biggest predictor of transfer is manager reinforcement before and after training.

Practical actions that drive transfer:

  • Have managers explain why the behavior matters and what they expect.
  • Provide job aids or checklists at the point of work.
  • Build practice around real, recognizable scenarios.
  • Schedule follow-up coaching, refreshers, or booster sessions.
  • Remove workflow barriers that block the new behavior.
  • Review work samples or metrics after implementation.

PHR scenarios frequently ask what to do when reaction scores are high but performance has not improved. The correct response compares the objective with Level 3 and Level 4 evidence, checks whether the design included practice, and examines whether managers and systems support transfer — it is not to simply repeat the same class. Evaluation also informs future investment: HR can revise content, change delivery, retarget the audience, improve job aids, or recommend a non-training fix. The strongest answer treats evaluation as part of the learning cycle and uses evidence to make the next effort more effective.

Matching Evidence to the Question, and Isolating Effects

The practical skill the exam tests is matching the evidence to the question being asked. Satisfaction data improves the learner experience but cannot answer a performance question; pre/post tests prove knowledge but not job use; only observation or work-sample review proves behavior, and only metrics prove results.

  • Use tests or demonstrations for learning (Level 2) evidence.
  • Use observation or work samples for behavior (Level 3) evidence.
  • Use operational metrics when the objective promised a business improvement (Level 4).
  • Reserve ROI (Level 5) for expensive, high-visibility programs where leaders demand a dollar return.

A subtle Level 4 problem is attribution: if errors dropped after training, how do you know training caused it rather than a new system, a staffing change, or seasonality? Designs that strengthen attribution include a control group (a comparable group that did not receive training), pre- and post-measures on the trained group, and trend lines that isolate the change to the training window. The PHR does not require statistical modeling, but it expects you to recognize that confounding factors can muddy results and that a control or baseline comparison strengthens the claim.
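One simple way to combine a control group with pre- and post-measures is to subtract the control group's change from the trained group's change (often called a difference-in-differences comparison; the text does not name this technique). The sketch below is illustrative only: the function name and the error-rate figures are hypothetical, and the approach assumes the two groups are comparable and exposed to the same outside factors.

```python
def net_training_effect(trained_before: float, trained_after: float,
                        control_before: float, control_after: float) -> float:
    """Change in the trained group minus the change in the control group."""
    trained_change = trained_after - trained_before
    # The control group's change estimates what would have happened anyway
    # (new system, staffing change, seasonality).
    control_change = control_after - control_before
    return trained_change - control_change


# Hypothetical payroll error rates (errors per 100 pay runs), before and after.
effect = net_training_effect(
    trained_before=8.0, trained_after=5.0,
    control_before=8.0, control_after=7.0,
)
print(effect)  # -2.0
```

In this made-up case the trained group improved by 3 errors per 100 runs, but 1 of those would likely have happened without training, so only 2 can reasonably be credited to the program.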

Formative vs. Summative, and the Feedback Loop

Evaluation is not only an end-of-course event. Formative evaluation happens during design and pilot to improve the program before full rollout (catching confusing content or weak practice). Summative evaluation happens after delivery to judge overall effectiveness and value. Both matter: formative evaluation prevents a flawed program from scaling; summative evaluation decides whether to repeat, revise, retarget, or retire it.

Whatever the level, evaluation closes the ADDIE loop by feeding back into the next Analyze phase. Evaluation findings should drive concrete decisions:

  • Revise content that learners missed on the post-test.
  • Change delivery if learning occurred but behavior did not transfer.
  • Retarget the audience if the wrong people were trained.
  • Strengthen job aids and manager reinforcement when transfer is weak.
  • Recommend a non-training intervention if results show the real barrier was never a KSA gap.

The recurring wrong answer on the exam is "run the same class again" when the data points to a transfer or environment problem.

The recurring right answer diagnoses which level failed and fixes that specific link in the chain, then re-evaluates to confirm the fix worked.
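The "find the failed level, fix that link" reasoning can be sketched as a small decision function. This is a simplified study aid under stated assumptions, not an established diagnostic tool: the function name, the four yes/no inputs, and the order of the checks are all illustrative, and real evaluations rarely reduce to clean booleans.

```python
def diagnose(right_audience: bool, learned: bool,
             transferred: bool, results_improved: bool) -> str:
    """Return the first broken link in the chain and a matching next step."""
    if not right_audience:
        return "Audience: retarget the program to the people who need the skill."
    if not learned:
        return "Level 2: revise the content learners missed on the post-test."
    if not transferred:
        return "Level 3: change delivery and strengthen job aids and manager reinforcement."
    if not results_improved:
        return "Level 4: look for a non-training barrier, since the skill is already in use."
    return "No failed level: keep the program and continue monitoring."


# High satisfaction, learning confirmed, but behavior on the job has not changed.
print(diagnose(right_audience=True, learned=True,
               transferred=False, results_improved=False))
```

Note that reaction scores are deliberately not an input: high or low satisfaction does not tell you which link failed.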

Test Your Knowledge

A leadership course earns a 4.8/5 satisfaction rating, yet managers still document discipline incorrectly. Under Kirkpatrick's model, what should HR examine next?

Test Your Knowledge

A training program produced $150,000 in net benefits against $50,000 in fully loaded costs. Using Phillips's Level 5 formula, what is the approximate ROI?

Test Your Knowledge

Which factor is the single biggest predictor of whether learning transfers to the job?
