4.1 Hallucinations: detection & mitigation

Key Takeaways

  • A hallucination is fluent, confident output that is false or not grounded in a reliable source or the provided context.
  • LLMs hallucinate because they predict the next token for plausibility, not truth, and hold no internal ground-truth database.
  • Types include factual, faithfulness/context-conflicting, intrinsic (contradicts the input) and extrinsic (unverifiable additions).
  • Detection is verification: grounding checks, fact verification, self-consistency, retrieval overlap, and NLI/LLM-as-judge entailment.
  • Mitigation is layered: RAG grounding, citations, lower temperature, guardrails, and human-in-the-loop; the rate is measured, not eliminated.
Last updated: July 2026

What a hallucination is

In the CT-GenAI syllabus, a hallucination is output from a generative AI model that is fluent, confident, and well-formed, yet is factually false, unsupported, or not grounded in a reliable source or in the context that was provided. Because a large language model (LLM) predicts the next token from statistical patterns rather than looking a fact up, it will produce plausible-sounding text even when it has no basis for the claim. For a tester the real danger is this very fluency: a hallucinated answer reads as authoritatively as a correct one, so it cannot be caught by tone, grammar, or the model's own confidence alone; it has to be verified against evidence. A classic example is an LLM that cites a court case, an academic paper, or a software API method that does not exist, complete with a convincing but entirely invented reference.

Why LLMs hallucinate

An LLM is trained to maximise the probability of the next token given the preceding sequence; it therefore optimises for plausibility, not truth. Several factors drive hallucination:

  • The model holds no internal ground-truth database — only learned statistical correlations between tokens.
  • Training data contains gaps, errors, and bias, and the model's knowledge is stale after the training cut-off date.
  • Ambiguous, under-specified, or leading prompts push the model to "fill in" the missing detail.
  • Decoding settings such as a high temperature flatten the sampling distribution, so less likely tokens are chosen more often and the output becomes more varied, less constrained, and less grounded.
  • The model is biased toward always producing an answer and rarely replies "I don't know."
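
The effect of temperature on decoding can be illustrated with a few lines of arithmetic. The sketch below is not taken from the syllabus; it assumes the common formulation in which raw token scores (logits) are divided by the temperature before a softmax, and the three logit values are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw token scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [4.0, 2.0, 1.0]
low = softmax_with_temperature(logits, 0.5)   # sharper: the top token dominates
high = softmax_with_temperature(logits, 2.0)  # flatter: unlikely tokens gain probability

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the lower temperature almost all probability mass sits on the highest-scoring token; at the higher temperature the weakest candidate becomes a realistic choice, which is where less-grounded continuations can enter.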

Types of hallucination

The syllabus classifies hallucinations by what the output conflicts with:

Type                               | Conflicts with                 | Example
Factual                            | Real-world facts               | Invents a non-existent ruling or a wrong statistic
Faithfulness / context-conflicting | The supplied source or context | A summary asserts something the source document never said
Intrinsic                          | The input itself               | The output contradicts a fact stated in the same prompt
Extrinsic                          | Anything verifiable            | Adds detail that cannot be confirmed from the input or source

A faithfulness (or context-conflicting) hallucination matters most for Retrieval-Augmented Generation (RAG) and summarisation, where the model is meant to stay strictly inside the supplied text but drifts beyond it. Separating an intrinsic hallucination (which contradicts the given input) from an extrinsic one (an unverifiable addition) is a distinction the exam frequently probes.

Test Your Knowledge

Which description best matches the CT-GenAI definition of a hallucination?

Detecting hallucinations

Detection is fundamentally a verification problem: you compare the output against a trusted reference rather than trusting the text on its face. The techniques the syllabus expects you to recognise include:

  • Grounding / source checks — confirm that every claim can be traced back to the supplied context or an authoritative document.
  • Fact verification — check named facts, figures, dates, and citations against reliable external sources or a knowledge base.
  • Self-consistency — generate the answer several times, or in several ways, and flag claims that are unstable across runs; genuine facts tend to stay consistent while fabrications vary.
  • Retrieval overlap — measure how much of the answer is actually supported by the retrieved passages; low overlap is a strong signal of ungrounded content.
  • LLM-as-a-judge / NLI entailment — use a second model, or a natural-language-inference check, to test whether the output is logically entailed by the source text.
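
As an illustration of the self-consistency idea in the list above, the following minimal sketch measures how often repeated samples agree. It assumes the answers have already been collected from several runs of the same prompt; the function name, the 0.8 agreement threshold, and the sample strings are all hypothetical, and exact string matching stands in for the semantic comparison a real check would likely need.

```python
from collections import Counter

def self_consistency(answers, min_agreement=0.8):
    """Return (most_common_answer, agreement, is_stable) for sampled answers."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Normalise trivially different spellings before counting.
    normalised = [a.strip().lower() for a in answers]
    top_answer, count = Counter(normalised).most_common(1)[0]
    agreement = count / len(normalised)
    return top_answer, agreement, agreement >= min_agreement

# Hypothetical samples: five runs that agree, three runs that do not.
stable = ["Paris", "paris", "Paris ", "Paris", "Paris"]
unstable = ["client.fetch_all()", "client.get_everything()", "client.list_all()"]

print(self_consistency(stable))
print(self_consistency(unstable))
```

An unstable result is a reason to verify the claim, not proof that it is false, and a stable result does not prove the claim is true.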

A single technique is rarely enough. Grounding checks tell you whether a claim is supported, but not whether the supporting source is itself correct; fact verification catches wrong figures but is expensive to automate at scale; and self-consistency exposes instability yet can miss a fabrication the model repeats confidently on every run. Mature test strategies therefore combine several signals and set an explicit acceptance threshold — for example, routing any response whose retrieval overlap falls below a defined level to human review.
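
The threshold-based routing described above might look like the sketch below. It is a simplification under stated assumptions: overlap is computed as the fraction of answer words that also appear in the retrieved passages, the 0.7 threshold is arbitrary, and the function names and example texts are hypothetical.

```python
import re

def tokens(text):
    """Lower-case word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieval_overlap(answer, passages):
    """Fraction of answer tokens that also occur in the retrieved passages."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    source_vocab = set()
    for passage in passages:
        source_vocab.update(tokens(passage))
    supported = sum(1 for t in answer_tokens if t in source_vocab)
    return supported / len(answer_tokens)

def route(answer, passages, threshold=0.7):
    """Send low-overlap answers to a person; let the rest continue to other checks."""
    if retrieval_overlap(answer, passages) >= threshold:
        return "next_check"
    return "human_review"

# Hypothetical retrieved passage and two candidate answers.
passages = ["The warranty covers parts and labour for 24 months."]
print(route("The warranty covers parts for 24 months.", passages))
print(route("The warranty also includes free international shipping.", passages))
```

Word overlap is only one signal: a faithful paraphrase can score low and a false statement built from source words can score high, which is why the result feeds into further checks instead of acting as a final verdict.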

Because generative output is non-deterministic, testers verify properties — is it grounded? is it internally consistent? — instead of demanding one exact expected string, and they sample many outputs rather than judging a single pass.
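
A property-style check over many sampled outputs could be sketched as follows. The property chosen here, that every number in the output also appears in the context, is just one example of a verifiable property; the function names, context, and sample outputs are hypothetical.

```python
import re

def numbers_in(text):
    """Extract integer and decimal figures as strings."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def figures_grounded(output, context):
    """Property: every number the output mentions also appears in the context."""
    return numbers_in(output) <= numbers_in(context)

def grounded_fraction(outputs, context):
    """Share of sampled outputs that satisfy the property."""
    if not outputs:
        raise ValueError("need at least one output")
    return sum(figures_grounded(o, context) for o in outputs) / len(outputs)

# Hypothetical context and four sampled summaries of it.
context = "Revenue rose 12% to 4.5 million in 2023."
samples = [
    "Revenue grew 12% in 2023.",
    "In 2023 revenue reached 4.5 million, up 12%.",
    "Revenue grew 15% in 2023.",  # 15 is not in the context
    "Revenue increased by 12% to 4.5 million.",
]

print(grounded_fraction(samples, context))
```

The test asserts on the share of samples that pass the property, not on any single expected string, so differently worded but grounded summaries all count as passes.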

Test Your Knowledge

A RAG-based summariser produces a summary asserting something the source document never contained. Which type of hallucination is this?

Mitigating hallucinations

No single control removes hallucination, so the syllabus expects a layered, defence-in-depth response:

  • Grounding with RAG — supply authoritative retrieved context and instruct the model to answer only from it, reducing reliance on parametric memory.
  • Require citations — make the model cite the passage that supports each claim, so unsupported statements become visible and traceable.
  • Lower the temperature — more deterministic decoding reduces invented detail on fact-oriented tasks.
  • Guardrails and output validation — automatically check each response against rules, a schema, or a verifier before it reaches the user.
  • Human-in-the-loop review — keep an accountable person in the loop for high-risk outputs such as medical, legal, or financial content.
  • Prompt engineering — explicitly permit the model to answer "I don't know" and constrain the scope of the request.
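
Several of the controls above (grounding instructions, required citations, permission to answer "I don't know", and output validation) can be combined in a small amount of code. The sketch below is one possible arrangement, not a prescribed design: the prompt wording, the bracketed citation format, and both function names are assumptions made for illustration, and no model is actually called.

```python
import re

def build_grounded_prompt(question, passages):
    """Assemble a prompt that numbers each passage and permits 'I don't know'."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the passages below. Cite the passage number in "
        "square brackets within each sentence. If the passages do not contain "
        "the answer, reply exactly: I don't know.\n\n"
        f"Passages:\n{numbered}\n\nQuestion: {question}"
    )

def validate_citations(answer, passage_count):
    """Guardrail: accept a refusal, or an answer whose every sentence cites a real passage."""
    text = answer.strip()
    if text == "I don't know.":
        return True
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if not sentences:
        return False  # an empty answer is not a valid response
    for sentence in sentences:
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not cited or any(n < 1 or n > passage_count for n in cited):
            return False  # uncited sentence, or citation to a passage that was never supplied
    return True

# Hypothetical usage with two retrieved passages.
prompt = build_grounded_prompt("How long is the warranty?", ["Warranty: 24 months.", "Returns: 30 days."])
print(validate_citations("The warranty lasts 24 months [1].", 2))
print(validate_citations("The warranty lasts 24 months [1]. Shipping is free [3].", 2))
```

This guardrail only checks that citations exist and point at supplied passages. Whether the cited passage actually supports the sentence still needs an entailment check or a human reviewer.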

Choosing mitigations is a trade-off. Lowering the temperature and tightening RAG grounding cut hallucinations, but can make answers terse or cause the model to refuse borderline-yet-valid questions, hurting usefulness. Adding human review raises accuracy while also increasing latency and cost. Testers help the team find the right balance by measuring the hallucination rate before and after each control, checking that user-facing helpfulness does not regress, and confirming that the residual risk is acceptable for the system's risk level.

The tester's job is to design test cases that deliberately expose ungrounded output, to quantify a hallucination rate as a measurable quality metric, and to confirm that each mitigation measurably lowers that rate. Hallucination is a risk to be managed, measured, and reduced — it is never fully eliminated.
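
Quantifying the rate before and after a control can be as simple as the sketch below. It assumes each response in a fixed test set has already been labelled as hallucinated or not, by a reviewer or an automated check; the labels, the function names, and the 0.05 minimum drop are hypothetical.

```python
def hallucination_rate(labels):
    """Share of evaluated responses labelled as hallucinated (True)."""
    if not labels:
        raise ValueError("need at least one labelled response")
    return sum(labels) / len(labels)

def mitigation_helps(before, after, min_drop=0.05):
    """True if the rate fell by at least min_drop (absolute) after the control."""
    return hallucination_rate(before) - hallucination_rate(after) >= min_drop

# Hypothetical labels for the same 10 test prompts, without and with a control.
baseline = [True, False, True, False, False, True, False, False, True, False]
with_rag = [False, False, True, False, False, False, False, False, False, False]

print(hallucination_rate(baseline))
print(hallucination_rate(with_rag))
print(mitigation_helps(baseline, with_rag))
```

With a test set this small the difference is only indicative; a real evaluation would use more prompts and track helpfulness alongside the rate.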

Test Your Knowledge

Which technique detects hallucinations by generating the answer several times and flagging claims that change across runs?
