4.3 Prompt Templates & Chains

Key Takeaways

  • The prompt-building step (a prompt template) combines the user's question and retrieved passages into the model-ready input; an output parser runs later, after the model responds.
  • A chain is a fixed sequence of steps (retrieve, build prompt, call model, parse) and is the right fit for predictable RAG flows.
  • Place retrieved context near the question with clear delimiters and instruct grounding; inject user fields and source metadata directly into the prompt the model sees.
  • Requesting a fixed output schema plus validation makes downstream parsing reliable and stabilizes pipelines that intermittently omit required JSON keys.
  • Break multi-step tasks into explicit steps or chain-of-thought rather than forcing a single-shot answer.
Last updated: July 2026

Prompt Templates: The Assembly Point

A prompt template is a reusable scaffold with placeholders that the application fills at run time. In a retrieval-augmented chain it is the prompt-building step that combines the user's question with the retrieved passages to form the final model-ready input. This is a frequently tested distinction: the component that merges question and context is the prompt template, not the embedding model (which turns text into vectors upstream), not a secret scope (which stores credentials), and not an output parser. The output parser runs later in the pipeline, after the model has already produced a response, to turn that text into a structured object.

Two layers of prompt matter. The system prompt sets persistent behavior — role, rules, output format, safety constraints — that should hold across turns. The user prompt carries the per-request instruction plus the retrieved context. Put stable instructions in the system prompt so they are not accidentally overridden by user input or by injected context.

Passing Retrieved Context Well

How you place context inside the template strongly affects grounding. The practices the exam rewards:

  • Delimit and position: place retrieved context close to the question with clear delimiters, and instruct the model to ground its answer in it. Dumping passages after the question with no delimiters, or shuffling them randomly each call, weakens the model's ability to use the evidence.
  • Ground and abstain: instruct the model to answer only from the provided context and to say it does not know when the answer is unsupported.
  • Carry source metadata: pass each chunk's source alongside its text and instruct the model to cite it, so answers include verifiable citations.
  • Inject user fields in the prompt: values like account tier or product SKU shape the answer only when they appear in the prompt the model sees; storing them only in index metadata helps retrieval but never reaches generation.

When retrieved passages overflow the context budget, rank and select the top passages (optionally summarizing) rather than concatenating everything or truncating by position.

Multi-Step Chains

A chain is a fixed sequence of calls whose control flow is known at design time. A prompt workflow that always runs retrieve context, build prompt, call model, return answer is a textbook chain — easier to reason about, test, and optimize than an autonomous agent, and the right choice when no dynamic tool selection or looping is required. Use an agent only when the path depends on intermediate results; do not graduate to one when a deterministic chain solves the problem.

Modern LangChain expresses chains compositionally, piping a prompt template into a model into an output parser (prompt | llm | parser), with a passthrough carrying the original question alongside the retrieved context. The mental model is a pipeline of typed steps, each independently testable. Separating retrieval, prompt assembly, generation, and post-processing into explicit components also lets you measure and swap each stage independently, which is why monolithic prompts are discouraged — a regression could come from any hidden stage with no clear boundary.

For tasks that fail in one shot, decompose the reasoning: ask the model to work step by step (chain-of-thought) or split the task into intent classification, retrieval, and answer generation so you can reserve an expensive model only for the step that needs it. When only a few labeled examples exist, few-shot prompting (showing input-output pairs) steers format and behavior most effectively without any training.

Output Parsers and Structured Responses

The output parser is the final stage that converts the model's raw text into something downstream code can use — a string, a JSON object, or a validated typed record. The dominant exam theme is structured output: when a downstream workflow expects every response to contain fixed fields (for example status, priority, and owner, or extracted invoice fields for SQL joins), ask the model for a fixed schema rather than free-form prose. A schema gives downstream code something predictable to validate and consume; it makes integration far more robust than parsing prose, though it does not guarantee zero hallucinations or always reduce token cost.

When an extraction pipeline intermittently omits required JSON keys, the fix is schema-constrained output with validation, which catches malformed responses before they break downstream logic — more reliable than hoping prompt wording alone keeps formatting consistent. Pair structured output with an offline JSON-parseable / schema-conformance metric so you measure how often responses actually validate.

ComponentRoleWhen it runs
Prompt templateAssemble question + context + user fieldsBefore the model call
ChainSequence the fixed steps deterministicallyOrchestrates the whole flow
Output parserConvert response text to a structured objectAfter the model call

A Worked Example: Support-Ticket Router

Consider a chain that routes support tickets. The retriever pulls similar past tickets; the prompt template injects those examples with delimiters, states the routing rules, and lists the exact fields required; ChatDatabricks generates; and a JSON output parser returns {status, priority, owner}. Because a downstream queue expects those three machine-readable fields, the template demands a fixed schema and the parser validates it, so a malformed row is caught rather than silently corrupting the queue. If the model sometimes emits prose alongside the JSON, an explicit return only valid JSON instruction plus schema validation stabilizes it far better than rewording the tone. Each stage is measurable in isolation: retrieval relevance, schema-conformance rate, and routing accuracy are tracked separately so you can localize a regression instead of guessing which step failed.

Finally, treat prompts as governed assets: Databricks can version prompts in Unity Catalog, promote them with aliases, and let subject-matter experts edit them without code changes, so prompt iteration follows the same controlled release path as models and indexes.

Test Your Knowledge

In a retrieval-augmented chain, which component combines the user's question and the retrieved passages into the final input sent to the model?

A
B
C
D
Test Your Knowledge

A contract-extraction workflow occasionally omits required JSON keys, breaking downstream code. Which change is most likely to stabilize the pipeline?

A
B
C
D
Test Your Knowledge

A prompt workflow always follows the same sequence: retrieve context, build a prompt, call the model, and return an answer. Which orchestration style fits best?

A
B
C
D