A claims-operations app must answer policy questions, retrieve supporting documents, and create a follow-up task in another system only when needed. Which architecture best fits this use case?

A tool-calling agent with a retrieval tool and a workflow tool. The app must both retrieve documents and take an action (create a task) conditionally, which requires a tool-calling agent that can decide when to invoke the workflow tool. A prompt-only bot cannot retrieve, and a plain RAG chain has no mechanism to create the follow-up task.

When decomposing a GenAI application, which step is usually best handled deterministically before any LLM call?

Validating the request and routing it based on a known field. Validation and routing on a known field are cheaper, faster, and more reliable as plain Python or SQL than as an LLM call, so they should run before the model is invoked. Answer generation and summarization are genuinely generative steps that require the LLM.

Translating Business Requirements to a GenAI Design

From Business Problem to GenAI Blueprint

The Design Applications domain is 14% of the Databricks Certified Generative AI Engineer Associate exam, roughly 6 of the 45 scored questions, but it anchors every other domain. A wrong architecture decision here wastes work in Data Preparation, Application Development, and Deployment. The exam hands you a plain-English business requirement, such as 'answer HR policy questions with citations' or 'extract invoice fields into a table', and asks you to translate it into a concrete pipeline of model tasks, components, inputs, outputs, and measurable success criteria. You are graded on judgment, not memorization.

Step 1 - Decompose the use case into subtasks

Databricks expects you to break a use case into discrete subtasks instead of aiming one giant prompt at a single model. A typical enterprise assistant decomposes into input validation, intent classification or routing, retrieval, answer generation, and post-processing such as formatting and guardrails. Decomposition makes a system easier to debug and cheaper to operate, because you can inspect and fix each stage in isolation and send each stage to the smallest model that can do the job.

A heavily tested pattern: handle deterministic work deterministically, before any LLM call. Input validation, routing on a known field, exact-match lookups, and schema checks are cheaper, faster, and far more reliable as ordinary Python or SQL than as an LLM call. Reserve the model for genuinely generative or reasoning steps. When a question asks which step is best handled deterministically before calling an LLM, the answer is almost always validation, routing, or a lookup, not the generation step.

Subtask	Typical implementation	LLM needed?
Input validation / schema check	Python / SQL	No
Intent classification / routing	Small cheap model or classifier	Sometimes
Retrieval	Vector Search + metadata filters	No (uses embeddings)
Answer generation	Foundation model	Yes
Formatting / guardrails	Code plus light LLM	Sometimes
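
The deterministic-first rule above can be sketched in plain Python. This is a minimal illustration, not a prescribed implementation: the `Request` shape, the department names, the length limit, and the pipeline naming are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
ALLOWED_DEPARTMENTS = {"hr", "claims", "billing"}
MAX_QUESTION_CHARS = 2000


@dataclass
class Request:
    department: str  # the "known field" used for routing
    question: str


def validate(req: Request) -> list[str]:
    """Return validation errors; an empty list means the request is valid."""
    errors = []
    if req.department not in ALLOWED_DEPARTMENTS:
        errors.append(f"unknown department: {req.department!r}")
    if not req.question.strip():
        errors.append("question is empty")
    elif len(req.question) > MAX_QUESTION_CHARS:
        errors.append("question is too long")
    return errors


def route(req: Request) -> str:
    """Pick a downstream pipeline from the known field, with no model call."""
    return f"{req.department}_pipeline"


def handle(req: Request) -> str:
    errors = validate(req)
    if errors:
        # Rejected before any LLM call is made.
        return "rejected: " + "; ".join(errors)
    return route(req)
```

Only requests that pass `validate` and are routed would go on to retrieval and generation, so malformed input never costs a model call.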

Step 2 - Choose the architecture pattern

This is the single most tested judgment in the domain. Learn this decision table cold:

Signal in the requirement	Best pattern	Why
Only formatting, tone, or style; no external facts	Prompt-only	No training or retrieval required
Proprietary or frequently changing knowledge, with citations	RAG	Retrieval injects fresh, governed context at query time
A new behavior, skill, tone, or output format that must be built into the model	Fine-tuning	Behavior is trained into the weights, for example with LoRA or QLoRA
Multiple steps, external tools, or taking real actions	Agent	Tool calling and planning across systems
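
One way to internalize the table is to write it as a small function. The sketch below is a study aid only: the three boolean signals and the order in which they are checked are assumptions of this example, and real designs often combine patterns rather than returning a single label.

```python
def choose_pattern(
    needs_actions_or_tools: bool,
    needs_private_or_fresh_knowledge: bool,
    needs_new_behavior: bool,
) -> str:
    """Map requirement signals to the pattern named in the decision table.

    The check order (agent, then RAG, then fine-tuning) is an assumption
    made for this sketch, not a rule stated by the exam guide.
    """
    if needs_actions_or_tools:
        return "agent"
    if needs_private_or_fresh_knowledge:
        return "rag"
    if needs_new_behavior:
        return "fine-tuning"
    # Only formatting, tone, or style; no external facts.
    return "prompt-only"
```

For example, 'answer HR policy questions with citations' sets only the second signal and maps to `"rag"`.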

RAG (Retrieval-Augmented Generation) is the default when the knowledge is private or changes often and the answer must cite sources. An HR assistant answering from policy documents that change every week is a textbook RAG case: retrieval always draws on the latest indexed documents, and citations fall out naturally because each answer is built from identifiable passages. Fine-tuning would freeze last week's policy into the weights and cannot point to a source; hardcoding answers into the system prompt does not scale and goes stale as soon as a policy changes.

Fine-tuning earns its place when you need to change how the model behaves, such as enforcing a consistent tone, a domain style, or a specialized classification skill. It is not a way to add facts. The rule to remember is 'RAG adds knowledge; fine-tuning shifts behavior', and the recurring trap is picking fine-tuning to inject knowledge. If a scenario says 'needs the newest data' or 'must cite', pick RAG, never fine-tuning.

Agents fit when the app must take multiple steps or reach into external systems. A claims-operations app that answers policy questions, retrieves supporting documents, and creates a follow-up task in another system only when needed calls for a tool-calling agent with a retrieval tool and a workflow tool. A prompt-only bot cannot retrieve, and a plain RAG chain cannot create the task.
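
The claims-operations design can be sketched as an agent loop over two tools. Everything here is a stand-in: the in-memory documents, the task list, and especially `plan_tools`, which uses a keyword rule where a real agent would let the model choose tools through its tool-calling interface.

```python
# Hypothetical in-memory stand-ins for a document index and a task system.
DOCUMENTS = {
    "theft": "Theft claims require a police report number.",
    "water": "Water damage claims require photos within 30 days.",
}
CREATED_TASKS: list[str] = []


def retrieval_tool(query: str) -> list[str]:
    """Return documents whose topic keyword appears in the query."""
    q = query.lower()
    return [text for topic, text in DOCUMENTS.items() if topic in q]


def workflow_tool(summary: str) -> str:
    """Create a follow-up task in the (fake) external system and return its id."""
    CREATED_TASKS.append(summary)
    return f"task-{len(CREATED_TASKS)}"


def plan_tools(question: str) -> list[str]:
    """Stand-in for the model's tool-selection step (assumption: keyword rule)."""
    tools = ["retrieve"]
    if "follow up" in question.lower():
        tools.append("create_task")
    return tools


def run_agent(question: str) -> dict:
    result = {"documents": [], "task_id": None}
    for name in plan_tools(question):
        if name == "retrieve":
            result["documents"] = retrieval_tool(question)
        elif name == "create_task":
            # The workflow tool runs only when the plan asks for it.
            result["task_id"] = workflow_tool(f"Follow up: {question}")
    return result
```

A plain question such as "What does the theft policy require?" returns documents and no task, while a request that asks for a follow up also creates one. That conditional action is what a plain RAG chain cannot provide.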

Step 3 - Prefer the simplest architecture that works

Databricks rewards choosing the least complex pattern that satisfies the requirement, because extra machinery adds latency, cost, and failure modes. If an assistant mostly answers policy questions and only rarely needs a live value, such as an employee's current PTO balance, the best design is RAG for the common path plus a single tool for the occasional lookup, not a fully general multi-tool agent.
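
A minimal sketch of that 'RAG plus one tool' shape follows. The balance table, the employee id, the stubbed RAG function, and the keyword check that triggers the lookup are all hypothetical and exist only to show the control flow.

```python
# Hypothetical live system of record for PTO balances.
PTO_BALANCES = {"emp-001": 12.5}


def lookup_pto_balance(employee_id: str) -> float:
    """The single tool: fetch a live value that retrieval cannot know."""
    return PTO_BALANCES[employee_id]


def answer_from_policies(question: str) -> str:
    """Stand-in for the RAG path (retrieve policy text, then generate)."""
    return f"[RAG answer for: {question}]"


def answer(question: str, employee_id: str) -> str:
    # Assumption: a simple keyword check decides when the tool is needed.
    if "my pto balance" in question.lower():
        return f"Your PTO balance is {lookup_pto_balance(employee_id)} days."
    # Common path: plain RAG, no agent loop.
    return answer_from_policies(question)
```

The rare lookup gets its own narrow branch, and every other question stays on the cheaper, more predictable RAG path.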

Chains versus agents is a favorite comparison. A chain runs a fixed sequence (retrieve, read, answer). An agent decides dynamically which tools to call. When a FAQ assistant has a strict retrieve-read-answer flow and a tight latency SLA, a chain is preferable because its path is deterministic, predictable, and faster; an agent's dynamic tool selection adds unpredictable extra LLM turns and latency. Reach for an agent only when the flow genuinely branches across tools.
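
The fixed retrieve-read-answer flow can be sketched as three functions composed in a set order. The FAQ passages and the `generate` stub are assumptions for illustration; the call counter is there only to make the point that a chain makes exactly one model call per question.

```python
# Hypothetical FAQ store and call counter, for illustration only.
FAQ_PASSAGES = {
    "password": "Reset your password from the account settings page.",
    "refund": "Refunds are issued within 5 business days.",
}
CALLS = {"llm": 0}


def retrieve(question: str) -> list[str]:
    """Return passages whose topic keyword appears in the question."""
    q = question.lower()
    return [text for topic, text in FAQ_PASSAGES.items() if topic in q]


def read(question: str, passages: list[str]) -> str:
    """Assemble the prompt from the question and the retrieved passages."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


def generate(prompt: str) -> str:
    """Stand-in for the single model call a real chain would make here."""
    CALLS["llm"] += 1
    return f"(answer based on {len(prompt)} prompt characters)"


def faq_chain(question: str) -> str:
    # Fixed order, no branching: retrieve, read, answer.
    return generate(read(question, retrieve(question)))
```

Because the path never branches, latency is bounded by one retrieval and one generation, which is why a chain suits a tight SLA.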

Step 4 - Define inputs, outputs, constraints, and success criteria

Before writing code, write down the pipeline specification: the expected inputs, the required outputs (including exact schema when structured), the constraints (latency SLA, cost ceiling, governance and PII rules), and the success criteria. When a question asks which artifact to produce first when translating a use case, the answer is the input/output specification, not a prompt or a model choice, because the spec drives every later decision.
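
One lightweight way to capture such a specification is a small typed record. The field names and every value below are hypothetical examples for an HR policy assistant, not figures taken from the exam guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSpec:
    inputs: dict[str, str]           # field name -> type or description
    outputs: dict[str, str]          # exact schema when output is structured
    p95_latency_seconds: float       # latency SLA
    max_cost_per_1k_queries_usd: float
    pii_allowed_in_prompt: bool      # governance constraint
    success_criteria: tuple[str, ...]


# Hypothetical values for an HR policy assistant.
hr_assistant_spec = PipelineSpec(
    inputs={"question": "str", "employee_region": "str"},
    outputs={"answer": "str", "citations": "list[str]"},
    p95_latency_seconds=3.0,
    max_cost_per_1k_queries_usd=5.0,
    pii_allowed_in_prompt=False,
    success_criteria=(
        "at least 90% of answers grounded in retrieved context with a citation",
        "p95 latency under 3 seconds",
    ),
)
```

Writing the spec as data rather than prose makes it easy to review and to reuse later when checking the deployed system against it.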

Success criteria must be measurable. 'Helpful' is not testable; 'at least 90% of answers grounded in retrieved context with a citation, p95 latency under 3 seconds, cost under a set amount per 1,000 queries' is. These criteria feed directly into the Evaluation and Monitoring domain later, so making them explicit up front is what separates a designed application from a demo.
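
Criteria phrased this way can be checked mechanically. The sketch below assumes each logged query is a dict with `grounded`, `latency_s`, and `cost_usd` keys (names invented for this example) and uses the nearest-rank method for p95; the 90% and 3-second thresholds come from the example above, while the cost budget is passed in because the text leaves it unspecified.

```python
def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile, computed with integer arithmetic."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def meets_criteria(records: list[dict], max_cost_per_1k_usd: float) -> dict:
    """Check logged queries against the three measurable criteria."""
    n = len(records)
    grounded_rate = sum(r["grounded"] for r in records) / n
    latency_p95 = p95([r["latency_s"] for r in records])
    cost_per_1k = 1000 * sum(r["cost_usd"] for r in records) / n
    return {
        "grounded_ok": grounded_rate >= 0.90,
        "latency_ok": latency_p95 < 3.0,
        "cost_ok": cost_per_1k <= max_cost_per_1k_usd,
    }
```

Each criterion yields a pass or fail, which is the property that 'helpful' lacks.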

