4.4 Tools, Agents & Function Calling

Key Takeaways

  • A chain has fixed control flow decided at design time; an agent lets the model choose the next tool based on reasoning, so only graduate to an agent when a single retrieval pass cannot solve the task.
  • In tool (function) calling the model decides WHEN to call a tool while you define WHICH tools exist and validate the call schema; a bigger context window does not create live system access.
  • MLflow ResponsesAgent standardizes an agent so AI Playground, Agent Evaluation, and Databricks Apps can drive it; LangGraph adds branching, loops, and state a linear chain cannot express.
  • High-impact tools such as a refund API should require human confirmation, and offline agent evaluation must score tool choice, tool arguments, and final task success, not just wording.
  • Databricks managed MCP endpoints expose governed resources like a Vector Search index as agent tools, and Agent Bricks provides managed agent building blocks (deepened in the March 18, 2026 blueprint).
Last updated: July 2026

From Chains to Agents

The Application Development domain is the exam's heaviest block at 30%, and the single most-tested judgment inside it is knowing when a problem needs an agent rather than a simpler chain. A chain is a fixed sequence of calls whose control flow is decided at design time: retrieve context, build a prompt, call the model, return the answer. An agent lets the model decide which tool or step to call next based on its reasoning over intermediate results. The exam repeatedly frames this as a trade-off: agents add flexibility but also add latency, cost, and evaluation difficulty, so graduate to an agent only when a single retrieval pass genuinely cannot solve the task.

Dimension     | RAG chain                        | Tool-calling agent
Control flow  | Fixed, known at design time      | Dynamic, model-decided
Best for      | Retrieve-read-answer Q&A         | Multi-step, conditional, actions
Latency       | Lower, predictable               | Higher, variable
Evaluation    | Answer quality                   | Tool choice + args + final success
Example       | Handbook chatbot with citations  | Claims app that also files a ticket

An agent beats a static chain when the workflow needs multiple steps, tool calls, conditional branching, iterative retrieval, or the ability to take an action in an external system. If a FAQ assistant has a strict retrieve-read-answer flow under a tight latency SLA, a chain is preferable because it is more predictable and has lower latency. But once the app must answer policy questions, create access tickets, and check ticket status in external systems, retrieval alone is not enough and an agent or hybrid design fits.
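The fixed retrieve-read-answer flow can be sketched in a few lines. This is a minimal illustration, not a Databricks API: the keyword retriever, the `first_passage_llm` stub, and all function names are hypothetical stand-ins for a real vector index and model endpoint.

```python
from typing import Callable, List


def retrieve(question: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank passages by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Assemble the retrieved passages and the question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def run_chain(question: str, corpus: List[str], llm: Callable[[str], str]) -> str:
    """Fixed control flow: the same three steps run in the same order every time."""
    passages = retrieve(question, corpus)
    prompt = build_prompt(question, passages)
    return llm(prompt)


def first_passage_llm(prompt: str) -> str:
    """Stub model (assumption): echoes the top-ranked passage from the prompt."""
    return prompt.splitlines()[1][2:]


HANDBOOK = [
    "Vacation policy: 20 days per year",
    "Expense reports are due monthly",
]
```

Nothing in `run_chain` depends on what the model says, which is why its latency and behavior are predictable. An agent replaces that hard-coded order with a step the model chooses.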

Function / Tool Calling Mechanics

Tool calling (also called function calling) is the bridge between the model and live data or systems. You define a set of tools, each with a name, a natural-language description, and a typed argument schema, and the model emits a structured request to invoke one when it decides the task requires it. Your runtime executes the function, returns the result, and the model conditions its next output on that observation. The key division of responsibility for the exam: the model decides WHEN to call a tool; you decide WHICH tools exist and you validate the call schema. A support copilot that must fetch a customer's live order status from an internal API needs tool calling, not a larger context window. A larger window helps the model reason over more text, but it creates no live access to operational systems.
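The division of responsibility above can be sketched with a small registry. This is an illustrative sketch in plain Python, not any vendor's tool-calling API: the registry layout, the `get_order_status` tool, and the in-memory order data are all assumptions made for the example.

```python
import json
from typing import Any, Callable, Dict

# Registry of tools the application chooses to expose (the "WHICH").
TOOLS: Dict[str, Dict[str, Any]] = {}


def register_tool(name: str, description: str,
                  parameters: Dict[str, type], fn: Callable[..., Any]) -> None:
    """Record a tool's name, description, typed argument schema, and implementation."""
    TOOLS[name] = {"description": description, "parameters": parameters, "fn": fn}


def validate_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Check a model-emitted call against the registered schema before running it."""
    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    spec = TOOLS[name]["parameters"]
    args = call.get("arguments", {})
    missing = [p for p in spec if p not in args]
    extra = [a for a in args if a not in spec]
    if missing or extra:
        raise ValueError(f"bad arguments: missing={missing}, unexpected={extra}")
    for param, expected_type in spec.items():
        if not isinstance(args[param], expected_type):
            raise TypeError(f"{param} must be {expected_type.__name__}")
    return args


def execute_tool_call(raw_call: str) -> str:
    """Parse, validate, and run one tool call; return a JSON observation for the model."""
    call = json.loads(raw_call)
    args = validate_call(call)
    result = TOOLS[call["name"]]["fn"](**args)
    return json.dumps(result)


# Hypothetical tool backed by fake in-memory data instead of a real internal API.
_ORDERS = {"A-100": "shipped"}
register_tool(
    name="get_order_status",
    description="Look up the live status of a customer order by its ID.",
    parameters={"order_id": str},
    fn=lambda order_id: {"order_id": order_id, "status": _ORDERS.get(order_id, "unknown")},
)
```

The model only ever produces the JSON string passed to `execute_tool_call`. Everything else, including rejecting a call with the wrong argument type, stays in application code.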

ReAct-Style Reasoning

Most tool-using agents follow a ReAct ('Reason + Act') loop: the model produces a Thought, chooses an Action (a tool call), reads the Observation returned, and repeats until it has enough information to give a final answer. A research assistant that must dynamically decide whether to search documents, run a calculator, or query a database until it can answer is the canonical ReAct case. Because the loop is open-ended, you must bound it with stopping conditions: a maximum iteration count and clear termination logic, or the agent can spin. When an agent keeps calling the same search tool without making progress, the fix is rarely a bigger model; it is tighter tool schemas, clearer tool descriptions, and explicit stopping rules.
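A bounded version of this loop might look like the sketch below. It assumes a `policy` callable standing in for the model, which returns either a tool action or a final answer. The step format, the two toy tools, and both scripted policies are hypothetical and chosen only to show the stopping rules.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str, str]]  # (action, input, observation)


def react_loop(policy: Callable[[str, History], Dict[str, str]],
               tools: Dict[str, Callable[[str], str]],
               question: str,
               max_iterations: int = 5,
               max_repeats: int = 2) -> str:
    """Run Thought -> Action -> Observation until a final answer or a stop rule fires."""
    history: History = []
    for _step in range(max_iterations):
        step = policy(question, history)
        if "final" in step:
            return step["final"]
        action, arg = step["action"], step["input"]
        # Stop rule 1: the same call has already been made too many times.
        repeats = sum(1 for a, i, _obs in history if (a, i) == (action, arg))
        if repeats >= max_repeats:
            return "STOPPED: repeated tool call without progress"
        observation = tools[action](arg)
        history.append((action, arg, observation))
    # Stop rule 2: hard cap on iterations.
    return "STOPPED: iteration limit reached"


def multiply(arg: str) -> str:
    """Toy calculator tool: multiplies two comma-separated integers."""
    left, right = arg.split(",")
    return str(int(left) * int(right))


TOY_TOOLS = {"search": lambda query: "4", "multiply": multiply}


def scripted_policy(question: str, history: History) -> Dict[str, str]:
    """Stand-in for the model: search once, multiply once, then answer."""
    if not history:
        return {"thought": "I need the unit price", "action": "search", "input": "widget price"}
    if len(history) == 1:
        return {"thought": "Multiply by quantity", "action": "multiply", "input": "4,3"}
    return {"final": f"Total is {history[-1][2]}"}


def spinning_policy(question: str, history: History) -> Dict[str, str]:
    """Stand-in for a stuck model that keeps issuing the same search."""
    return {"thought": "Search again", "action": "search", "input": "widget price"}
```

With `scripted_policy` the loop ends on a final answer after two tool calls. With `spinning_policy` the repeat check ends it before the iteration cap is reached.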

The Databricks Mosaic AI Agent Framework

Databricks provides platform components so you do not build agent plumbing from scratch:

  • MLflow ResponsesAgent is the standard interface for exposing an agent so Databricks tooling (AI Playground, Agent Evaluation, Databricks Apps) can drive it uniformly. Wrapping a custom agent in ResponsesAgent is what unlocks those downstream features; skipping it loses platform integration.
  • LangChain integration exposes Vector Search as a vector store; the typical pattern is DatabricksVectorSearch(...).as_retriever() to plug retrieval into chains and agents.
  • LangGraph models the agent as a graph of nodes and edges with explicit state, which you need for branching, loops, retries, or memory across turns. Reach for it when control flow is conditional or cyclic; a linear chain is simpler but less expressive.
  • Databricks SQL Agent is the right abstraction for natural-language, read-only Q&A over governed Unity Catalog tables, because it translates requests into governed SQL rather than retrieving documents.
  • MCP (Model Context Protocol) is a standard protocol for exposing tools, resources, and prompts to agents. Databricks can host a managed MCP endpoint so an agent reaches, say, a Vector Search index as a governed tool instead of through a bespoke connector.
  • Agent Bricks provides managed building blocks that handle tool wiring, memory, and evaluation hooks; the March 18, 2026 blueprint deepened its coverage.
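To make the graph idea from the LangGraph bullet concrete, here is a plain-Python sketch of nodes, edges, and explicit state with one loop and one branch. It deliberately does not use the LangGraph library or mirror its API: `run_graph`, the node functions, and the state keys are all hypothetical names for illustration.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
END = "__end__"  # sentinel meaning "no next node"


def run_graph(nodes: Dict[str, Callable[[State], State]],
              edges: Dict[str, Callable[[State], str]],
              entry: str,
              state: State,
              max_steps: int = 20) -> State:
    """Walk the graph: run a node, then ask its edge function where to go next."""
    current = entry
    steps = 0
    while current != END:
        if steps >= max_steps:
            raise RuntimeError("graph did not reach END within max_steps")
        state = nodes[current](state)
        current = edges[current](state)
        steps += 1
    return state


def retrieve_node(state: State) -> State:
    """Fetch one more (fake) document and count the attempt."""
    attempts = state["attempts"] + 1
    return {**state, "attempts": attempts, "docs": state["docs"] + [f"doc-{attempts}"]}


def answer_node(state: State) -> State:
    """Produce a final answer from whatever has been gathered."""
    return {**state, "answer": f"answered from {len(state['docs'])} docs"}


NODES = {"retrieve": retrieve_node, "answer": answer_node}
EDGES = {
    # Conditional edge: loop back until two documents have been gathered.
    "retrieve": lambda s: "retrieve" if len(s["docs"]) < 2 else "answer",
    "answer": lambda s: END,
}
```

The cycle on the `retrieve` node is exactly what a linear chain cannot express: the number of retrieval passes is decided at run time from the state.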

Persistent state / memory lets a multi-step agent remember intermediate facts across calls. Match the storage to the state's lifetime: in-memory for a single request, and a persistent datastore scoped to the conversation for per-session memory, so a confirmed address is remembered within one conversation but never leaks across users. Cross-session memory belongs in a Delta table or a key-value store.
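One way to picture per-session scoping is to key every memory entry by both user and conversation. In the sketch below a dictionary stands in for the persistent datastore, and the `SessionMemory` class and its method names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class SessionMemory:
    """Conversation-scoped memory; a dict stands in for a persistent store (assumption)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Dict[str, str]] = {}

    def remember(self, user_id: str, conversation_id: str, key: str, value: str) -> None:
        """Save a fact under the (user, conversation) scope."""
        self._store.setdefault((user_id, conversation_id), {})[key] = value

    def recall(self, user_id: str, conversation_id: str, key: str) -> Optional[str]:
        """Return the fact only if it was stored in this exact scope."""
        return self._store.get((user_id, conversation_id), {}).get(key)
```

Because the scope is part of the key, a lookup from another user or another conversation simply finds nothing, so a confirmed address cannot leak.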

Worked Example and Common Traps

Consider a claims operations app that must answer policy questions, retrieve supporting documents, and create a follow-up task in another system only when needed. This mixes knowledge retrieval with selective action, so a tool-calling agent with a retrieval tool and a workflow tool fits, while a plain RAG chain cannot decide when to call external systems. Two safety-flavored traps recur. First, a tool with real side effects, such as a refund API with financial impact, should require explicit confirmation or human-in-the-loop approval before it runs. Second, offline agent evaluation must capture tool choice, tool arguments, and final task success, not just final wording, because an agent can fail by picking the wrong tool or passing bad parameters even when the prose reads well. Together these rules explain the exam's bias: prefer the simplest architecture that meets the requirement, then add agentic power deliberately and with guardrails.
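Both traps can be sketched in plain Python. The example assumes a hypothetical `issue_refund` tool, an `approve` callback standing in for the human reviewer, and a simple trace format with `tool`, `args`, and `outcome` keys. None of these names come from a Databricks or MLflow API.

```python
from typing import Any, Callable, Dict

# Tools with real side effects that must not run without approval (assumption).
HIGH_IMPACT_TOOLS = {"issue_refund"}


def guarded_call(name: str, args: Dict[str, Any],
                 tools: Dict[str, Callable[..., Any]],
                 approve: Callable[[str, Dict[str, Any]], bool]) -> Dict[str, Any]:
    """Run a tool, but require explicit approval first if it is high impact."""
    if name in HIGH_IMPACT_TOOLS and not approve(name, args):
        return {"status": "blocked", "reason": "human approval required"}
    return {"status": "ok", "result": tools[name](**args)}


def score_trace(trace: Dict[str, Any], expected: Dict[str, Any]) -> Dict[str, bool]:
    """Score one recorded agent run on tool choice, arguments, and task success."""
    return {
        "tool_choice": trace["tool"] == expected["tool"],
        "tool_args": trace["args"] == expected["args"],
        "task_success": trace["outcome"] == expected["outcome"],
    }


REFUND_TOOLS = {
    "issue_refund": lambda order_id, amount: f"refunded {amount} on {order_id}",
}
```

Scoring the three dimensions separately shows where a run failed: a trace can pick the right tool and still fail on `tool_args`, which a wording-only check would miss.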

Test Your Knowledge

A claims operations app must answer policy questions from documents and also create a follow-up task in another system only when the situation requires it. Which architecture fits best?

Test Your Knowledge

In Databricks agent development, what does wrapping an agent with the MLflow ResponsesAgent interface accomplish?

Test Your Knowledge

An agent repeatedly calls the same search tool without making progress toward an answer. Which change is most likely to fix the behavior?
