4.2 LangChain & LlamaIndex on Databricks

Key Takeaways

  • Wrap a Vector Search index with DatabricksVectorSearch and call .as_retriever() to get a LangChain retriever that plugs into chains and agents.
  • ChatDatabricks is the LangChain class that calls a Databricks-hosted chat model exposed through Foundation Model APIs or a serving endpoint.
  • On serverless compute LangChain autologging is not automatic; call mlflow.langchain.autolog() to capture traces in MLflow Tracing.
  • Use a linear LangChain chain for fixed steps; reach for LangGraph when the flow branches, loops, or keeps state across turns.
  • Databricks' Mosaic AI Agent Framework is the primary production path; LangChain and LlamaIndex are supported open-source options that wrap the same Vector Search and Foundation Model APIs.

Last updated: July 2026

Why Frameworks Sit on Top of Databricks Primitives

On Databricks the durable building blocks are Mosaic AI Vector Search, the Foundation Model APIs, Model Serving, MLflow, and Unity Catalog. LangChain, LangGraph, and LlamaIndex are open-source orchestration frameworks that wrap those primitives so you can compose retrieval, prompts, models, and tools with portable code. The exam expects you to know the integration classes and, more importantly, to choose the right framework for the workflow rather than defaulting to one for every problem. A useful mental model: frameworks give you portability of chain code, but they also hide platform-specific behavior behind abstractions, so you still need to understand the underlying Databricks components.

The databricks-langchain Integration

The databricks-langchain package provides first-class integration classes. The three you must recognize:

Class                    Purpose
DatabricksVectorSearch   Exposes a Vector Search index as a LangChain vector store; call .as_retriever() to get a retriever
ChatDatabricks           Calls a Databricks-hosted chat model served through Foundation Model APIs or a custom serving endpoint
DatabricksEmbeddings     Wraps a Databricks embedding-model endpoint for query and document embedding

The standard retrieval pattern is: wrap the index with DatabricksVectorSearch, call .as_retriever(), and drop the resulting retriever into a chain or agent. You do not log the index to the MLflow Model Registry first, convert it to a Delta table that you query only with SQL, or run mlflow.evaluate() on the index to turn it into a retriever; those are distractor options the exam likes. For the generation step, ChatDatabricks is the chat-model class. Other classes such as PandasDataFrameLoader, SQLDatabase, or CharacterTextSplitter serve unrelated purposes (loading data, querying SQL, splitting text) and are wrong answers when the question asks how to call a Databricks-hosted chat model.
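The retrieval and generation wiring described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive setup: search_kwargs_for and build_components are illustrative helper names, the index and endpoint names in the usage note are hypothetical placeholders, and passing only index_name assumes a Delta Sync index with Databricks-managed embeddings. The Databricks import is deferred into the function so the module still loads where databricks-langchain is not installed.

```python
from typing import Any, Dict, Tuple


def search_kwargs_for(k: int = 4) -> Dict[str, int]:
    """Build the search_kwargs dict handed to .as_retriever() (top-k only)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {"k": k}


def build_components(index_name: str, chat_endpoint: str, k: int = 4) -> Tuple[Any, Any]:
    """Return (retriever, chat_model) backed by Databricks services."""
    # Deferred import: needs the databricks-langchain package and workspace auth.
    from databricks_langchain import ChatDatabricks, DatabricksVectorSearch

    # Assumption: a Delta Sync index with Databricks-managed embeddings, so
    # index_name is enough. An index with self-managed embeddings would also
    # need an embedding model and a text column.
    vector_store = DatabricksVectorSearch(index_name=index_name)
    retriever = vector_store.as_retriever(search_kwargs=search_kwargs_for(k))

    # The chat model is addressed by the name of its serving endpoint.
    chat_model = ChatDatabricks(endpoint=chat_endpoint)
    return retriever, chat_model
```

In a notebook this might be called as build_components("main.docs.chunks_index", "my-chat-endpoint", k=3), where both names are placeholders for your own index and endpoint.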

Tracing and Observability

MLflow captures LangChain traces so you can inspect each step of a chain or agent run, including which tools were called and what they returned. A specific gotcha the exam tests: on serverless compute, LangChain autologging is not enabled automatically — you must explicitly call mlflow.langchain.autolog() to record chain execution in MLflow Tracing. Enabling Change Data Feed, creating a Direct Vector Access index, or wrapping the chain in a SQL function does nothing for tracing; only the autolog call does.
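A minimal sketch of the explicit opt-in described above. The only MLflow call used is mlflow.langchain.autolog(); the helper name enable_langchain_tracing and its optional mlflow_module argument are illustrative assumptions, added so the call can be exercised without a live workspace.

```python
import importlib
from typing import Any, Optional


def enable_langchain_tracing(mlflow_module: Optional[Any] = None) -> bool:
    """Explicitly switch on LangChain autologging so runs show up in MLflow Tracing."""
    # mlflow_module is a hypothetical seam for testing; in a notebook you
    # would simply `import mlflow` and call mlflow.langchain.autolog().
    mlflow = mlflow_module if mlflow_module is not None else importlib.import_module("mlflow")
    mlflow.langchain.autolog()
    return True
```

Run this once near the top of a serverless notebook, before the chain or agent is invoked, so that the first run is traced as well.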

Chains vs LangGraph: Choosing Control Flow

A chain is a fixed sequence of calls whose control flow is known at design time — retrieve, build prompt, call model, return. It is easy to reason about, test, and optimize, and it is the right choice for a straightforward RAG flow. LangGraph models the workflow as a graph of nodes and edges with explicit state passed between them; reach for it when the flow branches, loops, or keeps memory across turns — for example, an agent that plans, calls tools, validates, and revisits earlier steps until it has enough information. Plain LangChain chains are linear, so conditional or cyclic control flow is exactly the signal to graduate to a graph. The tradeoff is more code and harder debugging, so do not adopt LangGraph for a workflow that a linear chain already handles.
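To make the branching and looping case concrete, here is a small sketch of a LangGraph graph with one node that loops back on itself until a stop condition holds. The state fields, the gather node, and the stop rule (two findings or three attempts) are invented for illustration; a real agent would call tools or a model inside the node. The LangGraph import is deferred so the node and router functions can be read and tested on their own.

```python
from typing import List, TypedDict


class AgentState(TypedDict):
    question: str
    evidence: List[str]
    attempts: int


def gather(state: AgentState) -> dict:
    """Node: stand-in for a tool call that appends one finding per visit."""
    attempts = state["attempts"] + 1
    return {"evidence": state["evidence"] + [f"finding {attempts}"], "attempts": attempts}


def route(state: AgentState) -> str:
    """Router: loop until two findings exist or three attempts are used."""
    enough = len(state["evidence"]) >= 2
    return "done" if enough or state["attempts"] >= 3 else "retry"


def build_graph():
    """Compile a graph whose single node loops back on itself via a conditional edge."""
    from langgraph.graph import END, START, StateGraph

    graph = StateGraph(AgentState)
    graph.add_node("gather", gather)
    graph.add_edge(START, "gather")
    # The cycle ("retry" returns to the same node) is what a linear chain cannot express.
    graph.add_conditional_edges("gather", route, {"retry": "gather", "done": END})
    return graph.compile()
```

With LangGraph installed, build_graph().invoke({"question": "q", "evidence": [], "attempts": 0}) would visit the gather node repeatedly until route returns "done".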

LlamaIndex and When to Use Each Framework

LlamaIndex is a data framework centered on ingestion, indexing, and query. Its model is documents to nodes to an index, then a query engine or retriever over that index, with node post-processors and response synthesizers for advanced retrieval. On Databricks it integrates through a Databricks Vector Search vector store and can call Databricks-served models, so it overlaps with LangChain most directly at the retrieval and data-connector layer.
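The documents-to-nodes-to-index-to-query-engine model can be sketched with LlamaIndex's core classes. This is a hedged illustration: it builds LlamaIndex's default in-memory index rather than one backed by Databricks Vector Search, the chunk sizes are arbitrary, and embed_model and llm are assumed to be LlamaIndex-compatible objects that you construct elsewhere (for example, ones pointing at Databricks-served endpoints). The helper name build_query_engine is hypothetical.

```python
from typing import Any, List


def build_query_engine(texts: List[str], embed_model: Any, llm: Any, top_k: int = 3) -> Any:
    """Documents -> nodes -> index -> query engine, using LlamaIndex core classes."""
    # Deferred imports: requires the llama-index-core package.
    from llama_index.core import Document, VectorStoreIndex
    from llama_index.core.node_parser import SentenceSplitter

    # 1. Documents: wrap raw strings.
    documents = [Document(text=text) for text in texts]

    # 2. Nodes: split documents into overlapping chunks (sizes are illustrative).
    splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
    nodes = splitter.get_nodes_from_documents(documents)

    # 3. Index: embed the nodes with the supplied embedding model.
    index = VectorStoreIndex(nodes, embed_model=embed_model)

    # 4. Query engine: retrieve top_k nodes and synthesize an answer with the LLM.
    return index.as_query_engine(llm=llm, similarity_top_k=top_k)
```

The returned engine answers a question with engine.query("..."); swapping the in-memory index for a Databricks-backed vector store changes only step 3.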

Choosing between them:

  • LangChain — general-purpose orchestration, tool-calling agents, broad ecosystem of integrations; the default when you need chains plus agentic behavior.
  • LangGraph — stateful, branching, or cyclic agent control flow that a linear chain cannot express.
  • LlamaIndex — retrieval- and indexing-heavy applications where advanced query engines and data connectors are the focus.

Critically, none of these is Databricks' primary production recommendation. The Mosaic AI Agent Framework (with MLflow for tracing, evaluation, and deployment) is the primary path for building agentic systems on the platform, and it interoperates with LangChain- or LlamaIndex-authored components. A common exam judgment is that framework choice is a portability and developer-experience decision layered on top of Databricks-native Vector Search, Foundation Model APIs, and the Agent Framework — not a replacement for them. Wrapping a finished agent with MLflow's ResponsesAgent interface is what lets Databricks tooling (playground, evaluation, deployment) drive it uniformly regardless of which framework you authored it in, so keep that lifecycle step in mind even when you build with open-source frameworks.

A Minimal LangChain RAG Chain, Wired End to End

Seeing the pieces connect clarifies the exam scenarios. A minimal RAG chain on Databricks stitches four objects together: build a retriever from DatabricksVectorSearch(...).as_retriever(); define a ChatPromptTemplate that places the retrieved context near the question with delimiters and a grounding instruction; instantiate ChatDatabricks(endpoint=...) pointed at a Foundation Model API or serving endpoint; and finish with a StrOutputParser. Composed with LangChain Expression Language, the flow reads {context: retriever, question: passthrough} | prompt | ChatDatabricks | parser. Each object is independently swappable — change the embedding endpoint, the top-k, the prompt wording, or the served model without rewriting the others. Call mlflow.langchain.autolog() first on serverless so every run is traced.
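The four-object composition described above might look like the sketch below. It assumes the retriever and chat model have already been built (for example from DatabricksVectorSearch(...).as_retriever() and ChatDatabricks(endpoint=...)) and are passed in; the prompt wording, the <context> delimiters, and the helper names format_docs and build_rag_chain are illustrative choices, not a prescribed template.

```python
from typing import Any, Iterable

# Grounding instruction plus delimiters around the retrieved context.
PROMPT_TEMPLATE = (
    "Answer using only the context between the <context> tags. "
    "If the context does not contain the answer, say you do not know.\n\n"
    "<context>\n{context}\n</context>\n\n"
    "Question: {question}"
)


def format_docs(docs: Iterable[Any]) -> str:
    """Join the page_content of retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_rag_chain(retriever: Any, chat_model: Any) -> Any:
    """Compose retriever -> prompt -> chat model -> string parser with LCEL."""
    # Deferred imports: requires the langchain-core package.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough

    prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    return (
        # The input question goes to the retriever and is also passed through unchanged.
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | chat_model
        | StrOutputParser()
    )
```

The resulting chain takes a plain question string, as in build_rag_chain(retriever, chat_model).invoke("What is Vector Search?"), and returns the model's answer as text.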

LlamaIndex Specifics and a When-to-Use Edge Case

LlamaIndex shines when ingestion and advanced retrieval dominate: its data connectors, node parsers, and query engines (with node post-processors for reranking and metadata filtering, and response synthesizers for multi-chunk answers) give fine-grained control over the retrieval layer. A practical edge case: a team that needs sophisticated document indexing and query engines but only simple, single-pass generation often finds LlamaIndex's retrieval abstractions a better fit, then serves the answer through a Databricks-hosted model. Conversely, a team building a tool-calling agent with branching logic leans on LangChain plus LangGraph. Many production stacks mix them — LlamaIndex or Databricks-native retrieval underneath, a LangChain or Agent Framework orchestration layer on top — because the frameworks interoperate over the same Vector Search index and Foundation Model endpoints. The exam's recurring judgment is to match the framework to the workload, never to assume one framework is universally correct.

Test Your Knowledge

In LangChain on Databricks, what is the typical way to turn a Vector Search index into a retriever?

An application needs conditional branching and loops between planning, tool use, and validation steps. Which orchestration approach best matches that design?

On serverless compute, what must you do to capture automatic LangChain traces in MLflow?
