1.2 The Databricks GenAI Stack

Key Takeaways

  • Mosaic AI is the umbrella product; its exam-critical pieces are the Agent Framework, Agent Evaluation, Model Serving, Vector Search, and AI Gateway.
  • Unity Catalog is the single governance layer for source tables, volumes, vector indexes, functions, and registered models — it maps to the 8% Governance domain.
  • MLflow provides LLMOps: experiment tracking, the LangChain/pyfunc model flavors, the Model Registry, MLflow Tracing, and mlflow.evaluate for offline scoring.
  • Foundation Model APIs serve hosted base LLMs and embedding models pay-per-token (or via provisioned throughput) so you can build without provisioning GPUs.
  • A Databricks RAG reference architecture flows: Unity Catalog volumes to chunking to embeddings (FM API) to Vector Search index to retriever to prompt to Model Serving to response, with MLflow tracing feeding Agent Evaluation and Monitoring.
Last updated: July 2026

The Databricks GenAI Stack at a Glance

Every scenario on this exam expects platform-aware decisions, not generic LLM knowledge. You must know which Databricks component solves which problem and how they compose. The GenAI stack has four layers: Mosaic AI application services, Unity Catalog governance, MLflow operations, and Foundation Model APIs for hosted models.

Mosaic AI — The Application Layer

Mosaic AI is the umbrella brand for Databricks' GenAI tooling. The pieces you must know cold:

  • Mosaic AI Vector Search — the managed vector database. You create a vector index (typically synced from a Delta table via Delta Sync), and the app issues similarity queries to retrieve the most relevant chunks. This is the retrieval engine of any RAG app.
  • Mosaic AI Model Serving — deploys models behind low-latency REST endpoints. It serves your custom pyfunc/agent models, external models, and Foundation Models, and auto-scales with traffic. This is how a RAG chain or agent becomes a production API.
  • Mosaic AI Agent Framework — libraries and patterns for building tool-calling agents and RAG chains, authoring them as MLflow models, and logging traces for every step. It underpins the 30% Application Development domain.
  • Mosaic AI Agent Evaluation — built-in LLM-judge scorers (correctness, groundedness, relevance, safety) plus support for custom scorers, used to evaluate agents offline against a labeled set and online in production.
  • AI Gateway — a governance and observability proxy in front of model endpoints that adds usage tracking, rate limits, payload logging, and guardrails. It is central to the March 2026 objectives.
  • AI Playground and Agent Bricks — an interactive prompt-testing surface and a low-code, managed way to build agents, respectively.

Unity Catalog — The Governance Layer

Unity Catalog (UC) is the single governance plane across the lakehouse. On this exam UC governs far more than tables: it secures volumes (raw source files), Delta tables, vector indexes, UC functions (used as agent tools), and registered models — all with one permission model, lineage, and audit logging. When a question asks how to centrally control access to source data, embeddings, and deployed models, the answer is Unity Catalog. This is the heart of the 8% Governance domain, and it also enables safe data preparation.

MLflow — The Operations (LLMOps) Layer

MLflow is how GenAI apps are packaged, versioned, tracked, and evaluated:

  • Experiment tracking logs prompts, parameters, and metrics across iterations.
  • Model flavors such as the LangChain flavor and the generic pyfunc flavor package a chain or agent as a deployable artifact.
  • The Model Registry in Unity Catalog versions models and manages promotion to production.
  • MLflow Tracing captures each step of a chain or agent call for debugging and monitoring.
  • mlflow.evaluate runs offline evaluation with metrics and LLM judges.

The recurring exam contrast is MLflow tracks and versions; Model Serving deploys; Unity Catalog governs — keep those verbs straight.

Foundation Model APIs — Hosted Models

Foundation Model APIs (FM APIs) give you hosted base LLMs and embedding models without provisioning GPUs. Pay-per-token endpoints are ideal for quick iteration and variable load; provisioned throughput endpoints give reserved capacity and predictable latency for production. FM APIs are where most model-selection and embedding-model decisions in Application Development happen.

Mapping the Stack to the Six Domains

Domain (weight)Primary Databricks components
Design Applications (14%)AI Playground, FM APIs, prompt patterns
Data Preparation (14%)UC volumes, Delta Lake, chunking, embeddings via FM APIs
Application Development (30%)Agent Framework, Vector Search, FM APIs, LangChain/LangGraph
Assembling & Deploying (22%)MLflow (pyfunc/registry), Model Serving, Vector Search config
Governance (8%)Unity Catalog, AI Gateway, masking, guardrails
Evaluation & Monitoring (12%)Agent Evaluation, MLflow Tracing, inference tables, AI Gateway

Managed vs. Custom: Agent Bricks and Model Serving Endpoint Types

The exam often asks you to trade managed convenience against custom control. Agent Bricks is a low-code, Databricks-hosted way to stand up common agents quickly; a hand-built Agent Framework app in Python (with LangChain or LangGraph) gives full control over tools, orchestration, and logic. Choose Bricks when the pattern is standard and speed matters; choose the framework when you need bespoke behavior. Similarly, Model Serving offers pay-per-token Foundation Model endpoints for spiky, low-volume workloads and provisioned throughput for steady, latency-sensitive production traffic. Matching the endpoint type to the workload is a classic Assembling & Deploying question.

A RAG Reference Architecture on Databricks

The canonical flow tested repeatedly is: raw documents land in UC volumes; a job parses and chunks them; an embedding model (FM API) turns chunks into vectors; the vectors sync into a Vector Search index via Delta Sync; at query time a retriever fetches the top-k chunks; those are injected into a prompt template; a Model Serving endpoint (or FM API LLM) generates a grounded, cited answer; and MLflow Tracing plus inference tables feed Agent Evaluation and Monitoring. Unity Catalog governs every asset along the way. Internalize this pipeline — nearly every high-weight question is a variation on one of its stages, and the diagram below is worth memorizing before test day.

How to Read a Stack Question

When a scenario names a symptom, translate it to a stage. “Answers cite stale facts” points to the retrieval/indexing stage (re-sync the Vector Search index or fix chunking). “The endpoint is too slow under load” points to Model Serving (switch to provisioned throughput or a smaller model). “We cannot see which users called which model” points to AI Gateway usage tracking. “Only some analysts should query the source table” points to Unity Catalog permissions. Mapping symptom to component is the fastest route to the right answer on this exam.

Loading diagram...
Databricks RAG Reference Architecture
Test Your Knowledge

Which Databricks component provides the managed semantic index that a RAG application queries to retrieve the most relevant document chunks at query time?

A
B
C
D
Test Your Knowledge

A team needs one permission and lineage layer that governs source volumes, Delta tables, vector indexes, functions used as agent tools, and registered models. Which Databricks service is designed for this?

A
B
C
D
Test Your Knowledge

Which Databricks capability lets you call hosted base LLMs and embedding models on a pay-per-token basis without provisioning your own GPU infrastructure, and drives most model-selection decisions in Application Development?

A
B
C
D