Which Databricks component provides the managed semantic index that a RAG application queries to retrieve the most relevant document chunks at query time?

Mosaic AI Vector Search. Mosaic AI Vector Search is the managed vector database that stores embeddings and serves similarity queries to fetch relevant chunks. The Model Registry versions models, Unity Catalog governs assets, and AI Gateway proxies and meters endpoints.

A team needs one permission and lineage layer that governs source volumes, Delta tables, vector indexes, functions used as agent tools, and registered models. Which Databricks service is designed for this?

Unity Catalog. Unity Catalog is the single governance plane across the lakehouse, securing volumes, tables, vector indexes, functions, and registered models with unified permissions, lineage, and audit logging. The other services track, serve, or host models rather than govern access.

Which Databricks capability lets you call hosted base LLMs and embedding models on a pay-per-token basis without provisioning your own GPU infrastructure, and drives most model-selection decisions in Application Development?

Foundation Model APIs. Foundation Model APIs serve hosted LLMs and embedding models (pay-per-token or provisioned throughput) so engineers can build without managing GPUs. Delta Live Tables is a data-pipeline tool, UC volumes store files, and MLflow Recipes is unrelated to hosted inference.

The Databricks GenAI Stack — Free Study Guide 2026

The Databricks GenAI Stack at a Glance

Every scenario on this exam expects platform-aware decisions, not generic LLM knowledge. You must know which Databricks component solves which problem and how they compose. The GenAI stack has four layers: Mosaic AI application services, Unity Catalog governance, MLflow operations, and Foundation Model APIs for hosted models.

Mosaic AI — The Application Layer

Mosaic AI is the umbrella brand for Databricks' GenAI tooling. The pieces you must know cold:

Mosaic AI Vector Search — the managed vector database. You create a vector index (typically synced from a Delta table via Delta Sync), and the app issues similarity queries to retrieve the most relevant chunks. This is the retrieval engine of any RAG app.
Mosaic AI Model Serving — deploys models behind low-latency REST endpoints. It serves your custom pyfunc/agent models, external models, and Foundation Models, and auto-scales with traffic. This is how a RAG chain or agent becomes a production API.
Mosaic AI Agent Framework — libraries and patterns for building tool-calling agents and RAG chains, authoring them as MLflow models, and logging traces for every step. It underpins the 30% Application Development domain.
Mosaic AI Agent Evaluation — built-in LLM-judge scorers (correctness, groundedness, relevance, safety) plus support for custom scorers, used to evaluate agents offline against a labeled set and online in production.
AI Gateway — a governance and observability proxy in front of model endpoints that adds usage tracking, rate limits, payload logging, and guardrails. It is central to the March 2026 objectives.
AI Playground and Agent Bricks — an interactive prompt-testing surface and a low-code, managed way to build agents, respectively.

Unity Catalog — The Governance Layer

Unity Catalog (UC) is the single governance plane across the lakehouse. On this exam UC governs far more than tables: it secures volumes (raw source files), Delta tables, vector indexes, UC functions (used as agent tools), and registered models — all with one permission model, lineage, and audit logging. When a question asks how to centrally control access to source data, embeddings, and deployed models, the answer is Unity Catalog. This is the heart of the 8% Governance domain, and it also enables safe data preparation.

MLflow — The Operations (LLMOps) Layer

MLflow is how GenAI apps are packaged, versioned, tracked, and evaluated:

Experiment tracking logs prompts, parameters, and metrics across iterations.
Model flavors such as the LangChain flavor and the generic pyfunc flavor package a chain or agent as a deployable artifact.
The Model Registry in Unity Catalog versions models and manages promotion to production.
MLflow Tracing captures each step of a chain or agent call for debugging and monitoring.
mlflow.evaluate runs offline evaluation with metrics and LLM judges.

The recurring exam contrast is MLflow tracks and versions; Model Serving deploys; Unity Catalog governs — keep those verbs straight.

Foundation Model APIs — Hosted Models

Foundation Model APIs (FM APIs) give you hosted base LLMs and embedding models without provisioning GPUs. Pay-per-token endpoints are ideal for quick iteration and variable load; provisioned throughput endpoints give reserved capacity and predictable latency for production. FM APIs are where most model-selection and embedding-model decisions in Application Development happen.

Mapping the Stack to the Six Domains

Domain (weight)	Primary Databricks components
Design Applications (14%)	AI Playground, FM APIs, prompt patterns
Data Preparation (14%)	UC volumes, Delta Lake, chunking, embeddings via FM APIs
Application Development (30%)	Agent Framework, Vector Search, FM APIs, LangChain/LangGraph
Assembling & Deploying (22%)	MLflow (pyfunc/registry), Model Serving, Vector Search config
Governance (8%)	Unity Catalog, AI Gateway, masking, guardrails
Evaluation & Monitoring (12%)	Agent Evaluation, MLflow Tracing, inference tables, AI Gateway

Managed vs. Custom: Agent Bricks and Model Serving Endpoint Types

The exam often asks you to trade managed convenience against custom control. Agent Bricks is a low-code, Databricks-hosted way to stand up common agents quickly; a hand-built Agent Framework app in Python (with LangChain or LangGraph) gives full control over tools, orchestration, and logic. Choose Bricks when the pattern is standard and speed matters; choose the framework when you need bespoke behavior. Similarly, Model Serving offers pay-per-token Foundation Model endpoints for spiky, low-volume workloads and provisioned throughput for steady, latency-sensitive production traffic. Matching the endpoint type to the workload is a classic Assembling & Deploying question.

A RAG Reference Architecture on Databricks

The canonical flow tested repeatedly is: raw documents land in UC volumes; a job parses and chunks them; an embedding model (FM API) turns chunks into vectors; the vectors sync into a Vector Search index via Delta Sync; at query time a retriever fetches the top-k chunks; those are injected into a prompt template; a Model Serving endpoint (or FM API LLM) generates a grounded, cited answer; and MLflow Tracing plus inference tables feed Agent Evaluation and Monitoring. Unity Catalog governs every asset along the way. Internalize this pipeline — nearly every high-weight question is a variation on one of its stages, and the diagram below is worth memorizing before test day.

How to Read a Stack Question

When a scenario names a symptom, translate it to a stage. “Answers cite stale facts” points to the retrieval/indexing stage (re-sync the Vector Search index or fix chunking). “The endpoint is too slow under load” points to Model Serving (switch to provisioned throughput or a smaller model). “We cannot see which users called which model” points to AI Gateway usage tracking. “Only some analysts should query the source table” points to Unity Catalog permissions. Mapping symptom to component is the fastest route to the right answer on this exam.

Databricks Generative AI Engineer Associate Certification

Databricks Generative AI Engineer Associate

1.2 The Databricks GenAI Stack

Key Takeaways

The Databricks GenAI Stack at a Glance

Mosaic AI — The Application Layer

Unity Catalog — The Governance Layer

MLflow — The Operations (LLMOps) Layer

Foundation Model APIs — Hosted Models

Mapping the Stack to the Six Domains

Managed vs. Custom: Agent Bricks and Model Serving Endpoint Types

A RAG Reference Architecture on Databricks

How to Read a Stack Question

Databricks Generative AI Engineer Associate Certification

1Introduction & Exam Strategy

2Design Applications

3Data Preparation

4Application Development

5Assembling & Deploying Applications

6Governance, Evaluation & Monitoring

Databricks Generative AI Engineer Associate

1.2 The Databricks GenAI Stack

Key Takeaways

The Databricks GenAI Stack at a Glance

Mosaic AI — The Application Layer

Unity Catalog — The Governance Layer

MLflow — The Operations (LLMOps) Layer

Foundation Model APIs — Hosted Models

Mapping the Stack to the Six Domains

Managed vs. Custom: Agent Bricks and Model Serving Endpoint Types

A RAG Reference Architecture on Databricks

How to Read a Stack Question