3.3 Databricks Vector Search

Key Takeaways

  • A Vector Search endpoint is the compute resource that hosts one or more indexes; the endpoint is separate from the index and its sizing affects latency and cost, not correctness.
  • Delta Sync indexes automatically track a Delta table but require Change Data Feed enabled on the source table; without it, index creation fails.
  • Direct Vector Access indexes require you to manage vectors and metadata yourself via the SDK or API, and fit external embeddings or non-Delta sources.
  • Managed-embeddings indexes accept `query_text`; self-managed indexes without an embedding model need `query_vector` for ANN, and hybrid search on those indexes needs both.
  • Triggered sync suits infrequent updates and controls cost; continuous sync gives near-real-time freshness at higher ongoing cost.
Last updated: July 2026

Serving Semantic Retrieval on Databricks

Databricks Vector Search is the platform's managed service for storing embeddings and running similarity queries — the core retrieval component of most RAG apps on Databricks. When you need semantic retrieval over internal manuals, policies, or tickets, the object built for the job is a Vector Search index (not a feature table, an MLflow experiment, or a dashboard). This section covers how to create, configure, keep fresh, and query one.

Endpoints vs indexes

A Vector Search endpoint is the compute resource that hosts one or more indexes and serves similarity queries. It is deliberately separate from the index itself: you can host multiple indexes on one endpoint, and you size the endpoint independently from the data. Endpoint sizing affects latency and cost, not correctness — a smaller endpoint returns the same results, just potentially slower. The index is the data structure (vectors + metadata); the endpoint is the machinery that serves it.

Two index types

| Aspect | Delta Sync index | Direct Vector Access index |
|---|---|---|
| Source | A Delta table | You write vectors directly |
| Freshness | Auto-refreshes as the table changes | You manage updates yourself |
| Management | Configured on the source table | Managed via SDK/API |
| Best for | Data already in Delta | External embeddings, non-Delta sources |

A Delta Sync index automatically follows a source Delta table and refreshes as that table changes, which makes it the default when your source is Delta. It has one prerequisite the exam loves: the source table must have Change Data Feed (CDF) enabled, because a standard Delta Sync index reads the table's change stream to apply incremental updates. If CDF is not enabled, index creation fails immediately. The cause is the missing change feed, not whether the table was vacuumed or Z-ordered or which warehouse type you used. Enable CDF on the source table first.
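
A minimal sketch of that prerequisite in Python, assuming a hypothetical Unity Catalog table `main.docs.policy_chunks`. `delta.enableChangeDataFeed` is the Delta table property that turns CDF on; the two helper names are illustrative, and the property check assumes you have already read the table's properties (for example from `SHOW TBLPROPERTIES`) into a plain dict.

```python
def enable_cdf_sql(table_name: str) -> str:
    """Return the ALTER TABLE statement that turns on Change Data Feed."""
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def cdf_enabled(table_properties: dict[str, str]) -> bool:
    """True if the (assumed) property dict shows CDF switched on."""
    value = table_properties.get("delta.enableChangeDataFeed", "false")
    return value.lower() == "true"


stmt = enable_cdf_sql("main.docs.policy_chunks")  # hypothetical table name
print(stmt)
# In a Databricks notebook (not run here) you would execute: spark.sql(stmt)

# Check before creating the Delta Sync index; False means enable CDF first.
print(cdf_enabled({"delta.enableChangeDataFeed": "true"}))
print(cdf_enabled({}))
```

Running the check before index creation turns an immediate creation failure into an explicit, earlier signal.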

A Direct Vector Access index skips syncing: you read and write vectors and metadata yourself through the SDK or API. It is the right choice when embeddings come from outside Databricks or the source is not a Delta table. It does support metadata; what it does not do is auto-sync from Delta.
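
With Direct Vector Access, each record you push carries the primary key, metadata, and the externally computed vector together. The sketch below assembles and sanity-checks such records before they are sent. The column names (`id`, `source`, `text`, `embedding`) and the 4-dimensional vectors are hypothetical, and the final write is assumed to go through the index object's `upsert` method in the databricks-vectorsearch SDK; check its reference for the exact signature.

```python
from typing import Any

EMBEDDING_DIM = 4  # hypothetical; must equal the dimension declared on the index


def build_upsert_records(
    rows: list[dict[str, Any]], embeddings: list[list[float]]
) -> list[dict[str, Any]]:
    """Attach an externally computed vector to each metadata row."""
    if len(rows) != len(embeddings):
        raise ValueError("need exactly one embedding per row")
    seen: set[str] = set()
    records = []
    for row, vec in zip(rows, embeddings):
        key = row["id"]  # hypothetical primary-key column
        if key in seen:
            raise ValueError(f"duplicate primary key {key!r}")
        if len(vec) != EMBEDDING_DIM:
            raise ValueError(f"{key!r}: expected {EMBEDDING_DIM} dims, got {len(vec)}")
        seen.add(key)
        records.append({**row, "embedding": [float(x) for x in vec]})
    return records


records = build_upsert_records(
    [{"id": "kb-1", "source": "external_kb", "text": "Reset the VPN client"}],
    [[0.1, 0.0, 0.3, 0.2]],
)
print(records[0]["embedding"])  # [0.1, 0.0, 0.3, 0.2]

# Writing them (not run here) would go through the SDK's index object, e.g.
#   index.upsert(records)
```

Because nothing syncs automatically, validating keys and dimensions on your side is the only guard before bad rows reach the index.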

Managed vs self-managed embeddings

An index can compute embeddings for you or expect precomputed ones:

  • Managed embeddings — you attach an embedding model, and Databricks embeds a text column at index time and embeds queries from text at query time. In the common text-query path you send query_text and Databricks computes the vector.
  • Self-managed embeddings — you supply the precomputed embedding vector column yourself, and no embedding model is attached. For an approximate-nearest-neighbor (ANN) similarity search on Databricks Runtime 15.3+, you must supply query_vector directly, because Databricks cannot turn text into a vector without a model. For hybrid search on such an index you supply both query_text (keyword side) and query_vector (similarity side).
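
The query rules above can be encoded as a small helper that refuses invalid combinations before a request is sent. This is a sketch: the helper is hypothetical, and the returned dict is intended as keyword arguments for `similarity_search` in the databricks-vectorsearch SDK (`columns`, `num_results`, `query_text`, `query_vector`, `query_type`), an assumed mapping you should verify against the SDK reference.

```python
from typing import Any, Optional


def build_query_kwargs(
    managed_embeddings: bool,
    columns: list[str],
    query_text: Optional[str] = None,
    query_vector: Optional[list[float]] = None,
    hybrid: bool = False,
    num_results: int = 5,
) -> dict[str, Any]:
    kwargs: dict[str, Any] = {"columns": columns, "num_results": num_results}
    if managed_embeddings:
        # An embedding model is attached, so Databricks embeds the text itself.
        if query_text is None:
            raise ValueError("managed-embeddings index: pass query_text")
        kwargs["query_text"] = query_text
    else:
        # No model attached: the caller must supply the vector.
        if query_vector is None:
            raise ValueError("self-managed index: pass query_vector")
        kwargs["query_vector"] = query_vector
        if hybrid:
            # Hybrid also needs text for the keyword side.
            if query_text is None:
                raise ValueError("hybrid on a self-managed index: pass query_text too")
            kwargs["query_text"] = query_text
    if hybrid:
        kwargs["query_type"] = "hybrid"
    return kwargs


ann_kwargs = build_query_kwargs(False, ["chunk_id", "text"], query_vector=[0.1, 0.2, 0.3])
# Usage (not run here): index.similarity_search(**ann_kwargs)
```

Failing locally with a clear message is cheaper than debugging a rejected request from a self-managed index that received only text.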

One dimension caveat: the embedding column and the query vector must have matching dimensions, or queries fail.
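
A cheap guard against that mismatch is to validate the query vector locally before calling the index. `INDEX_DIM` below is a hypothetical value; set it to the dimension your embedding model produces, which must equal the dimension declared on the index.

```python
INDEX_DIM = 1024  # hypothetical; use your embedding model's output dimension


def validate_query_vector(vec: list[float], expected_dim: int = INDEX_DIM) -> list[float]:
    """Raise before querying if the vector cannot match the index."""
    if len(vec) != expected_dim:
        raise ValueError(f"query vector has {len(vec)} dims; index expects {expected_dim}")
    return vec


try:
    validate_query_vector([0.0] * 768)  # e.g. a vector from a different model
except ValueError as err:
    print(err)  # query vector has 768 dims; index expects 1024
```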

Sync modes: freshness vs cost

Delta Sync indexes offer sync modes that trade freshness against cost. Continuous sync keeps a pipeline provisioned for near-real-time updates — great when the source table changes throughout the day and the index must stay current, but it costs more because the syncing pipeline runs continuously. Triggered sync updates on demand and is the better fit when documents change infrequently (say once per week) and you want to control cost rather than pay for seconds-level freshness. Match the mode to the actual update cadence.
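
The sketch below builds keyword arguments for `create_delta_sync_index` and picks `pipeline_type` from how stale the index is allowed to get. Parameter names follow the databricks-vectorsearch `VectorSearchClient.create_delta_sync_index` examples as we understand them; every endpoint, table, and model name, and the 24-hour cutoff, are hypothetical placeholders rather than recommendations.

```python
from typing import Any


def choose_pipeline_type(max_staleness_hours: float, cutoff_hours: float = 24.0) -> str:
    """Pick a sync mode from how stale the index may get.

    The 24-hour cutoff is a hypothetical team policy, not a Databricks rule.
    """
    return "CONTINUOUS" if max_staleness_hours < cutoff_hours else "TRIGGERED"


def delta_sync_index_kwargs(max_staleness_hours: float) -> dict[str, Any]:
    # All names below are hypothetical placeholders.
    return {
        "endpoint_name": "rag_endpoint",
        "source_table_name": "main.docs.policy_chunks",
        "index_name": "main.docs.policy_chunks_index",
        "pipeline_type": choose_pipeline_type(max_staleness_hours),
        "primary_key": "chunk_id",
        "embedding_source_column": "text",
        "embedding_model_endpoint_name": "my_embedding_endpoint",
    }


weekly = delta_sync_index_kwargs(max_staleness_hours=24 * 7)
print(weekly["pipeline_type"])  # TRIGGERED

# In a Databricks notebook (not run here), roughly:
#   from databricks.vector_search.client import VectorSearchClient
#   index = VectorSearchClient().create_delta_sync_index(**weekly)
#   ...after each weekly load into the source table:
#   index.sync()
```

For the once-a-week case, a triggered index plus an explicit sync after each load keeps freshness aligned with the data without paying for an always-on pipeline.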

Retrieval: ANN, hybrid, and filters

By default Vector Search runs ANN semantic retrieval — embed the query, return the nearest chunk vectors. But pure semantic search can miss literal tokens. When users search with exact error codes (ERR_CONN_RESET_42), product SKUs, or proper nouns alongside natural-language descriptions, use hybrid search, which combines vector similarity with keyword matching so both signals count. Keeping relevant text in the index metadata improves the keyword component. Metadata filters further restrict retrieval to rows matching conditions such as source, language, or date — use them when the question implies a scope (for example, 2026 policies only) so out-of-scope chunks never reach the model.
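
The following plain-Python toy shows both ideas on three in-scope chunks and one out-of-scope chunk. It is not Databricks' implementation: the 2-D vectors are made up, retrieval is brute force, keyword scoring is simple token overlap, and reciprocal rank fusion (RRF, with the common constant 60) is assumed as the fusion rule. Pure vector ranking puts the VPN chunk first, hybrid lifts the chunk containing `ERR_CONN_RESET_42`, and dropping the year filter lets a 2024 chunk win.

```python
import math
import re

# Hypothetical mini-index: id, metadata ("year"), chunk text, toy 2-D vector.
DOCS = [
    {"id": "d1", "year": 2026, "text": "VPN drops after laptop sleep", "vec": [0.9, 0.1]},
    {"id": "d2", "year": 2026, "text": "ERR_CONN_RESET_42 is raised when the proxy resets sockets", "vec": [0.8, 0.6]},
    {"id": "d3", "year": 2024, "text": "Legacy proxy guide for ERR_CONN_RESET_42", "vec": [1.0, 0.0]},
    {"id": "d4", "year": 2026, "text": "Proxy settings for remote staff", "vec": [0.6, 0.8]},
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def tokens(text):
    # Lowercased word tokens; underscores and digits stay inside a token,
    # so an error code survives as one exact token.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def ranked_ids(scored):
    # Sort (id, score) pairs best-first; ties broken by id for determinism.
    return [doc_id for doc_id, _ in sorted(scored, key=lambda p: (-p[1], p[0]))]


def search(query_text, query_vec, where=None, hybrid=False, top_k=3, rrf_k=60):
    # 1) Metadata filter: out-of-scope rows never reach ranking.
    pool = [d for d in DOCS if where is None or where(d)]
    # 2) Vector ranking by cosine similarity.
    vec_rank = ranked_ids([(d["id"], cosine(query_vec, d["vec"])) for d in pool])
    if not hybrid:
        return vec_rank[:top_k]
    # 3) Keyword ranking by exact token overlap with the query text.
    q = tokens(query_text)
    kw_rank = ranked_ids([(d["id"], len(q & tokens(d["text"]))) for d in pool])
    # 4) Reciprocal rank fusion: each list contributes 1 / (rrf_k + rank).
    fused = {}
    for ranking in (vec_rank, kw_rank):
        for pos, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (rrf_k + pos)
    return ranked_ids(list(fused.items()))[:top_k]


def in_2026(doc):
    return doc["year"] == 2026


query_text = "ERR_CONN_RESET_42 proxy"
query_vec = [1.0, 0.0]  # toy embedding of the user's symptom description

print(search(query_text, query_vec, where=in_2026))               # ['d1', 'd2', 'd4']
print(search(query_text, query_vec, where=in_2026, hybrid=True))  # ['d2', 'd1', 'd4']
print(search(query_text, query_vec, hybrid=True)[0])              # d3
```

The filter runs before ranking on purpose: a chunk that is out of scope should never compete, however similar it looks.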

Keeping the index fresh and fast

Two production issues recur. First, freshness: if the corpus updates and you use Delta Sync with CDF, updates flow automatically; with Direct Vector Access you must push changes yourself. Second, cold starts: a managed-embeddings setup can intermittently time out on the first request after an idle period because the embedding model's serving endpoint has scaled to zero. The standard fix is to keep that endpoint warm, or to precompute embeddings so no query-time embedding call is needed; short chunks, extra metadata columns, or a low retriever k are not the cause. Separately, evaluate retrieval on its own with metrics such as hit rate@k (whether any relevant chunk appears in the top k), so you can improve retrieval independently of generation.
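
Hit rate@k is simple enough to compute directly. A minimal sketch, assuming you have, per evaluation question, the ranked chunk ids the retriever returned and a set of chunk ids labeled relevant; the data below is made up for illustration.

```python
def hit_rate_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Fraction of questions with at least one relevant chunk in the top k."""
    if len(retrieved) != len(relevant):
        raise ValueError("need one relevant set per question")
    if not retrieved:
        return 0.0
    hits = sum(1 for ranked, gold in zip(retrieved, relevant) if gold & set(ranked[:k]))
    return hits / len(retrieved)


# Made-up evaluation data: ranked retriever output and labeled relevant ids.
retrieved = [["c7", "c2", "c9"], ["c4", "c1", "c8"], ["c3", "c5", "c6"]]
relevant = [{"c2"}, {"c8"}, {"c10"}]

for k in (1, 2, 3):
    print(k, hit_rate_at_k(retrieved, relevant, k))
```

Sweeping k shows whether relevant chunks are missing entirely (the third question here) or merely ranked too low, which point to different fixes.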

Putting it together

A typical Databricks retrieval flow reads like this: land documents in a Unity Catalog volume, parse and chunk them into a Delta table, enable Change Data Feed, and build a Delta Sync index with managed embeddings on a Vector Search endpoint so new rows flow in automatically. At query time you send query_text, optionally add a metadata filter for scope, and receive the top-k chunks to inject into the prompt. If latency is critical you flip to precomputed self-managed embeddings and pass query_vector directly; if literal codes matter you switch the call to hybrid search. Choosing correctly among these options — index type, embedding management, sync mode, and query mode — is precisely what the Data Preparation domain rewards.
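
The "parse and chunk" step in that flow can be sketched as fixed-size character windows with overlap, each row carrying a stable primary key and source metadata. Window sizes, the `chunk_document` helper, and the column names are hypothetical; in practice the rows would be written to the Delta table that backs the Delta Sync index.

```python
def chunk_document(
    doc_id: str, text: str, source: str, size: int = 500, overlap: int = 50
) -> list[dict[str, str]]:
    """Split text into overlapping character windows, one row per chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    rows = []
    # Stop once a window would contain only text already covered by the
    # previous window's overlap.
    for n, start in enumerate(range(0, max(len(text) - overlap, 1), step)):
        rows.append(
            {
                "chunk_id": f"{doc_id}-{n}",  # stable primary key for the index
                "doc_id": doc_id,
                "source": source,  # metadata usable in filters
                "text": text[start : start + size],
            }
        )
    return rows


rows = chunk_document("manual-7", "abcdefghij", "manuals", size=4, overlap=1)
print([r["text"] for r in rows])  # ['abcd', 'defg', 'ghij']
```

Deterministic chunk ids matter here: if a document is re-chunked identically, its rows keep the same primary keys, so the index sees updates rather than duplicates.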

Test Your Knowledge

You try to create a standard Delta Sync Vector Search index from a Delta table and creation fails immediately. Which missing prerequisite is the most likely cause?

Support engineers query with both natural-language symptoms and exact error codes such as ERR_CONN_RESET_42. Which retrieval mode is most appropriate?

Source documents change only once per week and the team wants to control cost rather than pay for near-real-time updates. Which Delta Sync mode fits best?