5.1 MLflow for LLMs & Model Logging
Key Takeaways
- GenAI chains are logged with the `mlflow.pyfunc` or `mlflow.langchain` flavor; code-based logging (models-from-code) avoids the serialization errors that break agents.
- Unity Catalog rejects any registered model without a model signature; infer one from an `input_example` or declare it with Python type hints.
- MLflow 3 registers models to the Unity Catalog Model Registry by default via the `databricks-uc` URI using three-level `catalog.schema.model` names.
- Model aliases such as Champion are mutable pointers, so promoting a new version means reassigning the alias with no downstream code change.
- `databricks.agents.deploy()` serves an agent plus tracing, the Review App, and monitoring in one step; `mlflow.langchain.autolog()` must be called explicitly on serverless compute.
Why MLflow Anchors the Deploy Domain
Every servable generative AI application on Databricks passes through MLflow before it reaches production. The Assembling and Deploying Applications domain is 22% of the exam, roughly 10 of 45 scored questions, and MLflow logging, signatures, and the Unity Catalog Model Registry are the connective tissue for all of it. MLflow gives you three things the exam tests repeatedly: tracking (parameters, prompts, traces, and metrics per run), models (a portable, servable package), and the registry (governed, versioned artifacts). A GenAI app that is not logged as an MLflow model cannot be deployed to Model Serving, so this section is the gateway to everything in 5.2 and 5.3.
Logging a Chain or Agent: Flavors and models-from-code
MLflow packages code into a model flavor. For generative AI you will almost always use one of two:
- `mlflow.langchain` logs a LangChain or LangChain Expression Language (LCEL) chain, or a LangGraph graph, directly.
- `mlflow.pyfunc` is the universal Python flavor; it wraps arbitrary pre-processing, an LLM call, and post-processing into one servable object. This is the correct answer when a question describes a custom chain with glue code around the model.
- The `spark`, `onnx`, and `tensorflow` flavors are for classic machine-learning models and are wrong for GenAI chains.
The modern, recommended pattern is code-based logging, often called models-from-code. Instead of pickling a live Python object (which fails for agents that hold unserializable clients or open network connections), you point MLflow at a Python file that defines the model, and MLflow logs the source code itself. Inside that file you mark the object to serve with `mlflow.models.set_model(...)`. Code-based logging is more reproducible, sidesteps the serialization errors common with agents, and is the expected approach for the March 18, 2026 agent-focused blueprint.
The elements a servable RAG model must log
A frequent hard question asks what you must log together so a retrieval-augmented generation (RAG) app is both reproducible and servable. Missing any one breaks deployment or later debugging:
| Element | Why it is required |
|---|---|
| Model flavor (`pyfunc` / `langchain`) | Tells serving how to load and invoke the app |
| Embedding model reference | The retriever must embed queries the same way it embedded chunks |
| Retriever configuration | Which Vector Search index, `k`, and metadata filters to use |
| Dependencies (`pip_requirements`) | Rebuilds the exact environment on the endpoint |
| Input example | Seeds validation and automatic signature inference |
| Model signature | Concrete input and output schema (see below) |
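The rows of the table map onto the arguments of a single logging call. The sketch below is one way to keep them together. It makes these assumptions:

- MLflow 3 argument names are in use (`name=` rather than `artifact_path=`).
- The app is a models-from-code file.
- The endpoint, index, filter, and package names are hypothetical placeholders.
- `build_log_kwargs` and `log_rag_model` are illustrative helpers, not MLflow APIs.

No signature is passed explicitly because MLflow infers it from `input_example`. Inside the code file, the settings passed as `model_config` would typically be read back with `mlflow.models.ModelConfig`.

```python
# Hypothetical RAG settings; replace with your own endpoint, index, and filters.
RAG_CONFIG = {
    "embedding_endpoint": "databricks-gte-large-en",  # embedding model reference (assumed name)
    "vector_search_index": "main.rag.docs_index",     # retriever: which index
    "k": 5,                                           # retriever: chunks to return
    "filters": {"doc_type": "policy"},                # retriever: metadata filter
}


def build_log_kwargs(code_path: str) -> dict:
    """Collect every element from the table into log_model keyword arguments."""
    return {
        "name": "rag_app",              # MLflow 3 naming; older releases use artifact_path
        "python_model": code_path,      # models-from-code file, logged with the pyfunc flavor
        "model_config": RAG_CONFIG,     # embedding and retriever config travel with the model
        "pip_requirements": ["mlflow", "databricks-vectorsearch"],  # rebuilt on the endpoint
        "input_example": ["How do I rotate an API key?"],           # seeds validation and signature inference
    }


def log_rag_model(code_path: str):
    """Log the RAG app in a new run and return MLflow's ModelInfo."""
    import mlflow  # deferred so the config helper can be inspected without MLflow installed

    with mlflow.start_run():
        return mlflow.pyfunc.log_model(**build_log_kwargs(code_path))
```

Keeping the retriever and embedding settings in `model_config`, rather than hard-coding them in the chain, means the logged artifact records exactly which index and parameters the served app will use.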
Model Signatures
A model signature is the declared input and output schema of the model. Unity Catalog requires a signature and rejects models without one. You obtain a signature in two ways: pass an `input_example` at log time and let MLflow infer the schema, or declare Python type hints on your prediction function. Type hints are recommended for GenAI because they handle chat messages, tool definitions, and nested structures while also enabling automatic input validation. For a chat model the signature typically describes a list of messages in and a completion out. On the exam, symptoms like 'the serving endpoint rejects my model' or 'registration fails immediately' frequently trace back to a missing or mismatched signature.
The Model Registry in Unity Catalog
Once logged, you register the model to make it a governed, versioned artifact. In MLflow 3 on Databricks the default registry URI is `databricks-uc`, meaning the registry lives in Unity Catalog and models use a three-level name, `catalog.schema.model`. Register with `mlflow.register_model()`. Because `databricks-uc` is already the default, calling `mlflow.set_registry_uri("databricks-uc")` first is optional but makes the target explicit. Saving the model to DBFS or tagging a cluster, both common distractors, provides no governance, lineage, or versioning.
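A sketch of registration against Unity Catalog follows. It assumes a Databricks workspace with Unity Catalog enabled, and the catalog, schema, and model names are hypothetical. `uc_model_name` and `register_to_uc` are illustrative helpers. The registry URI is set explicitly even though MLflow 3 on Databricks already defaults to it.

```python
def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Build a three-level Unity Catalog name: catalog.schema.model."""
    parts = (catalog, schema, model)
    if any(not part or "." in part for part in parts):
        raise ValueError("each level must be a non-empty name without dots")
    return ".".join(parts)


def register_to_uc(model_uri: str, name: str):
    """Register a logged model in Unity Catalog and return the new ModelVersion."""
    import mlflow  # deferred: registration needs a Databricks workspace

    mlflow.set_registry_uri("databricks-uc")  # explicit, though already the MLflow 3 default on Databricks
    return mlflow.register_model(model_uri=model_uri, name=name)
```

A typical call is `register_to_uc(info.model_uri, uc_model_name("main", "genai", "support_agent"))`, where `info` is the `ModelInfo` returned by `log_model`. Each registration under the same name creates a new version number.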
Unity Catalog registration replaces the old workspace-registry stage transitions (Staging and Production) with model aliases, which are mutable, named pointers such as Champion and Challenger. To promote a newly validated version without changing downstream code, you reassign the Champion alias to the new version. Jobs and other code that load the model through the alias (a URI such as `models:/catalog.schema.model@Champion`) pick up the new version the next time they load it. This alias indirection is the exam answer to 'promote to production without editing the serving configuration.'
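Reassigning an alias is one client call. The sketch below assumes the MLflow client is already pointed at the `databricks-uc` registry, and the model name and version are placeholders. `promote_to_champion` is a hypothetical helper that takes the client as an argument, so it can be exercised with a stand-in.

```python
def promote_to_champion(client, model_name: str, version: str, alias: str = "Champion") -> str:
    """Point `alias` at `version` and return the alias-based model URI.

    `client` is expected to behave like mlflow.MlflowClient.
    """
    # An alias names exactly one version, so this moves it if it pointed
    # elsewhere; the previous version stays registered.
    client.set_registered_model_alias(model_name, alias, version)
    # Consumers load through this URI and see the new version on their next load.
    return f"models:/{model_name}@{alias}"
```

In a notebook this would be called as `promote_to_champion(MlflowClient(), "main.genai.support_agent", "4")`, after `from mlflow import MlflowClient`. Downstream code would then call `mlflow.pyfunc.load_model("models:/main.genai.support_agent@Champion")` and never needs to change.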
Agent Framework Deployment and Observability
For agents, Databricks layers extra tooling on top of MLflow:
- `ResponsesAgent`: wrapping an agent in this interface standardizes its inputs and outputs so it interoperates with AI Playground, Agent Evaluation, and Databricks Apps.
- `databricks.agents.deploy()`: a single call that not only serves the registered agent but also wires up MLflow 3 tracing, the Review App, and monitoring and feedback, removing manual endpoint plumbing.
- `mlflow.langchain.autolog()`: enables automatic tracing of chain executions. Note the tested edge case: autologging is not automatically enabled on serverless compute, so you must call it explicitly.
- MLflow Prompt Registry: versions prompts with aliases in Unity Catalog so you can promote a prompt from dev to staging to prod without touching application code. A Git branch or a DBFS text file does not provide governed prompt lifecycle management.
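A minimal sketch of the deployment step follows. It makes these assumptions:

- The `databricks-agents` package is installed.
- The agent is LangChain-based and already registered in Unity Catalog.
- `deploy_agent` is a hypothetical wrapper, and the model name and version are placeholders.

`agents.deploy(model_name, model_version)` is the call described in the list above; endpoint sizing and other options are left at their defaults.

```python
def deploy_agent(uc_model_name: str, version: int):
    """Serve a registered agent with tracing, the Review App, and monitoring."""
    import mlflow
    from databricks import agents

    # Autologging is not switched on automatically on serverless compute, so
    # enable LangChain tracing explicitly for agent calls made from this session.
    mlflow.langchain.autolog()
    # One call creates the serving endpoint and wires up the Review App and monitoring.
    return agents.deploy(uc_model_name, version)
```

Usage would look like `deploy_agent("main.genai.support_agent", 3)` once version 3 of the model is registered.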
Common Traps
- Choosing the `spark` or `onnx` flavor for a custom GenAI chain, when `pyfunc` (or `langchain`) is correct.
- Forgetting the signature, then blaming the endpoint when registration fails.
- Assuming pickling works for agents; prefer code-based models-from-code logging.
- Confusing MLflow (track, package, version) with Unity Catalog (govern, access, lineage): MLflow tracks and registers, while Unity Catalog governs the registry MLflow writes into.
Which MLflow model flavor is most appropriate for packaging a custom GenAI chain that wraps pre-processing, an LLM call, and post-processing into one servable object?
A team wants to promote a newly validated registered model version to production without changing any downstream code that references it. What should they update?
In MLflow 3 with Unity Catalog-enabled workflows, which registry URI is used by default for model registry operations?