2.1 Model Deployments and Playgrounds
Key Takeaways
- Microsoft Foundry organizes AI app work around projects, models, agents, tools, evaluations, monitoring, and enterprise controls.
- The model catalog is for choosing models by capability, modality, cost, latency, region, and deployment option; it is not the same thing as a callable endpoint.
- A model deployment gives a selected model a deployment name and configuration so application code can send inference requests to it.
- Deployments can include model version, capacity or provisioning type, content filtering configuration, and rate limiting details, depending on the model.
- The model playground is the fastest place to test prompts, parameters, safety behavior, model comparisons, and generated code before building an app.
The Foundry Build Flow
Microsoft Foundry is the Azure platform surface for building AI apps with models, agents, tools, evaluations, tracing, monitoring, and governance in one place. For AI-901, the important pattern is not memorizing every portal blade. It is recognizing the order of work: organize the solution in a project, pick the right model, make that model available for inference, test behavior, and then integrate it into an application.
A Foundry project keeps related assets together: model deployments, data connections, prompts, indexes, agents, evaluations, and monitoring artifacts. A Foundry resource is the Azure resource layer that supplies the model and tool capabilities. Microsoft documentation has active new and classic portal experiences, so exam questions should be read for concepts rather than exact menu names.
Catalog, Deployment, Endpoint, Playground
| Foundry item | What it means for AI-901 | Do not confuse it with |
|---|---|---|
| Model catalog | The place to discover and compare models by provider, task, modality, region, price, and deployment type | A running endpoint |
| Model deployment | A named configuration that makes a selected model callable for inference | The model card itself |
| Inference endpoint | The API surface your app calls, often using the deployment name in the model parameter | The browser playground |
| Model playground | A controlled test space for prompts, parameters, tools, safety, comparison, and code export | Production monitoring |
| Instant model | A preview shortcut that lets supported models be called by name without a deployment | A replacement for all production deployments |
The local AI-901 cheat sheet compresses this as Catalog | Deploy | Test | Code. That is a strong exam memory aid. The catalog helps you choose. The deployment lets you use. The playground helps you test. The SDK or API lets your application call the model.
What A Deployment Adds
A deployment gives a model a stable name and configuration. Microsoft documents deployment details such as model name, model version, capacity or provisioning type, content filtering configuration, and rate limiting configuration. The exact fields depend on the model and deployment type.
This distinction matters in scenarios. A team might browse a capable multimodal model in the catalog, but the app still cannot call it until the model is deployed or otherwise available through an approved access pattern. A Foundry resource can host many deployments, and billing is tied to inference performed on those deployments, not to merely reading a catalog entry.
Deployment choice is also where production constraints show up. If the app needs a specific region for data residency, reserved throughput, custom content filters, custom guardrails, endpoint-specific configuration, quota partitioning, or a fine-tuned model, a named deployment is usually the safer answer than a preview instant-model shortcut.
How To Choose A Model
Use the scenario, not the model name alone:
- Identify the modality. Text chat, embeddings, image input, image generation, speech, and multimodal reasoning point to different model families or Foundry Tools.
- Match capability to risk. Simple extraction may not need the largest reasoning model. Complex multi-step analysis may need a stronger model even if it costs more.
- Check constraints. Region, latency, quota, data handling, content filters, and available deployment types can eliminate otherwise attractive choices.
- Prefer grounding over guessing. If the answer depends on private or changing facts, plan for retrieval-augmented generation instead of relying only on the model's training data.
- Prototype before coding. Use the playground to tune prompts, parameters, and safety before committing to SDK code.
Why The Playground Matters
The model playground is where a candidate can see the practical effect of prompt wording, system messages, temperature, maximum output length, and tools such as web search, file search, or code interpreter where available. It can also compare models side by side under synchronized inputs, which helps with price-to-performance decisions.
For AI-901, treat the playground as the bridge between concept and implementation. If a question says the team wants to test tone, response format, grounding, or safety behavior before writing code, the playground is the natural choice. If the question says a production app must call the model repeatedly, the answer shifts toward deployment names, endpoints, authentication, and SDK/API integration.
A developer has reviewed a model card in the Foundry catalog and decided it fits a customer-support prototype, but the app has no stable model name to use in requests and no inference configuration has been created. What should the developer do next for a deployment-based build?
A team is prototyping with instant models, but the production version must pin behavior, use a custom content filter, and keep traffic in a specific supported region. Which approach best fits those requirements?