Foundry Hubs, Projects, Models, and Deployments

Key Takeaways

Microsoft Foundry is the Azure platform layer for building, optimizing, deploying, tracing, evaluating, and governing AI apps and agents.
A hub or Foundry resource provides shared governance and infrastructure, while a project organizes workload-specific assets such as deployments, indexes, datasets, evaluations, files, and connections.
The model catalog is for discovery and selection; a deployment is the named callable endpoint that an app or agent actually invokes.
Standard Foundry resource deployments are the preferred path when supported, while serverless API and managed compute deployments serve different model, isolation, and operational needs.
AI-103 scenarios often test tradeoffs: model capability versus latency and cost, project isolation versus shared governance, and standard capacity versus provisioned throughput or managed compute.

Last updated: June 2026

Foundry Hubs, Projects, Models, and Deployments

Microsoft Foundry is the platform environment for AI apps and agents on Azure. Microsoft describes it as a unified platform-as-a-service for enterprise AI operations, model builders, and application development. For AI-103, that means Foundry is not just a playground: it is where teams organize projects, choose models, configure deployments, connect data and tools, run evaluations, trace behavior, and govern access.

The terminology can look confusing because Microsoft documentation includes both newer Foundry projects and classic hub-based projects. The exam-safe mental model is simple: shared governance and infrastructure sit above workload workspaces, and projects are where teams build the actual app or agent.

Concept	What it owns	Exam clue
Foundry resource	Administrative, security, monitoring, networking, and policy boundary	The scenario asks for centralized governance or RBAC across projects
Hub or hub-based project	Shared settings such as data connections, compute, network configuration, and classic advanced features	The prompt mentions hubs, prompt flow, managed compute, or Azure Machine Learning compatibility
Project	Workload workspace for app or agent assets	The team needs deployments, indexes, datasets, evaluations, files, and connections scoped to one solution
Connection	Reusable reference to another service and its authentication method	The app should use Azure OpenAI, AI Search, Storage, APIs, or tools without embedding credentials

Model Catalog to Deployment

The model catalog is the discovery and selection surface. Use it to compare task fit: large language models for broad generation, small language models for low-cost narrow tasks, reasoning models for complex multi-step work, embedding models for retrieval, code models for developer scenarios, and multimodal models when text alone is not enough.

A deployment is different from a model. The model is the artifact or family; the deployment is the named endpoint configuration the application calls. A deployment can have its own model version, throughput choice, content filter, region, quota, and operational settings. A project can hold several deployments, including more than one of the same model, each named for the workload it serves, such as gpt-4o-mini-chat-dev, gpt-4o-prod, and embedding-rag-indexer.
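
The distinction can be sketched in a few lines of Python. This is an illustrative data model only, not the Foundry SDK: the `Deployment` class, its fields, the `resolve` helper, and the SKU, region, and embedding-model strings are hypothetical placeholders. Only the deployment names come from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of what a named deployment pins down."""
    name: str    # the name the app or agent calls
    model: str   # the catalog model behind the deployment
    sku: str     # throughput choice (placeholder values)
    region: str  # placeholder region label


# Illustrative values only; SKUs and regions are placeholders, not recommendations.
DEPLOYMENTS = {
    d.name: d
    for d in [
        Deployment("gpt-4o-mini-chat-dev", "gpt-4o-mini", "standard", "region-a"),
        Deployment("gpt-4o-prod", "gpt-4o", "provisioned", "region-a"),
        Deployment("embedding-rag-indexer", "embedding-model", "standard", "region-b"),
    ]
}


def resolve(deployment_name: str) -> Deployment:
    """Look up a deployment by name; a bare catalog model name is not callable."""
    try:
        return DEPLOYMENTS[deployment_name]
    except KeyError:
        raise LookupError(
            f"No deployment named {deployment_name!r}; create one before calling the model."
        ) from None
```

Calling `resolve("gpt-4o-prod")` returns the production record, while `resolve("gpt-4o")` raises, which mirrors the exam point that an app addresses a deployment name rather than a catalog entry.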

Deployment choice	Best fit	Tradeoff to notice
Standard deployment in Foundry resources	Common Azure OpenAI and Foundry Models workloads	Fastest managed path, but subject to shared regional capacity and quota
Provisioned throughput	High-volume production workloads that need predictable throughput and latency	Higher planning commitment; capacity must be sized and monitored
Serverless API deployment	Certain catalog models where managed endpoint access is enough	Availability and billing depend on model/provider support
Managed compute deployment	Open-source, custom, or isolated model hosting with dedicated managed VMs	More infrastructure choices and cost responsibility

Planning Pattern

Start with the business requirement, then work inward:

1. Choose the project boundary: team, workload, environment, and data sensitivity.
2. Choose the model family by task, modality, context size, latency, cost, and safety profile.
3. Choose the deployment type by capacity, isolation, model availability, and network requirements.
4. Create project connections for retrieval indexes, storage, APIs, and monitoring.
5. Test with evaluations before routing user traffic.
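
The deployment-type step in the list above can be written as a small decision function. This is a simplified sketch of the tradeoff table, not an official selection algorithm: the `Requirements` fields and the order of the checks are assumptions made for illustration, and real choices also depend on region, quota, and network constraints.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical inputs distilled from a scenario."""
    model_available_as_standard: bool  # offered as a standard Foundry resource deployment
    needs_predictable_latency: bool    # sustained high volume with a latency target
    needs_dedicated_hosting: bool      # open-source, custom, or isolated model hosting


def choose_deployment_type(req: Requirements) -> str:
    """Return the deployment type suggested by the tradeoff table (simplified)."""
    if req.needs_dedicated_hosting:
        return "managed compute"
    if not req.model_available_as_standard:
        return "serverless API"
    if req.needs_predictable_latency:
        return "provisioned throughput"
    # Default: the simplest managed path that meets the requirement.
    return "standard"


if __name__ == "__main__":
    chat = Requirements(True, True, False)
    print(choose_deployment_type(chat))  # provisioned throughput
```

The function falls through to a standard deployment by default, which matches the guidance to avoid overbuilding when the simplest option satisfies the requirement.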

A strong AI-103 answer usually avoids two extremes. It does not put every experiment into one shared project with broad keys, and it does not overbuild managed GPU infrastructure when a standard Foundry resource deployment satisfies the requirement. The right design is the smallest governed workspace and deployment plan that can meet the app's behavior, cost, security, and operations targets.

Capacity, Quota, and Throughput

Deployments do not get unlimited capacity. Standard deployments are governed by tokens-per-minute (TPM) quota and requests-per-minute (RPM) limits, which are assigned per subscription, per region, and per model. When a workload throttles (HTTP 429 responses), you have three levers: raise quota in that region, move load to another region or deployment, or switch to provisioned throughput units (PTUs) for reserved, predictable capacity. PTUs are the exam-correct answer when a scenario demands stable latency under sustained high volume; standard pay-as-you-go is the better fit when traffic is spiky or low.

Symptom in scenario	Root cause	Best response
Intermittent 429 errors at peak	TPM/RPM quota exceeded	Request quota increase or add a second regional deployment
Latency must be predictable under load	Shared standard capacity	Move to provisioned throughput (PTUs)
Cost spikes from a single test app	No per-deployment isolation	Separate dev and prod deployments, set quota per deployment
Model unavailable in chosen region	Regional model availability	Pick a supported region or a serverless API model
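
The capacity levers above are platform-side. On the client side, apps commonly pair them with a retry on HTTP 429, which is sketched below as a complement rather than a replacement. Everything here is hypothetical scaffolding: `ThrottledError` stands in for whatever exception a real client raises on a 429, `send` is any zero-argument callable that makes the request, and the delay values are arbitrary.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for an HTTP 429 response; retry_after is seconds, if the service gave one."""

    def __init__(self, retry_after=None):
        super().__init__("HTTP 429: rate limit exceeded")
        self.retry_after = retry_after


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(); on ThrottledError wait and retry, up to max_attempts total calls."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the throttle to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # honor the server's hint when present
            else:
                # Exponential backoff with jitter: up to 1x, 2x, 4x ... base_delay.
                delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
            sleep(delay)
```

Retries smooth over brief peaks only. If throttling persists, the fix remains one of the responses in the table: more quota, another regional deployment, or provisioned throughput.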

Worked Example: Choosing a Deployment

A retailer needs a customer-facing chat assistant handling roughly 2 million tokens per hour with a strict 2-second response target, plus a nightly batch that summarizes support tickets. The exam-correct split is two deployments: a provisioned throughput gpt-4o deployment for the latency-sensitive chat path, and a cheaper standard small-language-model deployment for the tolerant nightly batch. A single standard deployment for both would risk daytime throttling; a single PTU deployment for both would overpay for the batch. Capacity planning is a design judgment, not a one-size choice.
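
A quick calculation makes the chat path's load concrete. The sketch below converts the scenario's hourly volume into an average tokens-per-minute figure and applies a peak multiplier. The `peak_factor` of 1.5 is a hypothetical assumption, since the scenario gives no peak-to-average ratio, and mapping the result to a number of PTUs depends on the model and is not attempted here.

```python
def required_tpm(tokens_per_hour: float, peak_factor: float = 1.0) -> float:
    """Average tokens per minute, scaled by an assumed peak-to-average ratio."""
    return tokens_per_hour / 60 * peak_factor


CHAT_TOKENS_PER_HOUR = 2_000_000  # volume stated in the scenario

average = required_tpm(CHAT_TOKENS_PER_HOUR)
peak = required_tpm(CHAT_TOKENS_PER_HOUR, peak_factor=1.5)  # hypothetical peak ratio

print(f"average TPM: {average:,.0f}")  # average TPM: 33,333
print(f"assumed peak TPM: {peak:,.0f}")  # assumed peak TPM: 50,000
```

Sizing against the peak figure rather than the average is what protects the 2-second target; the nightly batch can tolerate queuing, so it does not need the same headroom.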

Connections and CI/CD

Connections let a project reach Azure OpenAI, AI Search, Storage, Key Vault, or external APIs and tools through a stored, reusable reference, ideally authenticated with Entra ID or a managed identity rather than a pasted key. For production maturity, the AI-103 blueprint expects you to wire Foundry projects into CI/CD pipelines: infrastructure-as-code defines the project and its deployments, and promotion from dev to prod runs evaluations as a gate before traffic is routed. Treat "deploy, then evaluate, then promote" as the operational pattern, never "deploy straight to users."
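
The evaluation gate can be sketched as a check a pipeline runs before promotion. This is a minimal illustration, not a Foundry API: the metric names, scores, and thresholds are hypothetical, and a real pipeline would read the scores from its evaluation run.

```python
def failed_metrics(scores: dict, thresholds: dict) -> list:
    """Return the names of metrics that are missing or below their minimum."""
    return sorted(
        name
        for name, minimum in thresholds.items()
        if scores.get(name, float("-inf")) < minimum
    )


def can_promote(scores: dict, thresholds: dict) -> bool:
    """Gate promotion from dev to prod on every threshold being met."""
    failures = failed_metrics(scores, thresholds)
    if failures:
        print("Promotion blocked; failing metrics:", ", ".join(failures))
        return False
    return True


if __name__ == "__main__":
    # Hypothetical metric names and values for illustration only.
    thresholds = {"groundedness": 0.80, "relevance": 0.75}
    dev_scores = {"groundedness": 0.86, "relevance": 0.70}
    can_promote(dev_scores, thresholds)
```

Treating a missing metric as a failure keeps the gate strict: a deployment that was never evaluated cannot be promoted by accident.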

Test Your Knowledge

A team selected a model from the Foundry model catalog, but their app still cannot call it. What is the missing step?

Rename the Azure subscription so it matches the model family

Move all project files into the global scope

Create a named deployment for the model and call that deployment from the app or agent

Disable RBAC because catalog models do not use access control

Test Your Knowledge

A customer-facing chat app must hold a strict, predictable latency target while sustaining millions of tokens per hour. The team keeps hitting HTTP 429 throttling on a standard deployment. What is the best fix?

Switch the workload to a provisioned throughput (PTU) deployment for reserved, predictable capacity

Lower the model temperature to reduce token usage

Disable content filters so requests complete faster

Move the entire project into a single shared hub
