Foundry Hubs, Projects, Models, and Deployments
Key Takeaways
- Microsoft Foundry is the Azure platform layer for building, optimizing, deploying, tracing, evaluating, and governing AI apps and agents.
- A hub or Foundry resource provides shared governance and infrastructure, while a project organizes workload-specific assets such as deployments, indexes, datasets, evaluations, files, and connections.
- The model catalog is for discovery and selection; a deployment is the named callable endpoint that an app or agent actually invokes.
- Standard Foundry resource deployments are the preferred path when supported, while serverless API and managed compute deployments serve different model, isolation, and operational needs.
- AI-103 scenarios often test tradeoffs: model capability versus latency and cost, project isolation versus shared governance, and standard capacity versus provisioned throughput or managed compute.
Foundry Hubs, Projects, Models, and Deployments
Microsoft Foundry is the platform environment for AI apps and agents on Azure. Microsoft describes it as a unified platform-as-a-service for enterprise AI operations, model builders, and application development. For AI-103, that means Foundry is not just a playground: it is where teams organize projects, choose models, configure deployments, connect data and tools, run evaluations, trace behavior, and govern access.
The terminology can look confusing because Microsoft documentation includes both newer Foundry projects and classic hub-based projects. The exam-safe mental model is simple: shared governance and infrastructure sit above workload workspaces, and projects are where teams build the actual app or agent.
| Concept | What it owns | Exam clue |
|---|---|---|
| Foundry resource | Administrative, security, monitoring, networking, and policy boundary | The scenario asks for centralized governance or RBAC across projects |
| Hub or hub-based project | Shared settings such as data connections, compute, network configuration, and classic advanced features | The prompt mentions hubs, prompt flow, managed compute, or Azure Machine Learning compatibility |
| Project | Workload workspace for app or agent assets | The team needs deployments, indexes, datasets, evaluations, files, and connections scoped to one solution |
| Connection | Reusable reference to another service and its authentication method | The app should use Azure OpenAI, AI Search, Storage, APIs, or tools without embedding credentials |
Model Catalog to Deployment
The model catalog is the discovery and selection surface. Use it to compare task fit: large language models for broad generation, small language models for low-cost narrow tasks, reasoning models for complex multi-step work, embedding models for retrieval, code models for developer scenarios, and multimodal models when text alone is not enough.
A deployment is different from a model. The model is the artifact or family; the deployment is the named endpoint configuration the application calls. A deployment can have its own model version, throughput choice, content filter, region, quota, and operational settings. A project can hold several deployments for different workloads, including more than one deployment of the same model, with names such as gpt-4o-mini-chat-dev, gpt-4o-prod, and embedding-rag-indexer.
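The model-versus-deployment distinction can be sketched as a small lookup. This is a conceptual illustration only, not the Foundry SDK: the `Deployment` class and `resolve` function are hypothetical, the deployment names are the ones used above, and the model, capacity, and region values attached to them are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Illustrative record only; not a Foundry SDK type."""
    name: str           # the name the application calls
    model: str          # the catalog model behind the deployment
    capacity_type: str  # hypothetical label for the throughput choice
    region: str         # hypothetical region value


# Deployment names come from the text; models, capacity labels, and regions are assumed.
DEPLOYMENTS = {
    d.name: d
    for d in (
        Deployment("gpt-4o-mini-chat-dev", "gpt-4o-mini", "standard", "eastus"),
        Deployment("gpt-4o-prod", "gpt-4o", "provisioned", "eastus"),
        Deployment("embedding-rag-indexer", "text-embedding-3-small", "standard", "eastus"),
    )
}


def resolve(deployment_name: str) -> Deployment:
    """Return the deployment an app would call, or fail if none exists by that name."""
    try:
        return DEPLOYMENTS[deployment_name]
    except KeyError:
        raise LookupError(
            f"No deployment named {deployment_name!r}; "
            "a catalog model is not callable until it is deployed."
        ) from None
```

In this sketch, `resolve("gpt-4o-prod")` succeeds because a deployment with that name exists, while `resolve("gpt-4o")` fails: a catalog model name alone is not something the app can call.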
| Deployment choice | Best fit | Tradeoff to notice |
|---|---|---|
| Standard deployment in Foundry resources | Common Azure OpenAI and Foundry Models workloads | Quickest path to a managed endpoint, but subject to shared regional capacity and quota |
| Provisioned throughput | High-volume production workloads that need predictable throughput and latency | Higher planning commitment; capacity must be sized and monitored |
| Serverless API deployment | Certain catalog models where managed endpoint access is enough | Availability and billing depend on model/provider support |
| Managed compute deployment | Open-source, custom, or isolated model hosting with dedicated managed VMs | More infrastructure choices and cost responsibility |
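One way to rehearse the table is to encode it as a decision function. The sketch below is a simplification for study purposes: the `Requirements` fields, the function name, and the order of the checks are assumptions, not an official Microsoft decision procedure.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical scenario flags distilled from an exam prompt."""
    supported_in_foundry_resource: bool      # model can be deployed to a Foundry resource
    needs_predictable_throughput: bool = False
    needs_dedicated_hosting: bool = False    # open-source, custom, or isolated hosting
    serverless_api_available: bool = False


def choose_deployment_type(req: Requirements) -> str:
    """Map scenario flags to a deployment choice from the table (assumed ordering)."""
    if req.needs_dedicated_hosting:
        return "managed compute"
    if req.supported_in_foundry_resource:
        if req.needs_predictable_throughput:
            return "provisioned throughput"
        return "standard"
    if req.serverless_api_available:
        return "serverless API"
    # Assumption: when nothing else fits, dedicated hosting is the remaining option.
    return "managed compute"
```

For example, `choose_deployment_type(Requirements(supported_in_foundry_resource=True))` returns `"standard"`, which mirrors the guidance that a standard Foundry resource deployment is preferred when supported.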
Planning Pattern
Start with the business requirement, then work inward:
- Choose the project boundary: team, workload, environment, and data sensitivity.
- Choose the model family by task, modality, context size, latency, cost, and safety profile.
- Choose the deployment type by capacity, isolation, model availability, and network requirements.
- Create project connections for retrieval indexes, storage, APIs, and monitoring.
- Test with evaluations before routing user traffic.
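The five planning steps above can be treated as a checklist that must be complete before traffic is routed. The following is a minimal sketch under that assumption; `DeploymentPlan`, `missing_steps`, and `ready_for_traffic` are hypothetical names, and none of this calls a real Foundry API.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentPlan:
    """Hypothetical record of the planning decisions, in the order listed above."""
    project_boundary: str = ""       # team, workload, environment, data sensitivity
    model_family: str = ""           # chosen by task, modality, context, latency, cost
    deployment_type: str = ""        # chosen by capacity, isolation, availability, network
    connections: list[str] = field(default_factory=list)  # indexes, storage, APIs, monitoring
    evaluations_passed: bool = False


def missing_steps(plan: DeploymentPlan) -> list[str]:
    """List the planning steps that are still incomplete, in planning order."""
    checks = [
        ("choose project boundary", bool(plan.project_boundary)),
        ("choose model family", bool(plan.model_family)),
        ("choose deployment type", bool(plan.deployment_type)),
        ("create project connections", bool(plan.connections)),
        ("test with evaluations", plan.evaluations_passed),
    ]
    return [step for step, done in checks if not done]


def ready_for_traffic(plan: DeploymentPlan) -> bool:
    """A plan is ready only when no step is missing."""
    return not missing_steps(plan)
```

A plan with every field filled in except `evaluations_passed` would report `["test with evaluations"]` as missing, which reflects the rule that evaluations come before user traffic.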
A strong AI-103 answer usually avoids two extremes. It does not put every experiment into one shared project with broadly shared access keys, and it does not overbuild managed GPU infrastructure when a standard Foundry resource deployment satisfies the requirement. The right design is the smallest governed workspace and deployment plan that can meet the app's behavior, cost, security, and operations targets.
A team selected a model from the Foundry model catalog, but their app still cannot call it. What is the missing step?