5.7 VM Compute Case Lab
Key Takeaways
- A strong compute design starts with requirements for availability, security, scale, management, backup, and cost before selecting VM features.
- Bicep can express the repeatable build while operational services such as Backup, Monitor, Bastion, and autoscale complete the administrator workflow.
- Troubleshooting should isolate deployment, identity, network, guest OS, application health, monitoring, and recovery layers.
- Exam case studies reward reading constraints carefully and mapping each requirement to the minimum Azure feature that satisfies it.
Case Scenario
Contoso has a legacy line-of-business application that will move to Azure. The web tier is stateless and can run on Linux. The application tier is Windows-based and must stay on traditional VMs for now. The database runs on a separate managed database service, so application VMs should not store durable business data locally. Administrators require private management access, repeatable deployment, backup for the Windows application servers, monitoring alerts, and a way to scale the web tier during seasonal traffic.
The target region is East US. The business wants higher availability inside the region but is not yet ready for full multi-region active-active design. Security policy denies direct RDP or SSH from the internet. Cost matters, but the app supports customer transactions, so single-VM designs are not acceptable for production.
Requirements Matrix
| Requirement | Design decision | Reason |
|---|---|---|
| Repeatable deployment | Bicep modules | Avoids manual drift and supports multiple environments |
| Private admin access | Azure Bastion or VPN path, no VM public IPs | Meets no direct internet management rule |
| Stateless web scale | VM Scale Set across zones | Adds and replaces instances predictably |
| Windows app tier HA | Two or more VMs across zones or availability set | Reduces single host or zone failure risk |
| Durable data | Managed database and managed disks only for server state | Do not use temporary disk for business data |
| Backup | Recovery Services vault policy for app VMs | Supports restore and retention |
| Monitoring | Azure Monitor Agent, data collection rules, alerts | Operational visibility |
| Change safety | What-if, deployment names, staged rollout | Reduces deployment surprises |
Proposed Architecture
Use one resource group for production compute resources and one for shared operations resources if the organization separates duties. Reuse an existing hub or shared management network if one is available; otherwise, deploy a VNet with subnets for web, app, Bastion, and private endpoints where needed. Deploy the web tier as a VM Scale Set with instances distributed across zones and behind a load balancer or application gateway. Deploy the Windows application tier as multiple VMs, also zone-aware when supported, with no public IP addresses.
Use managed identities for VMs that need Azure resource access. Use Key Vault for secrets, but avoid placing secrets in template files. Protect the Windows application VMs with Azure Backup. Enable boot diagnostics and Azure Monitor Agent. Configure NSGs so only required app ports flow between tiers. Management traffic should arrive through Bastion or a private network path.
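A minimal sketch of these two practices in Bicep: a system-assigned managed identity declared on a VM, and a secret pulled from Key Vault at deployment time instead of being written into the template. The vault name, secret name, and module path are illustrative assumptions, not part of the case:

```bicep
// Sketch only. Assumes a Key Vault named 'kv-contoso-prod' already exists
// and holds a secret named 'vm-admin-password'.
param location string = resourceGroup().location
param adminUsername string

resource kv 'Microsoft.KeyVault/vaults@2023-07-01' existing = {
  name: 'kv-contoso-prod' // hypothetical vault name
}

module app './modules/windows-app-vms.bicep' = {
  name: 'app-vms-secure'
  params: {
    location: location
    adminUsername: adminUsername
    // getSecret is only valid for a module parameter marked @secure();
    // the secret value never appears in the template or deployment history.
    adminPassword: kv.getSecret('vm-admin-password')
  }
}
```

Inside the module, the VM resource can then carry `identity: { type: 'SystemAssigned' }` so the server authenticates to Azure services without stored credentials.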
Bicep Skeleton
The following skeleton is not a full production template, but it shows how an administrator structures the deployment into parameters, modules, and outputs:
```bicep
targetScope = 'resourceGroup'

param location string = resourceGroup().location
param environment string = 'prod'
param adminUsername string

@secure()
param adminPassword string

module network './modules/network.bicep' = {
  name: 'network-${environment}'
  params: {
    location: location
    environment: environment
  }
}

module web './modules/web-vmss.bicep' = {
  name: 'web-vmss-${environment}'
  params: {
    location: location
    subnetId: network.outputs.webSubnetId
    instanceCount: 3
    vmSku: 'Standard_D2s_v5'
  }
}

module app './modules/windows-app-vms.bicep' = {
  name: 'app-vms-${environment}'
  params: {
    location: location
    subnetId: network.outputs.appSubnetId
    adminUsername: adminUsername
    adminPassword: adminPassword
    vmNames: [
      'vm-app-01'
      'vm-app-02'
    ]
  }
}

output webFrontendIp string = web.outputs.frontendIp
```
The real modules would define NSGs, load balancing, health probes, VM extensions, diagnostic settings, and backup registration where appropriate. The point is that the environment-specific choices are parameters, while the architecture is repeatable.
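As one concrete illustration of what the network module might contain, the sketch below defines an NSG rule that allows only the application port from the web subnet into the app subnet. The address prefixes, port, and names are assumptions for this example, not values from the case:

```bicep
// Illustrative fragment of modules/network.bicep.
// Address ranges and the app port (8080) are assumed values.
param location string = resourceGroup().location
param environment string = 'prod'

resource appNsg 'Microsoft.Network/networkSecurityGroups@2023-09-01' = {
  name: 'nsg-app-${environment}'
  location: location
  properties: {
    securityRules: [
      {
        name: 'allow-web-to-app'
        properties: {
          priority: 100
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourceAddressPrefix: '10.0.1.0/24'      // web subnet (assumed)
          sourcePortRange: '*'
          destinationAddressPrefix: '10.0.2.0/24' // app subnet (assumed)
          destinationPortRange: '8080'            // app port (assumed)
        }
      }
    ]
  }
}
```

Because NSG default rules deny unsolicited inbound traffic at lower priority, an explicit allow rule like this is typically all the tier-to-tier path needs.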
Build and Validate Workflow
Start with validation and what-if:
```shell
az account set --subscription SUB-PROD
az group create --name rg-contoso-prod-compute --location eastus

az deployment group what-if \
  --resource-group rg-contoso-prod-compute \
  --template-file main.bicep \
  --parameters @prod.parameters.json

az deployment group create \
  --name contoso-prod-20260505 \
  --resource-group rg-contoso-prod-compute \
  --template-file main.bicep \
  --parameters @prod.parameters.json
```
After deployment, verify instance and network state:
```shell
az vmss list-instances -g rg-contoso-prod-compute -n vmss-web-prod --output table
az vm list -g rg-contoso-prod-compute -d --output table
az network nic list-effective-nsg -g rg-contoso-prod-compute -n vm-app-01-nic
az monitor metrics list --resource RESOURCE_ID --metric "Percentage CPU"
```
Then enable or confirm backup:
```shell
az backup protection enable-for-vm \
  --resource-group rg-contoso-prod-ops \
  --vault-name rsv-contoso-prod \
  --vm /subscriptions/SUB-PROD/resourceGroups/rg-contoso-prod-compute/providers/Microsoft.Compute/virtualMachines/vm-app-01 \
  --policy-name AppServerDaily
```
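The same protection can instead be declared in the template so backup registration is part of the repeatable build. The sketch below follows the naming convention used for vault protected items; it must be deployed into the vault's resource group, and the VM and policy names are assumed from the case rather than verified values:

```bicep
// Sketch: declarative VM backup registration. Deploy in the vault's
// resource group (rg-contoso-prod-ops in this case). The long name
// encodes the backup fabric, container, and protected item segments.
param vmName string = 'vm-app-01'
param vmRgName string = 'rg-contoso-prod-compute'
param vaultName string = 'rsv-contoso-prod'

resource protectedVm 'Microsoft.RecoveryServices/vaults/backupFabrics/protectionContainers/protectedItems@2023-04-01' = {
  name: '${vaultName}/Azure/iaasvmcontainer;iaasvmcontainerv2;${vmRgName};${vmName}/vm;iaasvmcontainerv2;${vmRgName};${vmName}'
  properties: {
    protectedItemType: 'Microsoft.Compute/virtualMachines'
    policyId: resourceId('Microsoft.RecoveryServices/vaults/backupPolicies', vaultName, 'AppServerDaily')
    sourceResourceId: resourceId(vmRgName, 'Microsoft.Compute/virtualMachines', vmName)
  }
}
```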
Failure Injection Walkthrough
Failure 1: The web scale set deploys, but the site returns 502 through the application gateway. Check backend health. If probes fail, verify the probe path, port, NSG rules, guest firewall, and whether the web service started after cloud-init or extension execution. If instances are unhealthy because an extension failed, review extension status and logs before changing the load balancer.
Failure 2: The Windows app VM cannot be reached by RDP. Because policy denies public RDP, this is expected unless Bastion or VPN is configured. Check Bastion subnet name, Bastion public IP, VNet peering or route path, NSG rules, and guest firewall. Do not add a public IP and open 3389 unless the requirement changes.
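Two of the checks above are frequent misconfigurations worth seeing in template form: Bastion requires a subnet literally named AzureBastionSubnet and a Standard-SKU public IP. A hedged sketch, with resource names and the subnet ID parameter as illustrative assumptions:

```bicep
// Sketch only. bastionSubnetId must point at a subnet named exactly
// 'AzureBastionSubnet'; any other name causes deployment failure.
param location string = resourceGroup().location
param bastionSubnetId string

resource bastionIp 'Microsoft.Network/publicIPAddresses@2023-09-01' = {
  name: 'pip-bastion-prod' // hypothetical name
  location: location
  sku: {
    name: 'Standard' // Bastion requires the Standard SKU
  }
  properties: {
    publicIPAllocationMethod: 'Static'
  }
}

resource bastion 'Microsoft.Network/bastionHosts@2023-09-01' = {
  name: 'bas-contoso-prod' // hypothetical name
  location: location
  properties: {
    ipConfigurations: [
      {
        name: 'bastion-ipconfig'
        properties: {
          subnet: {
            id: bastionSubnetId
          }
          publicIPAddress: {
            id: bastionIp.id
          }
        }
      }
    ]
  }
}
```

Note that the public IP here belongs to Bastion itself, not to any VM, so the no-public-management-IP rule still holds.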
Failure 3: Backup enablement fails for vm-app-02. Check vault region, RBAC, whether the VM is already protected, locks, policy, VM agent health, and backup extension status. Compare with vm-app-01, which is protected, to identify configuration drift.
Failure 4: Seasonal traffic arrives, but autoscale does not add web instances. Check autoscale min and max, metric rule duration, cooldown, CPU metric availability, and quota for the VM family in East US. If max count is 3 and current count is 3, the rule is working as configured but the design limit is too low.
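The settings named in this walkthrough (min, max, metric window, cooldown) all live in one autoscale resource, sketched below for the web scale set. The thresholds and capacity limits are illustrative choices, and `vmssId` is an assumed parameter holding the scale set's resource ID:

```bicep
// Sketch: CPU-based scale-out for the web tier. If maximum equals the
// current instance count, autoscale is "working" but capped by design.
param location string = resourceGroup().location
param vmssId string // resource ID of the web VM Scale Set (assumed)

resource webAutoscale 'Microsoft.Insights/autoscaleSettings@2022-10-01' = {
  name: 'autoscale-web-prod' // hypothetical name
  location: location
  properties: {
    enabled: true
    targetResourceUri: vmssId
    profiles: [
      {
        name: 'seasonal-default'
        capacity: {
          minimum: '3'
          maximum: '10' // must exceed the expected seasonal peak
          default: '3'
        }
        rules: [
          {
            metricTrigger: {
              metricName: 'Percentage CPU'
              metricResourceUri: vmssId
              timeGrain: 'PT1M'
              statistic: 'Average'
              timeWindow: 'PT5M'   // rule duration checked in Failure 4
              timeAggregation: 'Average'
              operator: 'GreaterThan'
              threshold: 70
            }
            scaleAction: {
              direction: 'Increase'
              type: 'ChangeCount'
              value: '1'
              cooldown: 'PT5M'     // cooldown checked in Failure 4
            }
          }
        ]
      }
    ]
  }
}
```

A matching scale-in rule (for example, below 30 percent CPU) would normally accompany this so capacity returns to baseline after the season ends.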
Design Tradeoffs
| Decision | Good answer | Weak answer |
|---|---|---|
| Management access | Bastion or private connectivity | Public RDP or SSH to every VM |
| Web scale | Scale set with health probes | Manual clone of one VM during outage |
| App availability | Multiple VMs across zones or set | One large VM only |
| Deployment | Bicep with parameters and what-if | Portal-only undocumented build |
| Recovery | Backup policy and restore test | Assume snapshots are enough |
| State | External database and durable disks | Store customer data on temporary disk |
Exam Case Study Method
Read the constraints first: security, region, availability, cost, and management model. Then map each requirement to the smallest feature that satisfies it. If a public IP is forbidden, eliminate answers that open inbound internet management. If stateless scale is required, prefer scale sets and autoscale. If the requirement is cross-region disaster recovery, availability zones are not enough. If the requirement is a repeatable deployment, prefer Bicep or ARM over portal steps.
Finally, validate operational completeness. A VM that deploys successfully but lacks backup, monitoring, secure access, and patch management is not finished. Azure administrators are tested on running the environment, not merely creating it.
Review Questions
- In the Contoso case, why is a VM Scale Set a strong fit for the web tier?
- Security policy denies direct RDP and SSH from the internet. Which design choice aligns with the requirement?
- A what-if operation shows that a deployment will replace a production NIC. What should the administrator do next?