8.4 Load Balancing Troubleshooting Workflow

Key Takeaways

  • Troubleshooting should follow the packet path from client DNS to frontend IP, rule, probe, backend listener, NSG, route, and application response.
  • A healthy VM is not the same as a healthy load balancer backend; probe status controls new flow eligibility.
  • Standard Load Balancer failures frequently involve NSG rules, probe mismatch, backend pool membership, or asymmetric routing.
  • Testing from the right source network is critical because public, private, peered, and hybrid clients can see different results.
  • Use Network Watcher and guest OS tests together; neither view alone proves the whole path.
Last updated: May 2026

Troubleshoot by path, not by guessing

When a load-balanced service fails, the symptoms can be misleading. Users say the website is down, but the cause might be DNS, a public IP change, an NSG deny, a wrong probe path, a backend pool that is empty, a user-defined route to a firewall, or an application process that stopped listening. The administrator's job is to reduce the problem to a specific hop.

Use this workflow for most Azure Load Balancer cases:

Client reports failure
|-- Step 1: Does the client resolve the expected name to the load balancer frontend IP?
|-- Step 2: Is the client connecting to the expected protocol and port?
|-- Step 3: Does the load balancer rule exist for that frontend, protocol, and port?
|-- Step 4: Does the rule reference the correct backend pool and health probe?
|-- Step 5: Are backends in the pool and marked healthy?
|-- Step 6: Do NSGs allow client traffic and probe traffic?
|-- Step 7: Do routes deliver traffic to the backend and return traffic correctly?
|-- Step 8: Is the application listening and returning a valid response?

This order matters because it prevents unnecessary rebuilds. If DNS points to an old public IP, changing the health probe is wasted effort. If the probe is unhealthy because the app is down, changing the load balancer SKU does not solve the problem.
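As a sketch, the ordered workflow behaves like a fail-fast runner: each check must pass before the next is attempted, so the first failure names the hop to investigate. Every function below is a placeholder stub, not a real Azure test; probe_check is hard-coded to fail purely for illustration.

```shell
# Fail-fast runner over the workflow above; stops at the first failing hop.
run_checks() {
  for check in dns_check rule_check pool_check probe_check nsg_check route_check app_check; do
    if ! "$check"; then
      echo "FAILED at: $check"
      return 1
    fi
  done
  echo "all checks passed"
}

# Placeholder stubs; each would wrap a real test (nslookup, az CLI, curl, ...).
dns_check()   { true; }
rule_check()  { true; }
pool_check()  { true; }
probe_check() { false; }   # simulated failure: this run stops at the probe step
nsg_check()   { true; }
route_check() { true; }
app_check()   { true; }

run_checks || true   # prints: FAILED at: probe_check
```

Running the checks in this order encodes the rule above: a DNS failure is surfaced before anyone touches the probe or the SKU.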

Step 1: DNS and frontend IP

Confirm that the user is connecting to the intended endpoint. Public clients should resolve to the public frontend IP. Internal clients should resolve to the internal frontend IP. Hybrid clients may use conditional forwarding and private DNS, so test from a client on the affected network.

Commands:

nslookup api.contoso.com
Test-NetConnection api.contoso.com -Port 443
curl -v https://api.contoso.com/health
az network public-ip show -g rg-network -n pip-web --query ipAddress -o tsv

If the name resolves incorrectly, fix DNS before changing the load balancer. For internal load balancers, check the private DNS zone record and VNet links. For public load balancers, check the public A or CNAME record, TTL, and whether the public IP changed during redeployment.
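The DNS comparison can be scripted so a stale record is caught before any load balancer change. Both IP values below are hypothetical; in practice RESOLVED_IP would come from `dig +short api.contoso.com` on the affected client and EXPECTED_IP from the `az network public-ip show` query above.

```shell
# Hypothetical values standing in for live lookups.
EXPECTED_IP="20.50.60.70"   # from: az network public-ip show ... --query ipAddress
RESOLVED_IP="20.50.60.71"   # from: dig +short api.contoso.com (on the affected client)

if [ "$RESOLVED_IP" = "$EXPECTED_IP" ]; then
  echo "DNS points at the current frontend"
else
  echo "DNS mismatch: resolved $RESOLVED_IP, expected $EXPECTED_IP; fix DNS first"
fi
```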

Step 2: Rule and backend pool

The rule must match the user traffic. A TCP rule on frontend port 443 does not handle UDP, and a rule on frontend port 80 only helps if clients actually connect on port 80; HTTPS clients on 443 never reach it. The backend pool must contain the intended instances.

Portal path: Load balancer > Load balancing rules, then inspect the frontend IP, protocol, frontend port, backend port, backend pool, and health probe. Then go to Backend pools and confirm membership.

CLI checks:

az network lb rule list -g rg-network --lb-name lb-web-public -o table
az network lb address-pool show -g rg-network --lb-name lb-web-public -n be-web
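A quick way to spot a port mismatch is to pull just the port fields from the rule output. The JSON below is a mocked, trimmed fragment standing in for real `az network lb rule list -o json` output; the rule name and port values are assumptions for this example.

```shell
# Mocked, trimmed JSON; the real data comes from:
#   az network lb rule list -g rg-network --lb-name lb-web-public -o json
rules_json='[{"name":"rule-https","protocol":"Tcp","frontendPort":443,"backendPort":8443}]'

# Pull just the ports; a 443 -> 8443 mapping means direct backend tests must hit 8443.
echo "$rules_json" | grep -o '"frontendPort":[0-9]*' | cut -d: -f2
echo "$rules_json" | grep -o '"backendPort":[0-9]*'  | cut -d: -f2
```

Seeing the frontend and backend ports side by side catches the common case where a direct test against the VM on 443 fails even though the rule is correct, because the backend actually listens on a different port.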

Step 3: Probe health

Probe state explains many one-backend and no-backend failures. If all probes fail, the load balancer has no healthy destinations for new flows. If one probe fails, traffic distribution may look uneven because Azure correctly avoids that instance.

Test the probe endpoint from the backend itself first. If curl http://localhost:8080/health fails, fix the app before changing Azure. Then test from another VM in the VNet. If local succeeds but network test fails, look at the OS firewall, NSG, route table, and application binding address. An app bound only to 127.0.0.1 may pass local tests but fail from the network.
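The two tests above (curl from the backend itself, then curl from another VM in the VNet) split the fault domain cleanly, and the decision can be written down explicitly. This sketch only encodes the interpretation of the results, not the tests themselves; the yes/no flags stand in for the curl outcomes.

```shell
# Interpret the pair of probe tests: local curl result, then remote curl result.
classify_probe() {
  local local_ok=$1 remote_ok=$2
  if [ "$local_ok" = "yes" ] && [ "$remote_ok" = "yes" ]; then
    echo "probe path healthy; look elsewhere"
  elif [ "$local_ok" = "yes" ]; then
    echo "check OS firewall, NSG, routes, and binding address"
  else
    echo "fix the application before touching Azure"
  fi
}

classify_probe yes no   # prints: check OS firewall, NSG, routes, and binding address
```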

The NSG must also allow the probe. In many designs, an inbound allow from the AzureLoadBalancer service tag to the probe port is required. For client traffic, allow the client source or internet source to the application port as appropriate. Do not open broad ranges if a narrow source and port satisfy the requirement.
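As a sketch, an inbound allow for probe traffic from the AzureLoadBalancer service tag could look like the following; the NSG name, rule name, priority 200, and destination port 8080 are all assumptions for this example and should be replaced with your own values.

```shell
# Narrow inbound allow: only Azure's probe source, only the probe port.
az network nsg rule create \
  -g rg-network \
  --nsg-name nsg-web \
  -n Allow-AzureLB-Probe \
  --priority 200 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes AzureLoadBalancer \
  --destination-port-ranges 8080
```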

Step 4: Routes and asymmetric paths

User-defined routes can break load balancing. A subnet route might send traffic through a network virtual appliance, firewall, or gateway. That can be valid, but return traffic must be symmetric enough for stateful devices, and the appliance must allow the flow. If the client is on-premises, check ExpressRoute or VPN routes and whether the on-premises firewall knows the return path.

Next hop inspection is useful when the packet seems to vanish:

az network watcher show-next-hop \
  -g rg-network \
  --vm vm-web-1 \
  --source-ip 10.20.2.4 \
  --dest-ip 10.10.1.20

For inbound public load balancing, remember that the backend sees the original client IP as the packet source, so its reply is addressed to the client and must leave along a path the client's connection can accept. A UDR that forces 0.0.0.0/0 to a firewall changes that outbound path; if the firewall does not SNAT or track the inbound flow, replies are dropped as asymmetric. If a firewall is in the path, inspect firewall logs and rules.
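To see which routes a backend NIC actually receives, including UDRs that pull traffic toward an appliance, the effective route table is often the quickest view. The NIC name below is an assumption for this example.

```shell
# Show the routes the NIC actually uses (system defaults, BGP, and UDRs merged).
az network nic show-effective-route-table \
  -g rg-network \
  -n vm-web-1-nic \
  -o table
```

A 0.0.0.0/0 entry with next hop type VirtualAppliance in this output is the usual signature of a firewall in the path.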

Step 5: Guest OS and application

Azure can route only to a listener that exists. On Linux, check ss -lntp, systemctl status, and app logs. On Windows, check netstat -ano, Windows Firewall, IIS bindings, service status, and event logs. If the process listens on IPv6 only, loopback only, or a different port, the load balancer will not fix it.
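The loopback-only case can be spotted directly in the listener output. The lines below are a mocked `ss -lnt` excerpt (real data comes from the backend VM itself), parsed with awk to flag listeners the load balancer cannot reach.

```shell
# Mocked excerpt of `ss -lnt` output; the 8080 bind is the hypothetical app.
listeners='LISTEN 0 128 127.0.0.1:8080 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*'

# Column 4 is the local address; a 127.x bind passes local curl tests but is
# unreachable from the load balancer frontend and its health probe.
echo "$listeners" | awk '$4 ~ /^127\./ {print "loopback-only listener: " $4}'
```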

Guest OS firewalls are easy to miss because Azure NSGs may look correct. Windows Defender Firewall or Linux firewall rules can block the backend port or probe port. In exam scenarios, if Azure configuration is correct but only a specific VM fails, the guest firewall or application listener becomes likely.

Fast comparison of symptoms

Symptom | Most likely area | First diagnostic
FQDN resolves to old address | DNS | nslookup from affected client.
Frontend IP reachable from one network but not another | NSG, route, firewall, DNS split horizon | Test from both sources.
No backends receive traffic | Probe, rule, backend pool, NSG | Check probe health and pool membership.
Only one backend receives traffic | Other backends unhealthy or session persistence | Inspect probe result per instance.
Backend works by direct IP but not through load balancer | Rule, probe, frontend, DNS | Compare direct listener test with frontend test.
Probe healthy but app response broken | Application path or dependency | Review app logs and dependency connectivity.

Exam approach

AZ-104 often describes a failure after a single change. Anchor on that change. If public network access was disabled on a service, a private endpoint DNS problem becomes the likely cause. If an NSG was tightened, check the application and probe ports. If a VM was replaced, check backend pool membership. If a new /healthz endpoint was deployed but the probe still checks /, update the probe.

Do not jump to deleting and recreating a load balancer. Most failures are configuration mismatches. A disciplined path test is faster and safer.

Test Your Knowledge

Users cannot reach api.contoso.com through a public load balancer. What is the best first check in a path-based workflow?

Test Your Knowledge

All backend VMs are running, but the load balancer sends no new flows to them. Which condition most directly explains this?

Test Your Knowledge

A backend app passes curl localhost but fails from another VM. What should you investigate next?

A
B
C
D