8.6 Hybrid Connectivity, VPN, ExpressRoute, and DNS Considerations
Key Takeaways
- Hybrid connectivity combines network reachability, routing, name resolution, security rules, and service endpoint design.
- Site-to-site VPN uses encrypted tunnels over the internet, while ExpressRoute uses private connectivity through a provider.
- Gateway subnet, local network gateway, route propagation, BGP, and address space planning are core VPN design elements.
- ExpressRoute does not automatically solve DNS, NSG, private endpoint, or application listener problems.
- Hybrid troubleshooting should validate DNS, routes, gateway health, tunnel status, firewall rules, and return paths.
Hybrid connectivity foundations
Hybrid networking connects Azure VNets with on-premises networks. The common options for AZ-104 are site-to-site VPN and ExpressRoute. Site-to-site VPN uses IPsec/IKE tunnels over the internet through a VPN gateway. ExpressRoute uses private connectivity through a connectivity provider and can offer more predictable performance and private peering paths.
Connectivity alone is not the full design. You also need non-overlapping address spaces, correct routes, DNS forwarding, NSG allowances, on-premises firewall rules, and application listeners. Many case study failures happen after the tunnel or circuit is technically connected because the name resolves incorrectly, the next hop is wrong, or a firewall blocks the port.
| Requirement | Site-to-site VPN | ExpressRoute |
|---|---|---|
| Encrypted tunnel over internet | Yes | Not by default; traffic uses a private provider path. |
| Private connectivity through provider | No | Yes. |
| Lower cost and faster simple setup | Often | Usually more planning and provider coordination. |
| Predictable enterprise connectivity | Possible, but internet dependent | Stronger fit. |
| BGP support | Supported with compatible devices | Supported and commonly used. |
| DNS solved automatically | No | No. DNS still needs design. |
Site-to-site VPN components
A site-to-site VPN design includes a virtual network gateway in Azure, a gateway subnet, a public IP for the gateway, a local network gateway representing the on-premises VPN device and prefixes, and a connection object with a shared key or appropriate authentication settings. The on-premises VPN device must be configured with matching IPsec/IKE settings.
The gateway subnet is dedicated. Do not place VMs or other resources in it. Size it with future needs in mind; a /27 or larger is recommended because gateway features and larger SKUs can require additional addresses. The local network gateway must accurately represent on-premises address prefixes unless BGP dynamically exchanges routes.
Basic CLI outline:
```shell
az network vnet subnet create \
  -g rg-network \
  --vnet-name vnet-hub \
  -n GatewaySubnet \
  --address-prefixes 10.0.255.0/27

az network local-gateway create \
  -g rg-network \
  -n lgw-datacenter \
  --gateway-ip-address 203.0.113.10 \
  --local-address-prefixes 172.16.0.0/16
```
In exam questions, if on-premises routes change frequently, BGP is often a better answer than manually editing static prefixes. If address spaces overlap, routing cannot cleanly distinguish destinations. Fix address planning rather than trying to patch every subnet with UDRs.
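When the scenario points to BGP, the static local gateway definition can be swapped for a BGP-enabled pairing. A hedged sketch, assuming a virtual network gateway named vgw-hub already exists; the ASN, peering address, and shared key placeholder are illustrative:

```shell
# Illustrative only: assumes vgw-hub exists and the on-premises device runs BGP.
# Define the local gateway with BGP peering details instead of static prefixes.
az network local-gateway create \
  -g rg-network \
  -n lgw-datacenter \
  --gateway-ip-address 203.0.113.10 \
  --asn 65010 \
  --bgp-peering-address 172.16.0.5

# Create the connection with BGP enabled so routes are exchanged dynamically.
az network vpn-connection create \
  -g rg-network \
  -n conn-datacenter \
  --vnet-gateway1 vgw-hub \
  --local-gateway2 lgw-datacenter \
  --shared-key "<pre-shared-key>" \
  --enable-bgp
```

With BGP in place, new on-premises prefixes propagate without editing the local network gateway each time.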
ExpressRoute considerations
ExpressRoute uses circuits and peerings, most often private peering for access to Azure VNets through an ExpressRoute gateway. It is not a public internet VPN. It can connect multiple VNets through gateway connections and can be part of hub-and-spoke designs.
ExpressRoute does not bypass every security control. NSGs still apply to VM traffic. Azure Firewall or network virtual appliances still enforce their policies if traffic is routed through them. Private endpoints still require DNS resolution to private IP addresses. Service firewalls may still require trusted network configuration depending on the service.
If a question says an ExpressRoute circuit is connected but a VM cannot reach an on-premises server, check route tables, BGP learned routes, gateway connection state, NSGs, on-premises firewall, and return routes. If the failure is by FQDN only, check DNS forwarding before changing the circuit.
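Those checks map to concrete CLI calls. A sketch of the circuit-side portion, assuming an illustrative circuit named er-circuit with private peering configured:

```shell
# Circuit provisioning state on both the Azure side and the provider side.
az network express-route show -g rg-network -n er-circuit \
  --query "{azure:provisioningState, provider:serviceProviderProvisioningState}"

# Private peering state for VNet access.
az network express-route peering show -g rg-network \
  --circuit-name er-circuit -n AzurePrivatePeering --query state

# Routes learned and advertised on the primary path of private peering.
az network express-route list-route-tables -g rg-network -n er-circuit \
  --peering-name AzurePrivatePeering --path primary
```

If the on-premises prefix is missing from the learned routes, the problem is on the routing exchange, not the circuit itself.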
DNS in hybrid designs
Hybrid DNS is frequently the hidden problem. Azure clients may use VNet DNS settings pointing to domain controllers or DNS forwarders. On-premises clients may need conditional forwarders that send Azure private zone queries to Azure. Azure DNS Private Resolver can provide inbound endpoints for on-premises queries and outbound endpoints with forwarding rules from Azure to other DNS servers.
Private endpoints add another layer. If on-premises clients need to reach an Azure PaaS service through a private endpoint, their DNS queries for the service FQDN must resolve to the private endpoint IP. That usually requires forwarding the relevant privatelink zone to Azure or hosting equivalent records in the enterprise DNS system.
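One way to host the relevant privatelink zone in Azure is sketched below; the zone name depends on the service (Azure SQL is used as an illustrative example), and vnet-hub is an assumed VNet name:

```shell
# Host the privatelink zone for the service in Azure (zone name varies by service).
az network private-dns zone create -g rg-network \
  -n privatelink.database.windows.net

# Link the zone to the hub VNet so DNS servers there can resolve it.
az network private-dns link vnet create -g rg-network \
  --zone-name privatelink.database.windows.net \
  -n link-hub \
  --virtual-network vnet-hub \
  --registration-enabled false
```

On-premises DNS then needs a conditional forwarder that sends queries for the service's zone toward a resolver inside the VNet, since the Azure private zone is not reachable from the internet.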
DNS troubleshooting tree:
```text
Hybrid client cannot reach Azure service by name
|-- Does the name resolve from the affected client?
|   |-- No: inspect client DNS server and conditional forwarding.
|   |-- Yes: continue.
|-- Does it resolve to the intended private or public IP?
|   |-- No: inspect private DNS zone records and forwarding rules.
|   |-- Yes: continue.
|-- Does routing reach that IP over VPN or ExpressRoute?
|   |-- No: inspect gateway, BGP, UDRs, and route propagation.
|   |-- Yes: inspect NSGs, firewalls, service firewall, and app listener.
```
Routing, security, and return path
Hybrid routing must work in both directions. Azure may know how to reach on-premises, but on-premises must also know how to return to the Azure address space. Firewalls and network virtual appliances are stateful, so asymmetric routing can cause dropped responses even when each side has a valid route, because the reply may return along a path the stateful device never saw.
Route propagation can be disabled on route tables. That is useful in controlled hub-and-spoke designs, but it can also break expected gateway-learned routes. If a subnet suddenly cannot reach on-premises after a route table change, inspect whether gateway route propagation is disabled and whether UDRs override expected BGP routes.
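Both conditions can be checked directly. A sketch, assuming an illustrative route table rt-spoke and NIC nic-vm-app-1:

```shell
# Was gateway (BGP) route propagation disabled on the associated route table?
az network route-table show -g rg-network -n rt-spoke \
  --query disableBgpRoutePropagation

# What routes does the NIC actually see after UDRs and propagation are applied?
az network nic show-effective-route-table -g rg-network \
  -n nic-vm-app-1 -o table
```

If the on-premises prefix shows a source of None or is absent from the effective routes, the route table change is the likely cause.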
Security controls exist at multiple layers:
| Layer | Azure-side example | On-premises example |
|---|---|---|
| Network route | UDR, BGP route, gateway route | Router or firewall route table |
| Network filtering | NSG, Azure Firewall | Datacenter firewall ACL |
| Name resolution | Private DNS, VNet DNS servers | AD DNS, conditional forwarder |
| Service access | Storage firewall, private endpoint | Proxy or firewall rule |
| Host access | Guest firewall, app binding | Server firewall, daemon listener |
Practical diagnostics
Use both Azure and on-premises tools. In Azure, use Network Watcher next hop, effective routes, IP flow verify, and Connection troubleshoot. On-premises, inspect VPN device logs, firewall logs, routing tables, and DNS server forwarding results.
Commands and checks:
```shell
az network vpn-connection show -g rg-network -n conn-datacenter --query connectionStatus
az network watcher show-next-hop -g rg-network --vm vm-app-1 --source-ip 10.0.2.4 --dest-ip 172.16.10.20
nslookup sql01.corp.local
Test-NetConnection sql01.corp.local -Port 1433
```
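IP flow verify can confirm whether an NSG would drop the inbound or return traffic. A sketch using the same illustrative VM and addresses; the remote port is an assumed ephemeral client port:

```shell
# Would the VM's effective NSG rules allow inbound TCP 1433 from on-premises?
az network watcher test-ip-flow -g rg-network \
  --vm vm-app-1 \
  --direction Inbound \
  --protocol TCP \
  --local 10.0.2.4:1433 \
  --remote 172.16.10.20:50000
```

The output names the specific rule that allows or denies the flow, which narrows the fix to one NSG rule rather than a general firewall hunt.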
Exam traps
Do not say ExpressRoute automatically encrypts all traffic like a site-to-site VPN. Do not assume either VPN or ExpressRoute provides DNS. Do not use public DNS for private endpoint resolution when clients must stay on private paths. Do not forget that NSGs and service firewalls remain active after hybrid connectivity is established.
A frequent AZ-104 pattern says on-premises users can reach a VM by IP but not by name. That is a DNS issue first. Another says DNS returns the correct private IP but the connection times out. That moves the investigation to routes, NSGs, firewalls, and the application listener.
Review questions
- Which statement best distinguishes site-to-site VPN from ExpressRoute?
- On-premises clients can reach an Azure VM by private IP but not by hostname. What area should you troubleshoot first?
- A subnet lost connectivity to on-premises after a route table was associated. Which setting or object is most relevant?