9.6 Azure Site Recovery, Failover, and Replication

Key Takeaways

Azure Site Recovery replicates workloads so they can fail over to a recovery location during an outage or planned migration.
ASR is different from backup because it keeps a continuously replicated copy ready for failover, with only short-term recovery points, rather than storing long-retention recovery points for restore.
Replication design includes source and target region, cache storage, target networks, VM sizes, disk choices, and recovery point retention.
Test failover validates the disaster recovery plan without committing production failover.
Failback and reprotection must be planned so workloads are protected again after recovery.

Last updated: May 2026

Backup versus Site Recovery

Azure Backup and Azure Site Recovery solve different recovery problems. Backup protects data and systems by creating recovery points that can be restored. Site Recovery, or ASR, replicates machines to a recovery location and orchestrates failover. If the requirement says recover a deleted file from last Tuesday, think backup. If it says keep an application available during a regional outage, think ASR, load balancing, DNS, and application architecture.

ASR can protect Azure VMs between regions, VMware or physical servers to Azure, and other supported scenarios. For AZ-104, the most common pattern is Azure VM replication to a secondary Azure region using a Recovery Services vault. You configure source and target settings, enable replication, monitor health, run test failover, and perform failover when required.

Requirement	Best fit	Reason
Restore a VM from last week's recovery point	Azure Backup	Retained point-in-time restore.
Fail over a VM to another Azure region	Azure Site Recovery	Replication and failover orchestration.
Validate DR without disrupting production	ASR test failover	Runs isolated validation.
Keep logs for audit	Diagnostic setting to storage or workspace	Monitoring data retention, not workload DR.
Recover from accidental file deletion	Backup file recovery	ASR is not a file restore tool.

Replication design

A replication setup starts with a Recovery Services vault. The vault holds the ASR configuration and recovery metadata. For Azure-to-Azure replication, choose the source region, target region, target resource group, target virtual network, target subnet, cache storage account, replica managed disk type, and replication policy. The cache storage account sits in the source region and stages changes before they are sent to the target; the replica managed disks in the target region hold the data used at failover.
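The choices listed above can be captured as a simple checklist before enabling replication. The following is a minimal sketch, not an Azure SDK type: `ReplicationSettings`, `find_problems`, and all sample values are hypothetical, and the "regions must differ" rule assumes the region-to-region pattern described in this section.

```python
from dataclasses import dataclass, fields


@dataclass
class ReplicationSettings:
    """Hypothetical container for the Azure-to-Azure choices; not an Azure SDK type."""
    source_region: str
    target_region: str
    target_resource_group: str
    target_vnet: str
    target_subnet: str
    cache_storage_account: str
    replica_disk_type: str
    replication_policy: str


def find_problems(settings: ReplicationSettings) -> list[str]:
    """Return readable problems; an empty list means every choice was made."""
    problems = [
        f"{f.name} is not set"
        for f in fields(settings)
        if not getattr(settings, f.name).strip()
    ]
    # Assumption: region-to-region replication, so source and target must differ.
    source = settings.source_region.strip().lower()
    target = settings.target_region.strip().lower()
    if source and source == target:
        problems.append("target_region must differ from source_region")
    return problems


# Sample values are placeholders for illustration only.
plan = ReplicationSettings(
    "eastus", "westus", "rg-dr", "vnet-dr", "snet-app",
    "cachesa", "Standard_LRS", "policy-24h",
)
print(find_problems(plan))
```

An empty result only means every setting was chosen; it says nothing about whether the target network or dependencies will actually work after failover.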

Network design is critical. The failed-over VM needs a target virtual network and subnet that allow the application to work. IP addressing, DNS, NSGs, UDRs, private endpoints, firewalls, and load balancers may all need corresponding recovery-region design. ASR can replicate a VM, but it does not automatically redesign an application dependency chain. If the app depends on a database, identity endpoint, storage private endpoint, or on-premises route, the recovery plan must account for it.

Sizing also matters. The target VM size must be available in the recovery region and large enough for the workload. Disk SKU choices should match performance needs. The subscription also needs enough vCPU quota in the target region. If a test failover fails because the target region lacks quota for the selected VM family, the replication configuration can be valid and healthy while the recovery still cannot create the VM.
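The quota reasoning is simple arithmetic: the cores already in use in the target region plus the cores the failed-over VMs need must not exceed the limit for that VM family. The sketch below uses hypothetical function names and made-up numbers; real usage and limit figures would come from the subscription, for example from `az vm list-usage --location <region>`.

```python
def cores_for_failover(vm_cores: dict[str, int]) -> int:
    """Sum the vCPUs of every VM that would start in the recovery region."""
    return sum(vm_cores.values())


def has_quota(cores_needed: int, cores_in_use: int, cores_limit: int) -> bool:
    """True when the target region's quota can absorb the failover."""
    return cores_in_use + cores_needed <= cores_limit


# Hypothetical workload: VM name -> vCPU count of the chosen target size.
vm_cores = {"web01": 4, "app01": 8, "sql01": 16}
needed = cores_for_failover(vm_cores)

# 10 cores in use + 28 needed = 38, which exceeds a limit of 32.
print(needed, has_quota(needed, cores_in_use=10, cores_limit=32))
```

Running the same check per VM family, rather than once for the whole region, is closer to how a quota shortfall shows up during a test failover.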

ASR workflow

Portal path: Azure portal > Recovery Services vault > Site Recovery > Enable replication. For Azure VMs, select source settings, VMs, target settings, replication policy, and review.

A practical workflow:

1. Create or select a Recovery Services vault.
2. Confirm target region, target resource group, VNet, subnet, disk type, and cache storage account.
3. Enable replication for protected VMs.
4. Wait for initial replication to complete and confirm health.
5. Create recovery plans for multi-VM applications.
6. Run test failover into an isolated network.
7. Document DNS, load balancer, and application validation steps.
8. Perform planned or unplanned failover only when the business decision is made.
9. Commit failover after validation.
10. Reprotect so the workload is protected in the new direction.

Recovery plans are important for multi-tier applications. They group protected machines and can define startup order. For example, start domain services first, then databases, then application servers, then web servers. Recovery plans can also include scripts or manual actions. AZ-104 may ask how to coordinate failover for multiple VMs in order; recovery plans are the answer.
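The startup-order idea can be illustrated with a small model. This is a sketch of the concept only, not how ASR stores recovery plans: `PlanGroup`, `startup_sequence`, and the machine names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanGroup:
    """Hypothetical recovery plan group: lower order starts first."""
    order: int
    name: str
    machines: tuple[str, ...]


def startup_sequence(groups: list[PlanGroup]) -> list[str]:
    """Flatten groups into the order in which machines would be started."""
    sequence: list[str] = []
    for group in sorted(groups, key=lambda g: g.order):
        sequence.extend(group.machines)
    return sequence


# Groups are deliberately listed out of order to show the sorting.
groups = [
    PlanGroup(3, "application servers", ("app01", "app02")),
    PlanGroup(1, "domain services", ("dc01",)),
    PlanGroup(4, "web servers", ("web01",)),
    PlanGroup(2, "databases", ("sql01",)),
]
print(startup_sequence(groups))
```

The domain controller starts first and the web server last, which matches the ordering described in the paragraph above.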

Test, planned, and unplanned failover

Test failover validates recovery without affecting production replication. Choose a test network that is isolated from production to avoid duplicate IPs, duplicate domain members, or unintended client traffic. After validation, clean up the test failover. This is the safest routine DR exercise.

Planned failover is used when the source is still available, such as during a planned datacenter migration or expected outage. It attempts to synchronize final changes before failover to minimize data loss. Unplanned failover is used when the source is unavailable. It may use the latest available recovery point and can involve data loss depending on replication state.

Failover type	Use case	Data loss expectation	Key action
Test failover	DR validation	No production impact expected	Use isolated test network and clean up.
Planned failover	Controlled move while source is available	Minimized by final sync	Shut down or coordinate source and commit.
Unplanned failover	Source outage	Depends on latest recovery point	Choose recovery point and validate app.
Failback	Return to original or preferred region	Requires reprotection planning	Reprotect and fail over in reverse direction.
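The choice among the three failover types reduces to two questions: is this a drill, and is the source still available? The sketch below encodes that decision; the enum and function names are hypothetical and exist only for illustration.

```python
from enum import Enum


class FailoverType(Enum):
    TEST = "test failover"
    PLANNED = "planned failover"
    UNPLANNED = "unplanned failover"


def choose_failover(is_drill: bool, source_available: bool) -> FailoverType:
    """Pick a failover type from the two questions in the table."""
    if is_drill:
        # DR validation never touches production replication.
        return FailoverType.TEST
    if source_available:
        # A final sync is possible, which minimizes data loss.
        return FailoverType.PLANNED
    # Source is down: recover from the latest available recovery point.
    return FailoverType.UNPLANNED


print(choose_failover(is_drill=False, source_available=False).value)
```

Failback is not a separate branch here because it is a failover in the reverse direction after reprotection.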

Monitoring replication

ASR provides replication health, RPO, job status, and error details in the vault. Azure Monitor alerts can notify on replication health issues or failover jobs. Backup Center and vault views help centralize operational awareness. A healthy replication status is not the same as a tested application recovery plan. The administrator should test failover regularly and verify application behavior.

If the vault's diagnostic logs are sent to a Log Analytics workspace, you can review ASR operations with KQL. Table schemas depend on configuration, but the pattern is to filter the vault diagnostics for Site Recovery jobs and health events:

// Column names follow the AzureDiagnostics schema for Site Recovery; verify them in your workspace.
AzureDiagnostics
| where TimeGenerated > ago(24h)
| where Category startswith "AzureSiteRecovery"
| where ResultType == "Failed" or (isnotempty(replicationHealth_s) and replicationHealth_s != "Normal")
| project TimeGenerated, Resource, Category, OperationName, ResultType, replicationHealth_s
| order by TimeGenerated desc
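The same review can be applied to replication health and RPO once the values have been exported from the vault. The following is a minimal sketch under stated assumptions: the record layout, the `needs_attention` name, and the 15-minute threshold are all hypothetical, and the threshold should come from the workload's actual RPO target.

```python
from datetime import timedelta


def needs_attention(items: list[dict], max_rpo: timedelta) -> list[str]:
    """Return names of replicated items that are unhealthy or behind on RPO."""
    flagged = []
    for item in items:
        unhealthy = item["health"] != "Normal"
        behind = item["rpo"] > max_rpo
        if unhealthy or behind:
            flagged.append(item["name"])
    return flagged


# Hypothetical snapshot of replicated items.
items = [
    {"name": "web01", "health": "Normal", "rpo": timedelta(minutes=2)},
    {"name": "app01", "health": "Normal", "rpo": timedelta(minutes=40)},
    {"name": "sql01", "health": "Critical", "rpo": timedelta(minutes=5)},
]
print(needs_attention(items, max_rpo=timedelta(minutes=15)))
```

A clean result still says nothing about whether the application works after failover, which is why regular test failover remains necessary.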

Troubleshooting ASR

Initial replication failures often involve unsupported VM configuration, disk limits, extension problems, cache storage account access, or target region quota. Ongoing replication health issues can involve high churn, network connectivity, storage throttling, or agent issues depending on the protected workload type. Failover failures often involve target region capacity, target network configuration, boot diagnostics, encryption dependencies, or missing permissions.

For encrypted VMs, confirm Key Vault access in the target region and required permissions for recovery. For private workloads, confirm private DNS zones and endpoints exist or have a recovery design. For domain-joined VMs, confirm domain controller availability in the recovery environment. For load-balanced applications, confirm backend pools and health probes in the recovery region.
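These dependency checks lend themselves to a pre-failover checklist. The sketch below maps each VM trait from the paragraph above to the item that must be confirmed in the recovery region; the trait labels and the `outstanding_checks` function are hypothetical names for illustration.

```python
# Trait label -> what must be confirmed in the recovery region.
RECOVERY_CHECKS = {
    "encrypted": "Key Vault access and permissions in the target region",
    "private": "private DNS zones and private endpoints",
    "domain_joined": "domain controller availability",
    "load_balanced": "backend pools and health probes",
}


def outstanding_checks(vm_traits: set[str], confirmed: set[str]) -> list[str]:
    """List the checks that apply to this VM and are not yet confirmed."""
    return [
        check
        for trait, check in RECOVERY_CHECKS.items()
        if trait in vm_traits and trait not in confirmed
    ]


# Hypothetical VM: encrypted and load balanced, with only encryption verified.
print(outstanding_checks({"encrypted", "load_balanced"}, confirmed={"encrypted"}))
```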

Exam traps

Do not choose ASR to recover a single deleted file. Do not choose Azure Backup alone when the requirement is regional failover with low recovery time. Do not run test failover into the production network unless the scenario explicitly accounts for isolation and conflicts. Do not forget to commit failover and reprotect after a real failover. Do not assume ASR protects every dependency; DNS, identity, networking, database replication, and application configuration must be part of the disaster recovery plan.

Test Your Knowledge

Which ASR operation validates disaster recovery without disrupting the production VM?

Test failover

File recovery

Metric dimension split

Storage lifecycle transition

Test Your Knowledge

A multi-tier application must fail over with database servers started before web servers. What ASR feature should be used?

Recovery plan

Action group email receiver

Blob soft delete

Metric namespace

Test Your Knowledge

Which statement best distinguishes Azure Backup from Azure Site Recovery?

Backup restores from retained recovery points, while Site Recovery replicates workloads for failover.

Backup only sends email, while Site Recovery only stores metrics.

Backup requires public IP addresses, while Site Recovery requires no network planning.

Backup is used only for NSG rules, while Site Recovery is used only for KQL.
