5.6 Backup, Restore, and Operational Maintenance for VMs

Key Takeaways

  • Azure Backup protects VMs through vaults, policies, recovery points, and restore workflows; snapshots alone are not a full backup strategy.
  • Operational maintenance includes patching, monitoring, update assessment, boot diagnostics, serial console, run command, redeploy, and change control.
  • Restore choices include creating a new VM, restoring disks, replacing disks in some scenarios, and file-level recovery.
  • Backup failures commonly involve agent health, policy scope, vault configuration, locks, soft delete, network restrictions, and unsupported VM states.

Last updated: May 2026

Backup Architecture

Azure Backup for VMs is usually configured through a Recovery Services vault. The vault stores backup metadata and recovery points according to a backup policy. A policy defines the backup schedule (frequency and time of day), how long recovery points are retained, and related behavior such as how long instant-restore snapshots are kept. The protected item is the VM. Recovery points can be crash-consistent, file-system consistent, or application-consistent depending on the OS, the agents installed, and workload support.

A snapshot is not the same as Azure Backup. Snapshots are useful for short-term disk capture and pre-change rollback, but a vault policy gives centralized retention, restore workflows, soft delete, monitoring, alerts, and operational reporting. In exam scenarios, choose snapshots for a quick point-in-time copy of a disk and Azure Backup for managed recovery and retention.
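To make the snapshot side of that comparison concrete, the following is a minimal sketch of a pre-change OS disk snapshot. The function name and its arguments are illustrative placeholders, and it assumes the VM uses a managed OS disk.

```shell
# Sketch: create a short-lived snapshot of a VM's OS disk before a change.
# snapshot_os_disk and its arguments are hypothetical names for illustration.
snapshot_os_disk() {
  local rg="$1" vm="$2" snap_name="$3"
  local disk_id

  # Look up the managed OS disk ID from the VM model.
  disk_id="$(az vm show -g "$rg" -n "$vm" \
    --query "storageProfile.osDisk.managedDisk.id" -o tsv)" || return 1
  if [ -z "$disk_id" ]; then
    echo "No managed OS disk found for $vm" >&2
    return 1
  fi

  # Create the snapshot in the same resource group as the VM.
  az snapshot create -g "$rg" -n "$snap_name" --source "$disk_id"
}
```

A call such as snapshot_os_disk rg-prod-compute vm-app-01 snap-vm-app-01-prechange would capture the disk, but nothing here handles retention or cleanup, which is the gap a vault policy fills.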

Component | Role | Administrator task
Recovery Services vault | Backup management container | Create in proper region and configure security
Backup policy | Schedule and retention | Match RPO and retention requirements
Protected item | VM under backup | Enable, monitor, and troubleshoot protection
Recovery point | Restore source | Select correct date and consistency type
Soft delete | Protection against accidental deletion | Understand retention and undelete behavior

Portal path: Recovery Services vaults > vault name > Backup > Azure Virtual Machine. You can also enable backup during VM creation on the Management tab. For production, confirm the vault region, redundancy, soft delete, immutability if required, and access controls.

CLI example:

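# Create the Recovery Services vault. It must be in the same region as the VMs it protects.
az backup vault create \
  --resource-group rg-prod-ops \
  --name rsv-prod-eastus \
  --location eastus
# Enable protection. The full resource ID is used because the VM is in a different resource group from the vault.
az backup protection enable-for-vm \
  --resource-group rg-prod-ops \
  --vault-name rsv-prod-eastus \
  --vm /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-prod-compute/providers/Microsoft.Compute/virtualMachines/vm-app-01 \
  --policy-name DefaultPolicy
# List backup jobs in the vault to confirm that protection was configured.
az backup job list -g rg-prod-ops --vault-name rsv-prod-eastus --output table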
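
The production checks mentioned earlier (redundancy, soft delete, and policy fit) can be reviewed from the CLI as well. This is a read-only sketch; the function name is a placeholder, and the exact shape of the returned properties is best confirmed against your CLI version.

```shell
# Sketch: read-only review of a vault's settings and policies.
# review_vault is a hypothetical helper name, not an Azure CLI command.
review_vault() {
  local rg="$1" vault="$2"

  # Backup properties should include storage redundancy and soft delete state.
  az backup vault backup-properties show \
    --resource-group "$rg" --name "$vault" || return 1

  # List policies so schedule and retention can be compared with the RPO.
  az backup policy list \
    --resource-group "$rg" --vault-name "$vault" --output table
}
```

Storage redundancy generally has to be decided before items are protected, so this review is most useful immediately after the vault is created.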

Restore Options

The restore method depends on the failure. If a user deleted files, file-level recovery may be enough. If the OS disk is corrupted, restoring disks or replacing the OS disk may be appropriate. If the VM is lost or you want isolated validation, create a new VM from a recovery point. If the region is unavailable, you need a cross-region or disaster recovery design that was planned before the outage.

Problem | Restore choice | Notes
Accidental file deletion | File recovery | Mount recovery point and copy files
Bad application update | Restore disks or new VM | Validate app consistency
OS boot failure | Restore OS disk or create new VM | Keep original for investigation if needed
VM deleted | Restore VM or disks | Depends on backup state and retained points
Regional disaster | Azure Site Recovery (ASR) failover or cross-region restore where configured | Must be planned and tested
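
As one illustration of the "restore disks" choice, the sketch below picks a recovery point and restores its disks to a staging storage account. The function names are placeholders, the VM name is assumed to work as both container and item name, and taking the first list entry as the most recent point is an assumption worth verifying before relying on it.

```shell
# Sketch: restore disks from a recovery point of a protected Azure VM.
# latest_recovery_point and restore_disks_from_latest are hypothetical helpers.
latest_recovery_point() {
  local rg="$1" vault="$2" vm="$3"
  # Assumption: the first entry returned is the most recent recovery point.
  az backup recoverypoint list \
    --resource-group "$rg" --vault-name "$vault" \
    --backup-management-type AzureIaasVM \
    --container-name "$vm" --item-name "$vm" \
    --query "[0].name" --output tsv
}

restore_disks_from_latest() {
  local rg="$1" vault="$2" vm="$3" storage="$4" target_rg="$5"
  local rp

  rp="$(latest_recovery_point "$rg" "$vault" "$vm")" || return 1
  if [ -z "$rp" ]; then
    echo "No recovery point found for $vm" >&2
    return 1
  fi

  # Restore the disks only; attaching them to a VM is a separate step.
  az backup restore restore-disks \
    --resource-group "$rg" --vault-name "$vault" \
    --container-name "$vm" --item-name "$vm" \
    --rp-name "$rp" \
    --storage-account "$storage" \
    --target-resource-group "$target_rg"
}
```

Restoring to a separate target resource group keeps the recovered disks away from production until they have been validated.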

Do not restore over production without a rollback plan. A restored VM can conflict with names, IPs, domain membership, or application identity. Restoring disks gives more control but requires you to attach them correctly and repair boot or data configuration.

Maintenance Operations

Operational maintenance includes planned patching, guest configuration, monitoring, performance review, security posture, backup verification, and cost controls. Azure Update Manager can assess and orchestrate updates for Azure VMs and hybrid servers through Azure Arc. Maintenance configurations can help control platform maintenance timing for supported resources. Auto-shutdown can reduce cost for dev/test, but on production workloads it should be enabled only when the scheduled shutdown is deliberate.

Common commands:

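# Show power state and provisioning status.
az vm get-instance-view -g rg-prod-compute -n vm-app-01 --query instanceView.statuses
# Retrieve the serial boot log (boot diagnostics must be enabled).
az vm boot-diagnostics get-boot-log -g rg-prod-compute -n vm-app-01
# Create a repair VM with a copy of the OS disk attached (requires the vm-repair CLI extension).
az vm repair create -g rg-prod-compute -n vm-app-01 --repair-username repairadmin --repair-password 'Use-A-Secure-Method'
# Move the VM to a different host.
az vm redeploy -g rg-prod-compute -n vm-app-01
# Reapply the VM model to the platform.
az vm reapply -g rg-prod-compute -n vm-app-01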

Redeploy moves the VM to another host and can resolve host issues; the VM restarts, and data on the temporary disk is lost. Reapply reapplies the VM model and can help when the platform state is inconsistent. VM repair commands can create a repair VM and attach a copy of the OS disk for offline repair in supported workflows. Run Command can execute scripts when normal access is unavailable but the VM agent is working.
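
Run Command is not shown in the commands listed earlier, so here is a minimal sketch of invoking it. The wrapper name and the os argument are illustrative; the command IDs RunShellScript and RunPowerShellScript are the built-in ones for Linux and Windows guests.

```shell
# Sketch: run a script inside the guest through the VM agent.
# run_in_guest is a hypothetical wrapper name.
run_in_guest() {
  local rg="$1" vm="$2" os="$3" script="$4"
  local command_id

  # Pick the built-in command ID that matches the guest OS.
  case "$os" in
    linux)   command_id="RunShellScript" ;;
    windows) command_id="RunPowerShellScript" ;;
    *) echo "os must be 'linux' or 'windows'" >&2; return 2 ;;
  esac

  az vm run-command invoke -g "$rg" -n "$vm" \
    --command-id "$command_id" --scripts "$script"
}
```

For example, run_in_guest rg-prod-compute vm-app-01 linux "systemctl status sshd" could confirm whether the SSH service is running when you cannot log in.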

Monitoring and Diagnostics

For VM health, combine platform metrics, guest metrics, logs, boot diagnostics, and alerts. Azure Monitor can collect VM metrics and logs through the Azure Monitor Agent and data collection rules. VM insights provides performance and dependency views. Boot diagnostics captures screenshots and serial logs that are useful when the VM fails before network services start.

Signal | Use it for
Activity log | Control plane operations such as start, resize, redeploy
Metrics | CPU, disk, network, availability signals
Guest logs | OS and application troubleshooting
Boot diagnostics | Boot failure, kernel panic, blue screen, login prompt evidence
Serial console | Emergency command-line access for supported VMs
Backup jobs | Protection success, duration, and failures
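
Two of these signals, platform metrics and the activity log, can be pulled quickly from the CLI. The sketch below is one possible shape; the function names, the 5-minute interval, and the selected output columns are choices made for illustration.

```shell
# Sketch: pull platform metrics and control plane history for a VM.
# recent_cpu and recent_control_plane_ops are hypothetical helper names.
recent_cpu() {
  local vm_id="$1"
  # Average CPU in 5-minute buckets over the CLI's default time window.
  az monitor metrics list --resource "$vm_id" \
    --metric "Percentage CPU" --interval PT5M \
    --aggregation Average --output table
}

recent_control_plane_ops() {
  local rg="$1" window="${2:-1d}"
  # Activity log entries such as start, resize, and redeploy.
  az monitor activity-log list --resource-group "$rg" --offset "$window" \
    --query "[].{time:eventTimestamp, operation:operationName.localizedValue, status:status.value}" \
    --output table
}
```

Guest logs and VM insights data are not covered here because they depend on the Azure Monitor Agent and the data collection rules in use.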

Backup Troubleshooting

Scenario 1: Backup fails with extension or snapshot errors. Check VM agent status, extension health, guest OS state, disk status, and whether another backup or snapshot operation is already running. For Windows application consistency, VSS issues inside the guest can matter.
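
A first pass at Scenario 1 can be scripted. The sketch below reads the VM agent and extension status from the instance view; the function names are placeholders, and treating a display status of "Ready" as healthy is an assumption based on what a working agent normally reports.

```shell
# Sketch: check VM agent and extension health before retrying a backup.
# vm_agent_status, vm_extension_status, and agent_ready are hypothetical names.
vm_agent_status() {
  local rg="$1" vm="$2"
  az vm get-instance-view -g "$rg" -n "$vm" \
    --query "instanceView.vmAgent.statuses[0].displayStatus" -o tsv
}

vm_extension_status() {
  local rg="$1" vm="$2"
  # One row per extension with its first reported status.
  az vm get-instance-view -g "$rg" -n "$vm" \
    --query "instanceView.extensions[].{name:name, status:statuses[0].displayStatus}" \
    --output table
}

agent_ready() {
  local status
  status="$(vm_agent_status "$1" "$2")" || return 1
  # Assumption: a healthy agent reports the display status "Ready".
  [ "$status" = "Ready" ]
}
```

If agent_ready fails, fixing the agent usually comes before investigating the backup extension itself.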

Scenario 2: Backup cannot be enabled. Check that the vault is in the same region as the VM, that you have the required RBAC permissions, that no resource locks or policy restrictions block the operation, and that the VM is not already protected by another vault. Confirm the selected backup policy is valid.
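
Two of the Scenario 2 checks lend themselves to a quick CLI sketch: listing resource locks and asking whether the VM is already protected. The function name is a placeholder, and the --vm flag on check-vm is written from memory of the current CLI, so confirm it with az backup protection check-vm --help.

```shell
# Sketch: look for common blockers before enabling backup on a VM.
# check_enable_blockers is a hypothetical helper name.
check_enable_blockers() {
  local vm_rg="$1" vm_id="$2"

  echo "Locks on resource group $vm_rg:"
  az lock list --resource-group "$vm_rg" --output table

  # Prints the protecting vault's ID, or nothing if the VM is unprotected.
  echo "Vault already protecting the VM (empty if none):"
  az backup protection check-vm --vm "$vm_id"
}
```

RBAC and Azure Policy assignments are not covered by this sketch and still need to be reviewed separately.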

Scenario 3: Restore completes but application fails. Check restored IP addressing, DNS, certificates, managed identity permissions, secrets, domain trust, firewall rules, and database connection strings. Infrastructure restore does not guarantee application consistency if the app needs coordinated recovery steps.

Scenario 4: A user deletes a protected VM and asks for immediate deletion of backup data. Soft delete and security settings may intentionally retain backup data for a period (14 days by default) to protect against accidental or malicious deletion. The exam may ask which feature prevents immediate permanent loss of backup items.
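
While a backup item is in the soft-deleted state, it can be brought back. The following is a minimal sketch, assuming the VM name serves as both container and item name; the function name is a placeholder.

```shell
# Sketch: recover a soft-deleted backup item for an Azure VM.
# undelete_vm_backup is a hypothetical helper name.
undelete_vm_backup() {
  local rg="$1" vault="$2" vm="$3"
  az backup protection undelete \
    --resource-group "$rg" --vault-name "$vault" \
    --backup-management-type AzureIaasVM --workload-type VM \
    --container-name "$vm" --item-name "$vm"
}
```

After the undelete, the item typically remains in a stopped-protection state with its data retained, so protection may need to be resumed as a separate step.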

Operational Runbook Pattern

For a production VM, define a runbook that includes owner, business service, RPO, RTO, backup policy, patch window, monitoring alerts, escalation path, access method, recovery test schedule, and rebuild method. Infrastructure as code should define the VM and baseline settings. Backup and monitoring should be verified after deployment, not assumed.

Before risky maintenance, confirm the latest successful backup, create a short-lived snapshot if appropriate, review change windows, drain load-balanced traffic, apply updates to one instance first, validate health probes and application tests, then continue. For scale sets, prefer rolling upgrades and health checks. For single VMs, plan downtime and rollback.
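
The first step of that sequence, confirming a recent successful backup and optionally taking a fresh one, might be sketched as follows. The function names are placeholders, and the dd-mm-yyyy format for the retention date reflects what the CLI expects for --retain-until as far as this sketch assumes.

```shell
# Sketch: verify recent backups and trigger an on-demand backup before a change.
# valid_retain_until and pre_change_backup are hypothetical helper names.
valid_retain_until() {
  # Accept only a dd-mm-yyyy shaped value; the calendar date is not validated.
  case "$1" in
    [0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

pre_change_backup() {
  local rg="$1" vault="$2" vm="$3" retain_until="$4"

  if ! valid_retain_until "$retain_until"; then
    echo "retain_until must look like dd-mm-yyyy" >&2
    return 2
  fi

  # Show completed backup jobs so the latest success can be confirmed.
  az backup job list -g "$rg" --vault-name "$vault" \
    --operation Backup --status Completed --output table || return 1

  # Trigger an on-demand backup that is kept until the given date.
  az backup protection backup-now \
    --resource-group "$rg" --vault-name "$vault" \
    --backup-management-type AzureIaasVM \
    --container-name "$vm" --item-name "$vm" \
    --retain-until "$retain_until"
}
```

The on-demand backup runs asynchronously, so the job list should be checked again before the change window opens.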

AZ-104 does not require deep application administration, but it does require administrator judgment. Know when to use backup, when to use snapshots, when to redeploy, when to use serial console, and when the problem is actually DNS, NSG, guest firewall, or application configuration.

Test Your Knowledge

Which restore option is usually best when a user accidentally deleted a few files from a protected VM?


A VM is unreachable over the network after a boot change. Which feature can help diagnose early boot problems?


Why are managed disk snapshots not a complete replacement for Azure Backup?
