Azure Virtual Machine Recovery Strategy for Regional Failures | Best Practices

Recovery Strategy for Azure Virtual Machines in Case of Regional Failure

Question

You have an Azure subscription that is used for testing and development purposes only. The subscription contains Azure virtual machines that use unmanaged, standard hard disk drives (HDD).

You need to recommend a recovery strategy for the virtual machines if an Azure region fails for a sustained period. The recovery time objective (RTO) can be up to seven days. The solution must minimize costs.

What should you include in the recommendation?

Answers

Explanations

A. B. C. D.

B

Geo-redundant storage (GRS) is designed to provide at least 99.99999999999999% (16 9's) durability of objects over a given year by replicating your data to a secondary region that is hundreds of miles away from the primary region. If your storage account has GRS enabled, then your data is durable even in the case of a complete regional outage or a disaster in which the primary region isn't recoverable.
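To put the durability figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that durability can be read as an independent per-object annual survival probability. The 11-nines value for LRS is Microsoft's published figure and is not taken from this page, and the function name is hypothetical.

```python
from decimal import Decimal


def expected_annual_loss(object_count: int, durability_percent: str) -> Decimal:
    """Expected number of objects lost in one year.

    Assumption: durability is an independent per-object annual survival
    probability. This is a simplification for illustration only.
    """
    loss_fraction = (Decimal(100) - Decimal(durability_percent)) / Decimal(100)
    return loss_fraction * object_count


GRS_DURABILITY = "99.99999999999999"  # 16 nines, as quoted in the text above
LRS_DURABILITY = "99.999999999"       # 11 nines, Microsoft's published LRS figure

if __name__ == "__main__":
    for name, durability in (("LRS", LRS_DURABILITY), ("GRS", GRS_DURABILITY)):
        print(name, expected_annual_loss(10**9, durability))
```

Decimal arithmetic is used because a float cannot represent sixteen nines without rounding error.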

GRS replicates your data to another data center in a secondary region, but with plain GRS that data can be read only after a failover from the primary to the secondary region. Reading from the secondary without a failover requires read-access geo-redundant storage (RA-GRS).
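The readability rule can be sketched as a small Python predicate. The SKU strings are the standard Azure storage SKU names. The function is a hypothetical helper written for illustration, not part of any Azure SDK.

```python
def secondary_is_readable(sku: str, failed_over: bool) -> bool:
    """Return True if the copy in the secondary region can be read.

    Hypothetical helper: it encodes only the rule described in the text.
    """
    read_access_skus = {"Standard_RAGRS", "Standard_RAGZRS"}
    geo_skus = {"Standard_GRS", "Standard_GZRS"} | read_access_skus
    if sku not in geo_skus:
        # LRS and ZRS keep no copy in a secondary region.
        return False
    # Read-access SKUs expose the secondary at all times;
    # plain GRS/GZRS expose it only after a failover.
    return sku in read_access_skus or failed_over
```

For example, `secondary_is_readable("Standard_GRS", failed_over=False)` is false, while the same call with `failed_over=True` is true.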

Incorrect Answers:

A, C: If a datacenter-level disaster (for example, fire or flooding) occurs, all replicas in a storage account using LRS may be lost or unrecoverable. To mitigate this risk, Microsoft recommends using zone-redundant storage (ZRS), geo-redundant storage (GRS), or geo-zone-redundant storage (GZRS).

https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-grs
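The reasoning behind the answer can be sketched as a filter followed by a cost comparison: discard every strategy that does not survive a regional failure or cannot meet the RTO, then take the cheapest of what remains. In the Python sketch below, the strategy names, RTO values, and cost ranks are illustrative placeholders chosen for this example, not Azure pricing or measured recovery times.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    survives_region_failure: bool
    rto_days: float      # illustrative placeholder, not a measured value
    relative_cost: int   # illustrative rank only: 1 = cheapest


def recommend(strategies: Iterable[Strategy], max_rto_days: float) -> Optional[Strategy]:
    """Pick the cheapest strategy that survives a regional failure within the RTO."""
    candidates = [
        s for s in strategies
        if s.survives_region_failure and s.rto_days <= max_rto_days
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.relative_cost)


# Hypothetical candidates that mirror the options discussed in the text.
CANDIDATES = [
    Strategy("LRS only", False, float("inf"), 1),
    Strategy("GRS, recreate VMs after failover", True, 3.0, 2),
    Strategy("Azure Site Recovery replication", True, 0.1, 3),
]

if __name__ == "__main__":
    choice = recommend(CANDIDATES, max_rto_days=7)
    print(choice.name if choice else "no strategy qualifies")
```

With a seven-day RTO the GRS-based strategy wins on cost. Tightening `max_rto_days` to one day leaves only the replication-based strategy, which is why the same question has a different answer for production workloads.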

Option A (store the disks in a Standard_LRS storage account, configure Azure Site Recovery, and initiate a manual failover if a failure occurs) can look attractive, but it is not the lowest-cost way to meet the requirements.

Here is the detailed explanation:

The scenario presented in this question is about disaster recovery (DR) for an Azure subscription that is used for testing and development purposes only. The recovery time objective (RTO) can be up to seven days, which is long compared with typical production environments, and the solution must minimize costs. Together, these requirements favor the cheapest option that still survives a regional outage, not the fastest one.

The first step in designing a DR strategy is to identify the critical workloads and their dependencies. In this case, the workloads are the virtual machines (VMs) used for testing and development. The VMs use unmanaged, standard hard disk drives (HDD), which means their virtual hard disks (VHDs) are stored as page blobs in a storage account that you manage. The redundancy setting of that storage account therefore determines whether the disks survive a regional failure.

The next step is to choose the storage redundancy. Standard storage accounts support locally redundant storage (LRS), zone-redundant storage (ZRS), geo-redundant storage (GRS), and geo-zone-redundant storage (GZRS), with read-access variants of the last two. LRS keeps three copies of the data within a single data center, ZRS spreads the copies across availability zones in one region, and GRS and GZRS additionally copy the data asynchronously to a secondary region. Premium storage accounts support fewer options and offer no geo-redundant one.

For this scenario, LRS on its own is not sufficient. All of its copies sit in a single data center, so a sustained regional failure leaves the disks unavailable. GRS costs more than LRS, but it is the lowest-cost redundancy option that keeps a copy of the VHDs in another region.

Azure Site Recovery (ASR) is a DR service that continuously replicates VMs and physical servers to a secondary location. It provides orchestrated failover and failback, as well as customizable recovery plans. ASR is designed for short RTOs and is billed per protected instance, in addition to the storage that holds the replicas. With an RTO of up to seven days, that extra cost buys recovery speed the scenario does not need.

The last option, which is to manually create the VMs using Azure Resource Manager (ARM) templates, is not recommended. Templates can recreate the VM configuration, but they do not by themselves protect the data on the disks.

In summary, the recommended solution for this scenario is B, the option that relies on geo-redundant storage. GRS keeps a copy of the disks in a secondary region at a lower cost than Azure Site Recovery, and the VMs can be recreated from the replicated VHDs well within the seven-day RTO.