Disaster Recovery Strategies for Critical Web Servers in Multiple AWS Regions

Best Disaster Recovery Strategy for Critical Web Servers

Question

A global investment firm has recently deployed web servers in multiple AWS regions.

Applications on these servers are critical to the firm.

Only a few minutes of downtime is acceptable in the event of a disaster.

Management wants a strategy that provides a fast RTO and a low RPO within a constrained budget. Which disaster recovery strategy is the most appropriate?

Answers

A. Multi-Site

B. Pilot Light

C. Warm Standby

D. Backup & Restore

Correct Answer: C.

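RTO (Recovery Time Objective) is the maximum acceptable delay between a disaster interrupting service and service being restored.

RPO (Recovery Point Objective) is the maximum acceptable amount of time since the last data recovery point, which determines how much data loss is tolerable when there is a disaster.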
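As a rough illustration of the two definitions, the sketch below measures the data loss window and the downtime for a single incident and compares them with the targets. The function names, timestamps, and target values are hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta


def measure_recovery(last_recovery_point, disaster_at, service_restored_at):
    """Return (data_loss_window, downtime) for one incident."""
    # Everything written after the last recovery point is lost.
    data_loss_window = disaster_at - last_recovery_point
    # Downtime runs from the disaster until service is restored.
    downtime = service_restored_at - disaster_at
    return data_loss_window, downtime


def meets_objectives(data_loss_window, downtime, rpo, rto):
    """True if the incident stayed within both the RPO and the RTO."""
    return data_loss_window <= rpo and downtime <= rto


# Hypothetical incident: data last replicated 30 seconds before the
# disaster, service restored 4 minutes after it.
disaster = datetime(2024, 1, 1, 12, 0, 0)
loss, down = measure_recovery(
    last_recovery_point=disaster - timedelta(seconds=30),
    disaster_at=disaster,
    service_restored_at=disaster + timedelta(minutes=4),
)

# Assumed targets: RPO of 1 minute, RTO of 5 minutes.
ok = meets_objectives(loss, down, rpo=timedelta(minutes=1), rto=timedelta(minutes=5))
print(loss, down, ok)
```

The RPO is compared against the data loss window and the RTO against the downtime, so an incident can meet one objective while missing the other.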

With the Warm Standby disaster recovery strategy, all resources deployed at the primary site are also deployed at the secondary site, but at a minimum size.

In case of a disaster at the primary site, resources at the secondary site are scaled up to handle the full load of the primary site.

Because the secondary site is already running, this strategy provides a fast RTO and a low RPO.

While the primary site is up and running, resources at the secondary site do not run at full capacity, which lowers cost.

It costs less than the Multi-Site strategy, in which both the primary and secondary sites run resources at full capacity.
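The cost difference can be sketched with simple arithmetic. The fleet size, hourly rate, and the 20% standby footprint below are hypothetical figures for illustration, not AWS pricing or a recommended sizing.

```python
import math


def standby_instances(primary_instances, standby_percent):
    """Instances kept running at the secondary site (at least one)."""
    return max(1, math.ceil(primary_instances * standby_percent / 100))


def monthly_cost(instances, hourly_rate, hours=730):
    """Approximate monthly compute cost, assuming about 730 hours per month."""
    return instances * hourly_rate * hours


PRIMARY_INSTANCES = 20   # hypothetical production fleet size
HOURLY_RATE = 0.10       # hypothetical price per instance-hour

warm = standby_instances(PRIMARY_INSTANCES, 20)    # Warm Standby: 4 instances
multi = standby_instances(PRIMARY_INSTANCES, 100)  # Multi-Site: 20 instances

warm_cost = monthly_cost(warm, HOURLY_RATE)
multi_cost = monthly_cost(multi, HOURLY_RATE)

# Instances that must be added at the secondary site during failover.
scale_up_needed = PRIMARY_INSTANCES - warm

print(warm, multi, scale_up_needed, warm_cost < multi_cost)
```

Under these assumptions the secondary site costs about a fifth of a full-scale site while idle, at the price of having to add 16 instances during failover.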

Option A is incorrect. Multi-Site provides near-zero downtime during a disaster because full production-scale resources are already running at the secondary sites.

However, this incurs high costs.

Since the firm is looking for disaster recovery with a limited budget, this is not the correct option.

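Option B is incorrect as the Pilot Light strategy typically provides an RTO in the tens of minutes, which exceeds the few minutes of downtime the firm can accept.

Recovery takes longer because application resources are kept switched off and are started only once a disaster occurs at the primary site.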

Option D is incorrect as the Backup & Restore strategy typically provides an RTO measured in hours and also has the highest RPO, since data can only be recovered up to the most recent backup.
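The elimination reasoning above amounts to picking the cheapest strategy whose recovery time fits the requirement. A minimal sketch follows, assuming rough order-of-magnitude recovery times (hours, tens of minutes, minutes, near zero) and a relative cost ranking; the specific numbers and names are illustrative assumptions, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    recovery_minutes: float  # assumed typical recovery time, order of magnitude only
    relative_cost: int       # 1 = cheapest, 4 = most expensive


# Assumed figures for illustration only.
STRATEGIES = [
    Strategy("Backup & Restore", 240, 1),
    Strategy("Pilot Light", 30, 2),
    Strategy("Warm Standby", 5, 3),
    Strategy("Multi-Site", 0, 4),
]


def cheapest_strategy(max_downtime_minutes):
    """Return the lowest-cost strategy that recovers within the allowed downtime."""
    candidates = [s for s in STRATEGIES if s.recovery_minutes <= max_downtime_minutes]
    # Raises ValueError if no strategy qualifies (only for negative input here).
    return min(candidates, key=lambda s: s.relative_cost)


# "Only a few minutes of downtime" is taken as 5 minutes for this example.
print(cheapest_strategy(5).name)
```

With a tolerance of a few minutes, Backup & Restore and Pilot Light are filtered out for being too slow, and Warm Standby is cheaper than Multi-Site, which matches the answer given.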

For more information on disaster recovery strategies, refer to the following URL:

https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-workloads-on-aws.pdf

The global investment firm needs a disaster recovery strategy that provides a fast recovery time objective (RTO) and a low recovery point objective (RPO) while staying within a constrained budget.

Let's first understand what RTO and RPO mean. Recovery time objective (RTO) is the maximum allowable downtime that an application can experience before it needs to be back up and running. Recovery point objective (RPO) is the maximum allowable data loss, expressed as the amount of time between the last recoverable point and the disaster.

Now, let's look at the four options given in the question and see which one would be most appropriate:

A. Multi-Site: This strategy runs the full workload at production scale in two or more geographic locations at the same time, with every location actively serving traffic. In the event of a disaster, traffic is routed to the remaining locations. Multi-Site provides the lowest RPO and the fastest RTO (near zero). However, it is the most expensive option, as it requires full-capacity resources to be active in multiple locations simultaneously.

B. Pilot Light: The Pilot Light strategy involves maintaining a minimal version of the application in a separate environment. Data is replicated continuously, while application servers are provisioned but kept switched off. In the event of a disaster, the servers must be started and scaled out before they can serve traffic. The Pilot Light strategy is inexpensive, but recovery typically takes tens of minutes, which is slower than the Warm Standby and Multi-Site strategies.

C. Warm Standby: The Warm Standby strategy involves maintaining a scaled-down but fully functional version of the application in a separate environment. All components are running at minimum size and data is replicated continuously. In the event of a disaster, the environment can take traffic right away and is scaled up to meet full production demand. The Warm Standby strategy provides a low RPO and an RTO of minutes, at a lower cost than the Multi-Site strategy.

D. Backup & Restore: The Backup & Restore strategy involves creating regular backups of the application's data and configuration settings. In the event of a disaster, the application is restored from the most recent backup. This is the cheapest strategy, but its RPO depends on the backup frequency and is typically measured in hours, and its RTO is slow because the infrastructure must be redeployed and the data restored.

Considering the requirements of the global investment firm, the Warm Standby strategy is the most appropriate, as it provides a low RPO and an RTO of minutes at a moderate cost. The Multi-Site strategy would meet the recovery targets but does not fit a constrained budget. The Pilot Light strategy is cheaper, but its recovery time of tens of minutes exceeds the few minutes of downtime the firm can accept. The Backup & Restore strategy would not be suitable for a critical application that cannot tolerate more than a few minutes of downtime.