Limiting Impact on Customers during AWS CloudFormation Update

Question

A SysOps Administrator has been notified of an issue involving an Auto Scaling group with hundreds of instances.

When the launch configuration is updated, the process for the Auto Scaling group is taking multiple nodes offline at the same time, which is impacting customers.

The Application team uses AWS CloudFormation to update the application by changing a parameter to the version of code it wants to enable. What can the Administrator do to limit the impact on customers while the update is being performed?

Answers

Explanations

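A. Edit the user data script to describe the instances in the Auto Scaling group and run aws ec2 terminate-instances against the next oldest instance ID.

B. Add the Update Policy attribute in CloudFormation, and enable the WaitOnResourceSignals property. Append a health check at the end of the user data script to signal CloudFormation that it was successful.

C. Convert the Auto Scaling group to individual instances and have the Application team update one machine at a time in CloudFormation.

D. Add a DependsOn attribute to the Auto Scaling group resource in CloudFormation, depending on the launch configuration. Append the user data script to signal the wait condition.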

Correct Answer: B.

The AWS documentation states the following:

To specify how AWS CloudFormation handles replacement updates for an Auto Scaling group, use the AutoScalingReplacingUpdate policy.

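This policy enables you to specify whether AWS CloudFormation replaces an Auto Scaling group with a new one or replaces only the instances in the Auto Scaling group.

For this scenario, the relevant policy on the same UpdatePolicy attribute is AutoScalingRollingUpdate, which replaces instances in batches and provides the WaitOnResourceSignals property named in Option B.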

Options A and C are incorrect because neither changes how CloudFormation replaces instances during the stack update.

Option D is incorrect because the Auto Scaling group already depends on the launch configuration.

For more information on the update policy attribute, see the following URL:

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html

The scenario describes an issue where updating the launch configuration of an Auto Scaling group is taking multiple nodes offline simultaneously, impacting customers. The application team is using CloudFormation to update the application code by changing a parameter to the version they want to enable. The SysOps Administrator needs to limit the impact on customers while the update is being performed. Let's go through the answer options to see what could be done.

Option A suggests editing the user data script to describe the instances in the Auto Scaling group and run aws ec2 terminate-instances against the next oldest instance ID. This is not a good solution. Terminating instances from a script does nothing to control how CloudFormation replaces instances during the stack update, and each termination removes capacity before a replacement is confirmed healthy, which can itself cause an outage for customers.

Option B suggests adding the Update Policy attribute to the Auto Scaling group in CloudFormation and enabling the WaitOnResourceSignals property, with a health check at the end of the user data script that signals CloudFormation on success. This is a good solution to the problem. With a rolling update policy, CloudFormation replaces instances in the Auto Scaling group in batches instead of all at once, which limits how many nodes are offline at any time. The WaitOnResourceSignals property makes CloudFormation wait for success signals from the new instances before it continues with the next batch, so new instances are confirmed healthy before more old ones are removed. The health check at the end of the user data script is what sends that signal, typically through the cfn-signal helper script.
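A minimal sketch of what Option B might look like, written as a Python script that builds a template fragment and prints it as JSON. The logical IDs (AppAutoScalingGroup, AppLaunchConfig), the ImageId and InstanceType parameters, the group size, the batch settings, the PauseTime value, and the /health URL are all assumptions made for illustration and do not come from the question. UpdatePolicy, AutoScalingRollingUpdate, WaitOnResourceSignals, and cfn-signal are the CloudFormation features the explanation relies on.

```python
import json

# Hypothetical logical IDs, chosen only for illustration.
ASG_ID = "AppAutoScalingGroup"
LAUNCH_CONFIG_ID = "AppLaunchConfig"

# User data: run a local health check, then report its exit code to
# CloudFormation. The health URL is an assumption; cfn-signal is part of
# the aws-cfn-bootstrap helper scripts.
USER_DATA = "\n".join([
    "#!/bin/bash",
    "# (install and start the requested application version here)",
    "curl --fail --silent http://localhost/health",
    "/opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} "
    "--resource " + ASG_ID + " --region ${AWS::Region}",
])


def build_fragment():
    """Return the Resources fragment with a rolling update policy."""
    return {
        LAUNCH_CONFIG_ID: {
            "Type": "AWS::AutoScaling::LaunchConfiguration",
            "Properties": {
                "ImageId": {"Ref": "ImageId"},            # assumed parameter
                "InstanceType": {"Ref": "InstanceType"},  # assumed parameter
                "UserData": {"Fn::Base64": {"Fn::Sub": USER_DATA}},
            },
        },
        ASG_ID: {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "UpdatePolicy": {
                "AutoScalingRollingUpdate": {
                    "MinInstancesInService": 190,   # illustrative values
                    "MaxBatchSize": 10,
                    "PauseTime": "PT10M",           # max wait for signals
                    "WaitOnResourceSignals": True,
                }
            },
            "Properties": {
                "LaunchConfigurationName": {"Ref": LAUNCH_CONFIG_ID},
                "MinSize": "200",
                "MaxSize": "200",
                "AvailabilityZones": {"Fn::GetAZs": ""},
            },
        },
    }


print(json.dumps(build_fragment(), indent=2))
```

With settings like these, at most 10 instances would be replaced per batch, and CloudFormation would wait up to the PauseTime for the new instances to signal success before continuing.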

Option C suggests converting the Auto Scaling group to individual instances and having the application team update one machine at a time in CloudFormation. This option is not practical as it would require significant changes to the infrastructure and would take a lot of time to implement. Additionally, this approach does not offer any benefits over Option B.

Option D suggests adding a DependsOn attribute to the Auto Scaling group resource in CloudFormation, depending on the launch configuration, and appending a command to the user data script to signal the wait condition. This option does not address the root cause of the problem. DependsOn only controls the order in which CloudFormation processes resources, and the Auto Scaling group already references the launch configuration, so that dependency exists implicitly. It will not prevent multiple nodes from being taken offline at the same time when the launch configuration is updated.
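To illustrate why an explicit DependsOn adds nothing here: a Ref to another resource already creates an implicit dependency in CloudFormation. The sketch below walks a hypothetical resource definition (the logical ID AppLaunchConfig and the sizes are made up for illustration) and collects the logical IDs it references through Ref or Fn::GetAtt.

```python
def referenced_ids(node):
    """Collect names referenced via Ref or Fn::GetAtt in a template node.

    Simplification: a Ref may also point to a parameter, and the string
    form of Fn::GetAtt is not handled; only the JSON list form is.
    """
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                found.add(value)
            elif key == "Fn::GetAtt" and isinstance(value, list) and value:
                found.add(value[0])
            else:
                found |= referenced_ids(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_ids(item)
    return found


# Hypothetical Auto Scaling group resource that references its launch
# configuration, with no DependsOn attribute.
asg = {
    "Type": "AWS::AutoScaling::AutoScalingGroup",
    "Properties": {
        "LaunchConfigurationName": {"Ref": "AppLaunchConfig"},
        "MinSize": "2",
        "MaxSize": "4",
    },
}

print(referenced_ids(asg))  # {'AppLaunchConfig'}
```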

Therefore, the correct answer is option B: Add the Update Policy attribute in CloudFormation, and enable the WaitOnResourceSignals property. Append a health check at the end of the user data script to signal CloudFormation that it was successful. This solution ensures that the update is performed in batches, minimizing the impact on customers, and that the update only continues once new instances have reported healthy.
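The effect of batching can be illustrated with a simplified model. This is not CloudFormation's actual algorithm; it only assumes that each batch takes at most max_batch_size old instances out of service and never lets the in-service count drop below min_in_service. The function and parameter names are hypothetical.

```python
def rolling_batches(total, max_batch_size, min_in_service):
    """Return batch sizes for a simplified rolling replacement.

    Each batch replaces up to max_batch_size instances while keeping at
    least min_in_service of the total instances serving traffic.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    if not 0 <= min_in_service < total:
        raise ValueError("min_in_service must be between 0 and total - 1")

    # The batch size is capped by both the configured maximum and the
    # capacity that may be offline without violating the minimum.
    per_batch = min(max_batch_size, total - min_in_service)

    batches = []
    remaining = total
    while remaining > 0:
        size = min(per_batch, remaining)
        batches.append(size)
        remaining -= size
    return batches


batches = rolling_batches(total=200, max_batch_size=10, min_in_service=190)
print(len(batches), max(batches))  # 20 10
```

In this model, a group of 200 instances is updated in 20 batches, and no more than 10 instances are offline at any point, compared with an uncontrolled update that could take many nodes offline at once.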