You work as a machine learning specialist for a research data streaming service that serves research reference content to subscribers.
Your company's subscriber base is primarily made up of university research staff.
However, your company occasionally produces research content with broader appeal, and your service then experiences very large spikes in streaming request traffic.
Your machine learning team owns a critical component of the content delivery process: a recommendation engine model variant that processes an inference request for every content streaming request.
When your model variant receives these spikes in inference requests, your company's streaming service suffers poor performance.
You have decided to use SageMaker autoscaling to meet the varying demand for your model variant inference requests.
Which type of scaling policy should you use in your SageMaker autoscaling implementation?
A. Target-tracking scaling
B. Step scaling
C. Simple scaling
D. Scheduled scaling
Correct Answer: A
Option A is correct.
AWS recommends target-tracking scaling policies for your autoscaling configuration because they are fully automated.
Option B is incorrect.
AWS recommends that you use step scaling when you need an advanced configuration, such as specifying how many instances to deploy under certain circumstances.
You don't have a specialized need like this, so you should use target-tracking scaling.
Option C is incorrect.
SageMaker autoscaling doesn't have a simple scaling policy.
Option D is incorrect.
Scheduled scaling adds or removes capacity only at times you define in advance, so it cannot react to unpredictable spikes in inference traffic.
References:
Please see the Amazon SageMaker Developer Guide page titled Automatically Scale Amazon SageMaker Models (https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html) and the Amazon SageMaker Developer Guide page titled Prerequisites (https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-prerequisites.html).
In this scenario, the machine learning team has a recommendation engine model that processes inference requests for every content streaming request. During spikes in demand for the streaming service, the model receives a large number of inference requests, causing the streaming service to suffer from poor performance. To address this issue, the team has decided to use SageMaker autoscaling to meet the varying demand for the model variant inference requests.
SageMaker is a fully managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning models quickly and easily. SageMaker autoscaling can be used to automatically adjust the number of instances running the model variant based on demand.
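Before any scaling policy can take effect, the endpoint's production variant must first be registered as a scalable target with Application Auto Scaling (see the Prerequisites reference above). A minimal boto3 sketch of that step is shown below; the endpoint name, variant name, and capacity bounds are illustrative assumptions rather than values given in the scenario.

```python
import boto3

# Application Auto Scaling manages scaling for SageMaker endpoint variants.
autoscaling = boto3.client("application-autoscaling")

# Hypothetical endpoint and variant names; replace with your own deployment.
resource_id = "endpoint/research-recs-endpoint/variant/AllTraffic"

# Register the variant's DesiredInstanceCount as the scalable dimension.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,    # assumed floor for steady-state traffic
    MaxCapacity=10,   # assumed ceiling for traffic spikes
)
```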
The answer options correspond to the following types of scaling policies:
A. Target-tracking scaling: This type of scaling policy allows you to set a target value for a specific metric, such as CPU utilization or request count per instance. The policy adjusts the number of instances running the model variant to maintain the target value.
B. Step scaling: This type of scaling policy allows you to define a set of scaling adjustments based on the value of a specific metric, such as the number of requests per instance. The policy applies the scaling adjustment when the metric crosses a threshold value.
C. Simple scaling: This type of scaling policy adjusts capacity by a fixed scaling adjustment, such as adding or removing a set number of instances. It is a concept from Amazon EC2 Auto Scaling and is not offered by the Application Auto Scaling service that SageMaker endpoints use.
D. Scheduled scaling: This type of scaling policy allows you to set a schedule for scaling activities, such as adding or removing instances at specific times of the day or week.
In this scenario, the best option is target-tracking scaling. You set a target value for a metric such as request count per instance (SageMaker provides the predefined SageMakerVariantInvocationsPerInstance metric for this), and the policy adjusts the number of instances running the model variant to keep the metric near that target. As demand for the streaming service spikes, instances are added automatically to absorb the load; as demand falls, instances are removed automatically to save costs.
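As a rough illustration, a target-tracking policy for this variant might look like the boto3 sketch below. It assumes the variant has already been registered as a scalable target (as in the earlier sketch); the policy name, target value, and cooldown periods are assumptions you would tune through load testing.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Hypothetical policy and resource names; values are illustrative only.
autoscaling.put_scaling_policy(
    PolicyName="recs-variant-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/research-recs-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # Hold each instance at roughly this many invocations per minute;
        # the right number depends on load testing the model variant.
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "TargetValue": 1000.0,
        "ScaleOutCooldown": 60,   # scale out quickly when spikes hit
        "ScaleInCooldown": 300,   # scale in more conservatively
    },
)
```

With this in place, Application Auto Scaling continuously compares actual invocations per instance against the target and adds or removes instances on its own, which is exactly the fully automated behavior the question is looking for.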
Step scaling is less suitable here because it requires you to define specific threshold values and scaling adjustments in advance, which is difficult when spikes in demand are unpredictable. Simple scaling only applies fixed adjustments (and, as noted above, is not offered for SageMaker endpoints), so it cannot keep up with the dynamic nature of the spikes. Scheduled scaling adds or removes capacity only at predefined times, so it does not respond to changes in demand in real time.
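For contrast, a step scaling policy for the same variant would look roughly like the sketch below; the thresholds and adjustment sizes are guesses, and having to choose them (plus the associated CloudWatch alarm) up front is exactly what makes step scaling harder to get right for unpredictable spikes.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Step scaling forces you to hard-code thresholds and instance adjustments,
# and to wire up a CloudWatch alarm separately to trigger the policy.
autoscaling.put_scaling_policy(
    PolicyName="recs-variant-step-scaling",
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/research-recs-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="StepScaling",
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity",
        "Cooldown": 120,
        "MetricAggregationType": "Average",
        # Each step is relative to the alarm threshold defined in CloudWatch;
        # the step sizes below are illustrative guesses.
        "StepAdjustments": [
            {"MetricIntervalLowerBound": 0.0,
             "MetricIntervalUpperBound": 500.0,
             "ScalingAdjustment": 1},
            {"MetricIntervalLowerBound": 500.0,
             "ScalingAdjustment": 3},
        ],
    },
)
```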