You work for a flight diagnostics company that builds instrumentation for airline manufacturers.
Your company's instrumentation hardware and software are used to detect flight pattern information such as flight path deviations and aircraft component malfunctions.
Your team of machine learning specialists has built a model with the Random Cut Forest algorithm to identify anomalies in the data.
The streaming data that your instrumentation processes needs to be cleaned and transformed via feature engineering before it is passed to your inference endpoint.
You have created the pre-processing and post-processing steps (for cleaning and feature engineering) in your training process. How can you implement the cleaning and feature engineering steps in your inference processing in the most efficient manner?
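A. Execute the pre-processing in a client application before sending the data to your inference endpoint.
B. Bundle and export the training pre-processing steps and deploy them to your inference container.
C. Bundle and export the training pre-processing steps and deploy them as part of your Inference Pipeline.
D. Bundle and export the training pre-processing steps and deploy them to IoT Core on the data emitting devices.
Answer: C.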
Option A is incorrect.
Although you could execute your pre-processing steps in a client application before sending the data to your inference endpoint, you would have to build that client application yourself and then port the feature engineering scripts from your training process into it.
Option B is incorrect.
You could also include your pre-processing steps in your inference container.
However, this requires more work on your part than using the SageMaker Inference Pipelines feature.
Option C is correct.
SageMaker Inference Pipelines lets you bundle and export the pre-processing and post-processing steps from your training process and deploy them as part of your inference pipeline, which chains the containers in sequence behind a single endpoint.
Inference Pipelines are fully managed by AWS.
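As a rough sketch of what deploying an inference pipeline involves, the snippet below builds a request of the shape the SageMaker `CreateModel` API accepts for a multi-container model, where the `Containers` list gives the containers in the order a request passes through them. The function name, image URIs, S3 paths, and role ARN are hypothetical placeholders, and the request is only constructed here, not sent to AWS.

```python
from typing import Dict, List


def build_pipeline_model_request(
    model_name: str,
    role_arn: str,
    containers: List[Dict[str, str]],
) -> Dict[str, object]:
    """Build a CreateModel-style request for a serial inference pipeline.

    The order of `containers` is the order in which a request flows
    through them: pre-processing, then the model, then post-processing.
    """
    # An inference pipeline is a linear sequence of two to fifteen containers.
    if not 2 <= len(containers) <= 15:
        raise ValueError("an inference pipeline chains 2 to 15 containers")
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [dict(c) for c in containers],
    }


# Hypothetical placeholders: substitute your own images and model artifacts.
request = build_pipeline_model_request(
    model_name="flight-anomaly-pipeline",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    containers=[
        {
            "Image": "<preprocessing-image-uri>",
            "ModelDataUrl": "s3://example-bucket/preprocess/model.tar.gz",
        },
        {
            "Image": "<random-cut-forest-image-uri>",
            "ModelDataUrl": "s3://example-bucket/rcf/model.tar.gz",
        },
        {
            "Image": "<postprocessing-image-uri>",
            "ModelDataUrl": "s3://example-bucket/postprocess/model.tar.gz",
        },
    ],
)
```

With AWS credentials configured, a request of this shape could be passed as `boto3.client("sagemaker").create_model(**request)`, followed by the usual endpoint configuration and endpoint creation.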
Option D is incorrect.
AWS IoT Core is used to connect devices to AWS cloud services and to each other.
It is not a service you would use to pre-process data streams for machine learning inference endpoints.
Reference:
Please see the Amazon announcement titled Announcing Enhancements for Data Processing and Feature Engineering, and Improved Framework Support with Amazon SageMaker, the Amazon SageMaker developer guide titled Deploy an Inference Pipeline, the AWS Machine Learning blog titled Use the built-in Amazon SageMaker Random Cut Forest algorithm for anomaly detection, and the AWS IoT Core Overview page.
Your flight diagnostics company builds instrumentation for airline manufacturers, and your team of machine learning specialists has created a Random Cut Forest model to identify anomalies in the data. Before the streaming data can be passed to the inference endpoint, however, it needs to be cleaned and transformed through feature engineering.
To implement the cleaning and feature engineering steps in the most efficient manner, there are several options to consider:
A. Execute the pre-processing in a client application before sending the data to your inference endpoint. While this approach could work, it is not the most efficient solution: you would have to build the client application and re-implement the feature engineering from your training process inside it. The client may also lack the computational power to run every required pre-processing step, especially if the data is complex.
B. Bundle and export the training pre-processing steps and deploy them to your inference container. This approach keeps the training and inference transformations consistent, but it means building and maintaining a custom inference container that combines the pre-processing code with the model, which is more work than using the managed SageMaker Inference Pipelines feature.
C. Bundle and export the training pre-processing steps and deploy them as part of your Inference Pipeline. An inference pipeline chains the pre-processing, model, and post-processing containers behind a single endpoint, so each request is cleaned and transformed before it reaches the model. Because the steps are the ones exported from training, the training and inference transformations stay consistent, and the pre-processing steps remain easy to update and maintain.
D. Bundle and export the training pre-processing steps and deploy them to IoT Core on the data emitting devices. Processing data close to the source sounds attractive because it could reduce the amount of data transmitted to the inference endpoint. However, AWS IoT Core is a managed cloud service for connecting devices and routing their messages, not a place to run the feature engineering steps exported from your training process, and pushing those steps to every data emitting device would leave you with a large number of deployments to keep in sync.
In conclusion, option C, bundling and exporting the training pre-processing steps and deploying them as part of your Inference Pipeline, is the most efficient and scalable way to implement the cleaning and feature engineering steps in your inference processing. It keeps the training and inference environments consistent, makes the pre-processing steps easy to update and maintain, and applies them to the data before it reaches the model.
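The serial flow behind option C can be pictured with a small, AWS-free sketch in plain Python. Every name here is hypothetical, the altitude readings are made up for illustration, and `toy_score` is a simple stand-in scorer, not the Random Cut Forest algorithm. The point is only that the scaler fitted on training data is the same object reused at inference, and that each step consumes the previous step's output.

```python
from statistics import mean, pstdev
from typing import Callable, List

Step = Callable[[list], list]


def fit_scaler(training_values: List[float]) -> Step:
    """Fit scaling statistics on training data and return the transform."""
    mu = mean(training_values)
    sigma = pstdev(training_values) or 1.0  # avoid dividing by zero

    def scale(values: List[float]) -> List[float]:
        return [(v - mu) / sigma for v in values]

    return scale


def toy_score(scaled: List[float]) -> List[float]:
    """Stand-in anomaly scorer (NOT Random Cut Forest): distance from zero."""
    return [abs(v) for v in scaled]


def make_labeler(threshold: float) -> Step:
    """Post-processing step that turns scores into readable labels."""

    def label(scores: List[float]) -> List[str]:
        return ["anomaly" if s > threshold else "normal" for s in scores]

    return label


def run_pipeline(steps: List[Step], payload: list) -> list:
    """Pass the payload through each step in order, like a serial pipeline."""
    for step in steps:
        payload = step(payload)
    return payload


# Made-up training readings; the fitted scaler is reused unchanged at inference.
training_altitudes = [10000.0, 10050.0, 9950.0, 10020.0, 9980.0]
pipeline = [fit_scaler(training_altitudes), toy_score, make_labeler(3.0)]

labels = run_pipeline(pipeline, [10010.0, 12000.0])
print(labels)  # ['normal', 'anomaly']
```

In a real inference pipeline each step would be a container rather than a function, but the ordering and the reuse of the training-time transformation follow the same idea.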