You work as a machine learning specialist for a cruise ship company.
Due to new health restrictions, your company may book its cruise ships at only 50% capacity across all of its cruise offerings.
To maximize profitability, you have been asked to create a model that gathers streaming data from various data sources such as weather services, census data, gross national product for various countries, spending habits across various countries, etc.
You will use this data to build a model that uses clusters of data to predict cruise allocation.
You need to perform feature engineering, such as feature transformations, on your streaming data and then load it into your company's MongoDB database.
What is the most efficient solution for your scenario?
Correct Answer: A.
Option A is correct.
You can stream your data sources via Kinesis Data Firehose, use a Lambda function that you write to perform feature transformations, then stream the transformed data to an HTTP endpoint for the MongoDB third-party service provider.
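A minimal sketch of the Lambda function in option A, assuming Python and JSON records. Firehose passes the function a batch of base64-encoded records and expects every `recordId` back with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, plus the (re-encoded) data. The field names `gnp_usd` and `avg_spend_usd` and the log transform are hypothetical examples of feature transformations, not part of the scenario.

```python
import base64
import json
import math


def transform_features(record: dict) -> dict:
    """Hypothetical feature transformation: log-scale skewed monetary fields."""
    out = dict(record)
    for field in ("gnp_usd", "avg_spend_usd"):  # hypothetical field names
        if out.get(field) is not None and out[field] >= 0:
            out[f"log_{field}"] = math.log1p(out[field])
    return out


def lambda_handler(event, context):
    """Firehose data-transformation handler: every recordId must be returned."""
    output = []
    for rec in event["records"]:
        try:
            payload = json.loads(base64.b64decode(rec["data"]))
            transformed = transform_features(payload)
            output.append({
                "recordId": rec["recordId"],
                "result": "Ok",
                "data": base64.b64encode(
                    json.dumps(transformed).encode("utf-8")
                ).decode("utf-8"),
            })
        except (ValueError, TypeError):
            # Malformed input: flag it so Firehose routes it to its error output.
            output.append({
                "recordId": rec["recordId"],
                "result": "ProcessingFailed",
                "data": rec["data"],
            })
    return {"records": output}
```

Failing individual records instead of raising keeps one bad record from failing the whole batch.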
Option B is incorrect.
Kinesis Data Analytics cannot write directly to a MongoDB HTTP endpoint.
Option C is incorrect.
Kinesis Data Analytics cannot write directly to a MongoDB HTTP endpoint.
Option D is incorrect.
This option is much less efficient than option A: the Glue ETL job would write its output to S3, and you would then need a separate script to load the data into MongoDB.
References:
Amazon Kinesis Data Firehose Developer Guide, "What Is Amazon Kinesis Data Firehose?" (https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html)
Amazon Kinesis Data Firehose Developer Guide, "Amazon Kinesis Data Firehose Data Transformation" (https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html)
Amazon Kinesis Data Analytics Developer Guide, "Kinesis Data Analytics for Apache Flink: How It Works" (https://docs.aws.amazon.com/kinesisanalytics/latest/java/how-it-works.html)
Amazon Kinesis Data Firehose Developer Guide, "Using Amazon Kinesis Data Analytics" (https://docs.aws.amazon.com/firehose/latest/dev/data-analysis.html)
In this scenario, the cruise ship company wants to maximize profitability while complying with new health restrictions that cap capacity at 50%. To do so, it needs a machine learning model that predicts cruise allocation from streaming data sources such as weather services, census data, gross national product, and spending habits across countries.
To achieve this efficiently, we need to stream the data from these sources, transform the features, and load the transformed data into a MongoDB database.
Option A: Stream your data sources via Kinesis Data Firehose to your MongoDB database, using a Lambda function to perform feature transformations.
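Kinesis Data Firehose is a fully managed service that captures, transforms, and delivers streaming data to data stores, analytics tools, and HTTP endpoints. It can invoke a Lambda function to transform records in flight and can deliver directly to the HTTP endpoint of the MongoDB third-party service provider, so no separate load step is needed.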
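For reference, here is a sketch of the receiving side of an HTTP endpoint delivery. As we understand the Firehose HTTP endpoint delivery format, Firehose POSTs a JSON body containing a `requestId`, a `timestamp`, and a `records` array whose `data` fields are base64-encoded, and expects a JSON acknowledgement that echoes the `requestId`. In option A the MongoDB provider operates this endpoint, so you would not write it yourself; the function name `handle_firehose_delivery` is hypothetical, and the exact format should be confirmed in the Firehose developer guide.

```python
import base64
import json
import time
from typing import Dict, List, Tuple


def handle_firehose_delivery(body: bytes) -> Tuple[List[Dict], Dict]:
    """Hypothetical receiver: decode one Firehose HTTP endpoint request.

    Returns the decoded JSON documents and the acknowledgement body that
    would be sent back with an HTTP 200 status.
    """
    request = json.loads(body)
    documents = [
        json.loads(base64.b64decode(rec["data"])) for rec in request["records"]
    ]
    ack = {
        "requestId": request["requestId"],     # echo the request's ID
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
    }
    return documents, ack
```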
Lambda is a serverless compute service that runs code without requiring you to provision or manage servers. Firehose invokes the Lambda function on buffered batches of incoming records, so the feature transformations happen inside the delivery stream.
The main caveat is that Lambda execution time is limited (a function can run for at most 15 minutes), so each transformation batch must finish within that window. Per-record feature transformations such as scaling or encoding usually fit comfortably within this limit.
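One way to respect the execution-time limit, sketched under assumptions: the handler below checks the remaining time through the Lambda context's `get_remaining_time_in_millis()` method and, once less than a hypothetical `SAFETY_MARGIN_MS` remains, returns the untouched records flagged `ProcessingFailed` instead of letting the whole invocation time out. The `transformed` flag is a placeholder for real feature logic.

```python
import base64
import json

# Hypothetical margin: stop transforming once less than this much time remains.
SAFETY_MARGIN_MS = 10_000


def lambda_handler(event, context):
    """Firehose transformation handler that watches its remaining execution time."""
    output = []
    for rec in event["records"]:
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Not enough time left: hand the record back unchanged and flagged
            # rather than risk the entire invocation timing out.
            output.append({
                "recordId": rec["recordId"],
                "result": "ProcessingFailed",
                "data": rec["data"],
            })
            continue
        payload = json.loads(base64.b64decode(rec["data"]))
        payload["transformed"] = True  # placeholder for the real feature logic
        output.append({
            "recordId": rec["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                json.dumps(payload).encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```

Whether to fail records early like this or simply raise the function's timeout is a design choice that depends on the workload.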
Option B: Stream your data sources via Kinesis Data Streams to your MongoDB database, using Kinesis Data Analytics to perform feature transformations.
Kinesis Data Streams is a managed service for ingesting and storing streaming data at scale. Kinesis Data Analytics processes and analyzes streaming data in real time; with Kinesis Data Analytics for SQL applications, you can write SQL queries over the stream.
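However, this option has drawbacks. Kinesis Data Streams requires you to plan and manage shard capacity (in provisioned mode), and Kinesis Data Analytics cannot write directly to a MongoDB HTTP endpoint, so an additional delivery step would be needed to get the transformed data into MongoDB.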
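Part of the extra configuration with Kinesis Data Streams is capacity planning. In provisioned mode each shard accepts up to 1 MB/s or 1,000 records/s of writes, so the stream must be sized for whichever limit is reached first. The sketch below computes that minimum; the traffic figures are hypothetical inputs, and a real deployment would also leave headroom.

```python
import math


def min_shards(write_mb_per_s: float, write_records_per_s: float) -> int:
    """Minimum provisioned shards for a given write load (no headroom)."""
    # Per-shard write limits in provisioned mode: 1 MB/s and 1,000 records/s.
    by_bytes = math.ceil(write_mb_per_s / 1.0)
    by_records = math.ceil(write_records_per_s / 1000)
    return max(1, by_bytes, by_records)


# Hypothetical load: 3.5 MB/s spread over 2,400 records per second.
print(min_shards(3.5, 2400))  # → 4
```

Here the byte limit dominates (4 shards versus 3 for the record count). With Firehose, by contrast, there are no shards to size.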
Option C: Stream your data sources via Kinesis Data Analytics (using Apache Flink to perform feature transformations) to your MongoDB database.
Kinesis Data Analytics provides a managed environment for running Apache Flink, a popular open-source stream processing framework, so it is well suited to feature transformations. However, Kinesis Data Analytics cannot write directly to a MongoDB HTTP endpoint, so loading the results into MongoDB would require additional work, such as a custom sink or an extra delivery service.
Option D: Stream your data sources via Kinesis Data Firehose to your MongoDB database, using a Glue ETL job to perform feature transformations.
AWS Glue is a fully managed extract, transform, and load (ETL) service for moving data between data stores. In this option, Kinesis Data Firehose streams the data and a Glue ETL job transforms the features.
However, the Glue ETL job would write its output to S3, and a separate script would then be needed to load the data into MongoDB. This extra hop adds configuration, latency, and cost, making the option less efficient than option A.
Based on these requirements, the most efficient solution is option A: stream the data via Kinesis Data Firehose, use a Lambda function to perform the feature transformations, and deliver the transformed data directly to the MongoDB HTTP endpoint. This option is fully managed, needs no extra load step, and keeps maintenance overhead low.