Setting up a data transformation and ingestion pipeline for IoT data with legacy mainframe system integration

Transforming and ingesting IoT data with AWS services

Question

You work for a telecommunications and internet service provider that has been in business for decades.

Over the decades, the company has built various types of application systems and database technologies on the evolving platforms of the time.

Therefore, you have massive amounts of customer and company operational data on legacy mainframe systems and their associated data stores, such as aging relational databases. Your team is building a machine learning model that uses streaming data from the company's in-home routers, which function as IoT (Internet of Things) devices, to help the company sell additional services to its customer base.

The IoT data is unstructured, so you need to transform it to CSV format before ingesting it into the S3 buckets you use to house your datasets for your SageMaker model.

You also need to enrich the IoT data with real-time data from your legacy mainframe systems as the data streams into your AWS cloud environment. Which set of AWS services would you use to set up this data transformation and ingestion pipeline?


Explanations


Answer: C.

Option A is incorrect.

You can't enrich your IoT data with your mainframe data without first getting your mainframe data into your AWS cloud environment.

Option B is incorrect.

You can't write directly from your mainframe systems to S3.

You could use AWS Storage Gateway to get your mainframe data into your AWS cloud environment, but AWS Storage Gateway doesn't have the capability to enrich your IoT data.

Option C is correct.

You can use AWS Storage Gateway in its File Gateway configuration, over an NFS (Network File System) connection, to move data from your legacy mainframe systems into your AWS cloud environment.

You can then use the Kinesis Data Firehose Lambda integration to enrich the IoT data with your legacy mainframe data and convert it to CSV.

Finally, Kinesis Data Firehose delivers the transformed records returned by your Lambda function to the S3 bucket used by your SageMaker model.
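When the Firehose Lambda integration is enabled, Firehose hands the function a batch of base64-encoded records and expects each one back with its recordId, a result status, and the transformed payload. Below is a minimal sketch of such a transformation function, assuming the File Gateway has landed a JSON customer extract in S3; the bucket, key, and field names are illustrative assumptions, not part of the question.

```python
import base64
import csv
import io
import json

import boto3

s3 = boto3.client("s3")

# Hypothetical location of the mainframe extract that the File Gateway
# has written into S3; both names are placeholders.
MAINFRAME_BUCKET = "legacy-mainframe-extracts"
MAINFRAME_KEY = "customers/latest.json"


def load_mainframe_lookup():
    """Load the customer records exported from the mainframe, keyed by ID."""
    body = s3.get_object(Bucket=MAINFRAME_BUCKET, Key=MAINFRAME_KEY)["Body"]
    return {rec["customer_id"]: rec for rec in json.loads(body.read())}


def lambda_handler(event, context):
    lookup = load_mainframe_lookup()
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        extra = lookup.get(payload.get("customer_id"), {})

        # Flatten the enriched record into one CSV row (the csv writer adds
        # the line terminator, so delivered records stay newline-separated).
        buf = io.StringIO()
        csv.writer(buf).writerow([
            payload.get("device_id"),
            payload.get("customer_id"),
            payload.get("bytes_up"),
            payload.get("bytes_down"),
            extra.get("plan_tier"),        # assumed mainframe fields
            extra.get("tenure_years"),
        ])
        output.append({
            "recordId": record["recordId"],  # must echo Firehose's ID
            "result": "Ok",
            "data": base64.b64encode(buf.getvalue().encode()).decode(),
        })
    return {"records": output}
```

In practice you would cache the lookup across invocations rather than re-reading it for every batch, but the shape of the event and response is the fixed part of the Firehose transformation contract.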

Option D is incorrect.

AWS Snowball moves data from your on-premises environment to your AWS cloud environment in a one-time batch.

This wouldn't work since you need real-time integration of your legacy data with your IoT data.

Reference:

Please see the AWS whitepaper titled Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility.

The correct answer is C. Use AWS Storage Gateway in its File Gateway configuration to move your legacy mainframe data into your AWS environment, then use Kinesis Data Firehose with its Lambda integration to enrich the streaming IoT data with that mainframe data and transform it to CSV before it is delivered to the S3 bucket used by your SageMaker model.

Explanation: The given scenario requires an architecture to collect data from IoT devices, enrich it with real-time data from legacy mainframe systems, transform the data into CSV format, and store it in S3 buckets for further analysis using SageMaker. The solution needs to be scalable, cost-effective, and easy to manage. Here, we can use the following AWS services:

  1. AWS Storage Gateway: A hybrid storage service whose File Gateway configuration exposes an NFS share on premises and stores the files written to it as objects in S3, providing the bridge for the legacy mainframe data.

  2. Kinesis Data Firehose: A fully managed service that enables real-time data delivery and makes it easy to load streaming data into Amazon S3, Amazon Redshift, and Amazon OpenSearch Service (formerly Elasticsearch). A minimal ingestion sketch follows this list.

  3. AWS Lambda: A serverless compute service that runs code in response to events and automatically manages the compute resources required by that code.

  4. S3: A highly scalable and durable object storage service that can store and retrieve any amount of data from anywhere.
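To make the ingestion side concrete, here is a minimal sketch of pushing one router reading into a Firehose delivery stream with boto3; the stream name and payload fields are placeholders.

```python
import json

import boto3

firehose = boto3.client("firehose")

# "iot-router-telemetry" is a placeholder delivery stream name.
reading = {
    "device_id": "router-0042",
    "customer_id": "C-100931",
    "bytes_up": 52311,
    "bytes_down": 981220,
}

# Firehose buffers incoming records and flushes them to the destination.
firehose.put_record(
    DeliveryStreamName="iot-router-telemetry",
    Record={"Data": json.dumps(reading).encode()},
)
```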

Using these services, we can set up the following pipeline to solve the problem statement (a sketch of wiring it together with boto3 follows the list):

  1. Move the legacy mainframe data into AWS with Storage Gateway (File Gateway over NFS), landing it in S3.

  2. Collect the streaming data from the IoT devices with Kinesis Data Firehose.

  3. Use AWS Lambda, through the Firehose transformation integration, to enrich each IoT record with the mainframe data and convert it to CSV format.

  4. Let Firehose deliver the transformed records to the S3 bucket used by SageMaker.
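Wiring steps 2 through 4 together is mostly configuration: the delivery stream names the transformation function as its processor and the SageMaker data bucket as its destination, after which Firehose invokes the Lambda on each buffered batch and delivers whatever it returns. A sketch with boto3 follows; every ARN and name is a placeholder.

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="iot-router-telemetry",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        # Role that lets Firehose write to the bucket and invoke the Lambda.
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        # Bucket holding the datasets for the SageMaker model.
        "BucketARN": "arn:aws:s3:::sagemaker-training-data",
        # Point Firehose at the enrichment/CSV transformation function.
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                "Type": "Lambda",
                "Parameters": [{
                    "ParameterName": "LambdaArn",
                    "ParameterValue": "arn:aws:lambda:us-east-1:"
                                      "123456789012:function:enrich-and-csv",
                }],
            }],
        },
    },
)
```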

Option A suggests using Kinesis Data Firehose to collect the IoT data and its Lambda integration to enrich and transform it before writing to S3. However, it provides no way to bring the mainframe data into AWS in the first place, so there is nothing for the Lambda function to enrich against.

Option B suggests using AWS Storage Gateway to enrich the IoT data with the legacy mainframe data and transform it into CSV before writing it to S3. However, Storage Gateway only moves data; it has no enrichment or transformation capability, and you cannot write directly from the mainframe systems to S3.

Option C uses both AWS Storage Gateway and Kinesis Data Firehose, and each service does what only it can: Storage Gateway brings the legacy mainframe data into AWS, while Firehose with its Lambda integration performs the real-time enrichment and CSV conversion. This is the combination that satisfies every requirement.

Option D suggests using AWS Snowball to migrate the legacy mainframe data to AWS and then Kinesis Data Firehose to collect the IoT data. However, Snowball is a one-time, offline batch transfer, so it cannot supply the real-time mainframe data that the enrichment step requires.