You work for a healthcare data provider company that gathers real-time streaming data from healthcare plan participants who have agreed to allow their insurance company to use their health data gathered by their wearable technology, such as internet-connected watches and step counters.
The plan participants receive discounts on their healthcare plan fees when participating in the data streaming effort.
You are on the machine learning team that will use this data to better predict healthcare issues based on the gathered wearable data.
Because this personal health information is sensitive, you need to build encryption into the data pipeline for this effort. How would you construct the pipeline most securely, so that the data stays encrypted as it moves from the IoT wearable devices to your machine learning data source?
Answer: C.
Option A is incorrect.
IoT Analytics is used to filter, transform, and enrich IoT data before storing the data in a time-series data store for analysis.
In this option, IoT Analytics does not provide the encryption step that the scenario requires.
Option B is incorrect.
Using Kinesis Data Streams to gather your IoT data and serve as the source for a Kinesis Data Firehose delivery stream is the correct part of this option.
However, you would use Kinesis Data Streams, not Kinesis Data Firehose, to encrypt your data with an AWS Key Management Service (AWS KMS) key before the data is stored at rest.
When you use a Kinesis data stream as the source of a Kinesis Data Firehose delivery stream, Kinesis Data Firehose does not store the data at rest.
Instead, the data is stored at rest in the Kinesis data stream.
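Firehose's own server-side encryption, enabled through the `StartDeliveryStreamEncryption` API, applies only to delivery streams that receive data directly through Direct PUT. Enabling it on a delivery stream fed by Kinesis Data Streams is therefore not where encryption at rest comes from. The minimal Python sketch below builds the keyword arguments for boto3's `firehose.start_delivery_stream_encryption` and rejects any source type other than `DirectPut`. The function name, stream name, and key ARN are hypothetical placeholders.

```python
def firehose_sse_params(delivery_stream_name: str, source_type: str, kms_key_arn: str) -> dict:
    """Build kwargs for boto3 firehose.start_delivery_stream_encryption.

    Firehose server-side encryption applies only to DirectPut delivery streams.
    With a Kinesis data stream as the source ("KinesisStreamAsSource"), Firehose
    does not store data at rest, so encryption at rest must be enabled on the
    Kinesis data stream instead.
    """
    if source_type != "DirectPut":
        raise ValueError(
            f"Firehose SSE applies only to DirectPut streams, not {source_type}; "
            "with a Kinesis data stream source, encrypt the stream itself."
        )
    return {
        "DeliveryStreamName": delivery_stream_name,
        "DeliveryStreamEncryptionConfigurationInput": {
            # CUSTOMER_MANAGED_CMK selects a KMS key that you create and control.
            "KeyType": "CUSTOMER_MANAGED_CMK",
            "KeyARN": kms_key_arn,
        },
    }
```

Calling `firehose_sse_params("wearables-to-s3", "KinesisStreamAsSource", key_arn)` raises a `ValueError`. This mirrors why option B puts the encryption in the wrong service.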
Option C is correct.
You use Kinesis Data Streams to gather your IoT data and serve as the source for a Kinesis Data Firehose delivery stream.
You also leverage Kinesis Data Streams to encrypt your data using an AWS Key Management Service (AWS KMS) key before storing the data at rest.
Then Kinesis Data Streams is used as the source of your Kinesis Data Firehose delivery stream, which delivers the data to your S3 bucket used for your machine learning models.
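The two configuration steps of this option can be sketched in Python as the keyword arguments for boto3's `kinesis.start_stream_encryption` and `firehose.create_delivery_stream` calls. This is a minimal sketch, not a complete deployment. The stream, bucket, role, and key names are hypothetical placeholders, and the IAM roles are assumed to already allow Firehose to read from the stream, write to the bucket, and use the KMS keys.

```python
def kinesis_sse_params(stream_name: str, kms_key_id: str) -> dict:
    """Build kwargs for boto3 kinesis.start_stream_encryption.

    kms_key_id may be a key ID, key ARN, alias name ("alias/..."), or alias ARN;
    "alias/aws/kinesis" selects the AWS managed key for Kinesis.
    """
    if not stream_name:
        raise ValueError("stream_name is required")
    return {
        "StreamName": stream_name,
        "EncryptionType": "KMS",  # server-side encryption at rest with AWS KMS
        "KeyId": kms_key_id,
    }


def firehose_from_kinesis_params(
    delivery_stream_name: str,
    kinesis_stream_arn: str,
    source_role_arn: str,
    bucket_arn: str,
    s3_role_arn: str,
    s3_kms_key_arn: str,
    prefix: str = "wearables/",  # hypothetical S3 key prefix for the ML data
) -> dict:
    """Build kwargs for boto3 firehose.create_delivery_stream.

    The delivery stream reads from the (already encrypted) Kinesis data stream
    and writes objects to S3 encrypted with the given KMS key.
    """
    return {
        "DeliveryStreamName": delivery_stream_name,
        # No DeliveryStreamEncryptionConfigurationInput: with a Kinesis source,
        # Firehose does not store the data at rest.
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": kinesis_stream_arn,
            "RoleARN": source_role_arn,
        },
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": s3_role_arn,
            "BucketARN": bucket_arn,
            "Prefix": prefix,
            # Encrypt the objects Firehose writes to the bucket (SSE-KMS).
            "EncryptionConfiguration": {
                "KMSEncryptionConfig": {"AWSKMSKeyARN": s3_kms_key_arn}
            },
        },
    }
```

With credentials configured, these dictionaries would be passed as `boto3.client("kinesis").start_stream_encryption(**stream_params)` and `boto3.client("firehose").create_delivery_stream(**firehose_params)`. The `KMSEncryptionConfig` controls only how the S3 objects are encrypted. Encryption at rest between ingestion and delivery comes from the stream.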
Option D is incorrect.
Kinesis Data Analytics alone does not provide the encryption this solution needs; you would have to use it together with Kinesis Data Streams.
Reference:
Please see the Amazon Kinesis Data Firehose Developer Guide topic "Data Protection in Amazon Kinesis Data Firehose", the Amazon Kinesis Data Analytics overview page, the AWS IoT Analytics overview page, the AWS IoT Analytics User Guide topic "What Is AWS IoT Analytics?", and the Amazon Kinesis Data Analytics for SQL Applications Developer Guide topic "Data Protection in Amazon Kinesis Data Analytics for SQL Applications".
The most secure way to construct a data pipeline to ensure that the data is encrypted as it moves from the IoT wearable devices to the machine learning data source is by using a combination of Kinesis Data Streams, Kinesis Data Firehose, and AWS Key Management Service (AWS KMS).
Option C is the correct answer. Here is a detailed explanation of why it is the best approach:
Kinesis Data Streams is a real-time data streaming service that can continuously capture and store gigabytes of data per second from hundreds of thousands of sources, such as IoT devices. It lets you ingest, process, and analyze streaming data in real time, which makes it a good fit for capturing data from IoT wearable devices.
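As a minimal sketch of ingestion, assume each wearable reading arrives as a JSON-serializable dictionary with a hypothetical `device_id` field. The Python function below builds the keyword arguments for boto3's `kinesis.put_record`. Using the device ID as the partition key routes every reading from one device to the same shard.

```python
import json


def wearable_put_record_params(stream_name: str, reading: dict) -> dict:
    """Build kwargs for boto3 kinesis.put_record from one wearable reading.

    The "device_id" field is a hypothetical schema choice, not part of any
    AWS API; it is used here as the partition key.
    """
    device_id = reading["device_id"]
    return {
        "StreamName": stream_name,
        # Kinesis record data is a bytes blob; compact JSON keeps records small.
        "Data": json.dumps(reading, separators=(",", ":")).encode("utf-8"),
        "PartitionKey": str(device_id),
    }
```

For example, `boto3.client("kinesis").put_record(**wearable_put_record_params("wearables-stream", reading))` would send one reading. By default, boto3 sends the request to the Kinesis HTTPS endpoint.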
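Kinesis Data Firehose is a service that can load data from Kinesis Data Streams into data stores such as Amazon S3, Amazon Redshift, or Amazon OpenSearch Service (formerly Amazon Elasticsearch Service). It can also transform records before delivery and encrypt the objects it writes to S3 with an AWS KMS key. Its own server-side encryption, however, applies only when producers write to it directly. When a Kinesis data stream is the source, Firehose does not store the data at rest.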
AWS KMS is a managed service that makes it easy to create and control the encryption keys used to encrypt your data. With AWS KMS, you can create and manage keys in the AWS Management Console or by using API calls.
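Creating the customer managed key used by the stream and the S3 destination can be sketched as follows. The function takes a boto3 KMS client (`boto3.client("kms")`) as a parameter so it can be exercised without AWS credentials. The alias and description are hypothetical. Turning on automatic key rotation is an added hardening choice, not something the scenario requires.

```python
def create_pipeline_key(kms, alias: str, description: str) -> str:
    """Create a symmetric encryption key, give it an alias, and enable rotation.

    `kms` is expected to behave like a boto3 KMS client.
    Returns the ARN of the new key.
    """
    # Aliases must start with "alias/"; the "alias/aws/" prefix is reserved
    # for AWS managed keys.
    if not alias.startswith("alias/") or alias.startswith("alias/aws/"):
        raise ValueError("alias must start with 'alias/' and not 'alias/aws/'")
    metadata = kms.create_key(
        Description=description,
        KeyUsage="ENCRYPT_DECRYPT",
        KeySpec="SYMMETRIC_DEFAULT",
    )["KeyMetadata"]
    kms.create_alias(AliasName=alias, TargetKeyId=metadata["KeyId"])
    kms.enable_key_rotation(KeyId=metadata["KeyId"])
    return metadata["Arn"]
```

A call such as `create_pipeline_key(boto3.client("kms"), "alias/wearables-data", "Wearable health data pipeline")` would return the key ARN. That ARN could then be used both for the stream encryption and for the S3 destination.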
By using Kinesis Data Streams to gather the streaming data from the IoT devices, you can encrypt the data at rest in the stream with an AWS KMS key. Kinesis Data Streams then serves as the source of a Kinesis Data Firehose delivery stream. Firehose does not store the data at rest itself and delivers it to S3, where the objects can be encrypted with an AWS KMS key.
Once the data is in S3, you can use it to train your machine learning models while ensuring the data remains secure and encrypted.
Therefore, Option C is the best approach, because it keeps the data encrypted both in transit and at rest, which is critical for protecting sensitive personal health data.