Data Lake for Hospital Heart Rate Monitoring: AWS Certified Big Data - Specialty Exam Question

Build a Scalable Data Lake for Hospital Heart Rate Monitoring

Question

A company needs to develop a system for a hospital.

The application needs to ingest the heart rate recorded for various patients.

The requirements for the application are:

  1. A data lake that can expand on demand to store the heart rate information.

  2. A way to ingest the data and store it in the data lake.

  3. A way to catalogue the information.

Which of the following would you use for this requirement?

Answers

Explanations


A. B. C. D.

Answer - C.

An example of this architecture is given in the AWS Documentation.

#######

How to build a front-line concussion monitoring system using AWS IoT and serverless data lakes - Part 2

In part 1 of this series, we demonstrated how to build a data pipeline in support of a data lake.

We used key AWS services such as Amazon Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and AWS Lambda.

In part 2, we discuss how to process and visualize the data by creating a serverless data lake that uses key analytics to create actionable data.

Create a serverless data lake and explore data using AWS Glue, Amazon Athena, and Amazon QuickSight.

As we discussed in part 1, you can store heart rate data in an Amazon S3 bucket using Kinesis Data Streams.
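As a rough illustration of the ingestion step, the sketch below builds one heart rate reading and writes it to a Kinesis data stream with boto3's `put_record`. The stream name, field names, and helper functions are hypothetical and are not taken from the AWS post; delivery from the stream into S3 (for example through Kinesis Data Firehose) is assumed to be configured separately.

```python
import json
from datetime import datetime, timezone


def build_heart_rate_record(patient_id: str, bpm: int, stream_name: str) -> dict:
    """Build the keyword arguments for a Kinesis PutRecord call."""
    payload = {
        "patient_id": patient_id,          # hypothetical field names
        "heart_rate_bpm": bpm,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),
        # Using the patient ID as the partition key routes one patient's
        # readings to the same shard.
        "PartitionKey": patient_id,
    }


def send_heart_rate(patient_id: str, bpm: int,
                    stream_name: str = "heart-rate-stream") -> dict:
    """Send one reading; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder above works offline

    kinesis = boto3.client("kinesis")
    return kinesis.put_record(
        **build_heart_rate_record(patient_id, bpm, stream_name)
    )
```

Separating the request builder from the network call keeps the record format easy to inspect without touching AWS.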

However, storing data in a repository is not enough.

You also need to be able to catalog and store the associated metadata related to your repository so that you can extract the meaningful pieces for analytics.

For a serverless data lake, you can use AWS Glue, which is a fully managed data catalog and ETL (extract, transform, and load) service.

AWS Glue simplifies and automates the difficult and time-consuming tasks of data discovery, conversion, and job scheduling.
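One way the data discovery step might look in practice is a Glue crawler pointed at the S3 location that holds the heart rate data. The sketch below uses boto3's `create_crawler` and `start_crawler`; the crawler name, IAM role ARN, database name, and bucket path are placeholders chosen for illustration, not values from the AWS post.

```python
def build_crawler_request(name: str, role_arn: str,
                          database: str, s3_path: str) -> dict:
    """Build the keyword arguments for glue.create_crawler."""
    return {
        "Name": name,
        "Role": role_arn,            # IAM role the crawler assumes
        "DatabaseName": database,    # Data Catalog database for the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request: dict) -> None:
    """Create the crawler and run it once; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works offline

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values for illustration only.
example_request = build_crawler_request(
    name="heart-rate-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="hospital_db",
    s3_path="s3://example-heart-rate-lake/raw/",
)
```

When the crawler finishes, the tables it infers appear in the named Data Catalog database.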

As you get your AWS Glue Data Catalog data partitioned and compressed for optimal performance, you can use Amazon Athena for the direct query to S3 data.
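To make the direct query concrete, here is a minimal sketch that starts an Athena query over a catalogued table with boto3's `start_query_execution`. The database, table, column names, the `dt` partition column, and the results bucket are all assumptions made for this example.

```python
from datetime import date


def build_query_request(database: str, table: str,
                        output_location: str, day: date) -> dict:
    """Build the keyword arguments for athena.start_query_execution."""
    # Filtering on the (assumed) partition column lets Athena scan only
    # that day's data instead of the whole table.
    query = (
        "SELECT patient_id, avg(heart_rate_bpm) AS avg_bpm "
        f'FROM "{table}" '
        f"WHERE dt = '{day.isoformat()}' "
        "GROUP BY patient_id"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: dict) -> str:
    """Start the query and return its execution ID; needs boto3 and AWS access."""
    import boto3  # imported lazily so the builder above works offline

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]
```

The call is asynchronous: Athena writes the results to the output location, and the returned execution ID can be used to check the query's status.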

You can then visualize the data using Amazon QuickSight.

The following diagram depicts the data lake that is created in this demonstration:

#######

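Options A and B are incorrect because Amazon SQS is a message queue, not a streaming ingestion service, so it is not the ideal way to ingest the heart rate data.

Option D is incorrect because Amazon Athena is a query tool, not a service for cataloguing data.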

For more information on this use case, see the following URL:

https://aws.amazon.com/blogs/big-data/how-to-build-a-front-line-concussion-monitoring-system-using-aws-iot-and-serverless-data-lakes-part-2/

[Diagram of the data lake created in the demonstration: Kinesis Data Streams -> Amazon S3 bucket storing raw data -> AWS Glue ETL to process and transform data -> Amazon S3 bucket storing processed data -> Amazon Athena for direct query to S3 -> Amazon QuickSight to visualize the data and create a heart rate dashboard]

The best option for this requirement is option C: Amazon S3 as the data lake, Amazon Kinesis to ingest the data, and AWS Glue to catalogue the information.

Here is why:

  1. Data lake: A data lake is a centralized repository that stores structured and unstructured data at any scale. Amazon S3 is an object storage service that scales on demand with no capacity to provision, which meets the requirement for a data lake that can expand as heart rate data accumulates.

  2. Data ingestion: Amazon Kinesis is a managed service for collecting streaming data. Heart rate readings arrive continuously from many patients, and the Kinesis pipeline used in the AWS blog series (Kinesis Data Streams with Kinesis Data Firehose) delivers them to S3. Amazon SQS (Simple Queue Service) is a message queuing service for decoupling application components. It has no built-in delivery to S3, so it is not the ideal ingestion service here.

  3. Data catalog: AWS Glue is a fully managed extract, transform, and load (ETL) service that also provides the AWS Glue Data Catalog, a metadata repository for data stored in S3. Crawlers discover the data and record its schema in the catalog, which makes the data available to query.

Option A is incorrect because it relies on SQS for ingestion, and because Amazon Redshift is a data warehouse rather than a data lake.

Option B is incorrect because it also relies on SQS for ingestion.

Option D is incorrect because Amazon Athena is a query service, not a catalog. Athena queries data in S3 using table definitions held in a catalog such as the AWS Glue Data Catalog, so it does not meet the cataloguing requirement on its own.

Therefore, the best option for this requirement is option C: Amazon S3 as the data lake, Amazon Kinesis to ingest the data, and AWS Glue to catalogue the information.