Amazon Exam: DOP-C01 - AWS Certified DevOps Engineer - Professional

Ad-Hoc Business Analytics Queries on Petabytes of Data: AWS Services for High-Velocity Data Analysis


Question

You need to perform ad-hoc business analytics queries on petabytes of well-structured data.

Data comes in constantly at a high velocity.

Your business intelligence team knows how to use SQL to query data and perform analysis.

What AWS service(s) should you use?

Answers

A. Kinesis Firehose + RDS

B. Kinesis Firehose + Redshift

C. EMR using Hive

D. EMR running Apache Spark


Explanations


Answer - B.

Amazon Kinesis Firehose is the easiest way to load streaming data into AWS.

It can capture, transform, and load streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service, enabling near real-time analytics with existing business intelligence tools and dashboards you're already using today.

It is a fully managed service that automatically scales to match your data's throughput and requires no ongoing administration.

It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.
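As a rough illustration of the producer side, the sketch below groups events into batches for the Firehose `PutRecordBatch` API, which accepts up to 500 records and 4 MiB per call. The helper names (`to_record`, `batches`, `send`), the newline-delimited JSON encoding, and the stream name are assumptions made for this example, not something the question prescribes. Retrying failed records is left out.

```python
import json
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_BATCH = 500             # PutRecordBatch limit: records per call
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024   # PutRecordBatch limit: 4 MiB per call


def to_record(event: dict) -> dict:
    """Encode one event as a newline-terminated JSON record (assumed format)."""
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def batches(events: Iterable[dict]) -> Iterator[List[dict]]:
    """Yield lists of records that stay within the per-call limits."""
    batch, size = [], 0
    for event in events:
        record = to_record(event)
        n = len(record["Data"])
        # Start a new batch if adding this record would exceed a limit.
        if batch and (len(batch) >= MAX_RECORDS_PER_BATCH
                      or size + n > MAX_BYTES_PER_BATCH):
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += n
    if batch:
        yield batch


def send(client, stream_name: str, events: Iterable[dict]) -> int:
    """Send all events; return how many records Firehose reported as failed."""
    failed = 0
    for batch in batches(events):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

With boto3 installed and credentials configured, `client` would be `boto3.client("firehose")` and `stream_name` the name of an existing delivery stream whose destination is the Redshift cluster.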

For more information on Kinesis Firehose, see:

https://aws.amazon.com/kinesis/firehose/

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud.

You can start with just a few hundred gigabytes of data and scale to a petabyte or more.

This enables you to use your data to acquire new insights for your business and customers.

For more information on Redshift, see:

http://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html

Option A is INCORRECT because the data store needs to scale to petabytes of data and serve analytical queries. RDS is built for transactional workloads, so Redshift is more suitable for this requirement.

Options C and D are INCORRECT because the data arrives at high velocity, which calls for a streaming ingestion service such as Kinesis Firehose. EMR on its own does not provide that ingestion path.

To run ad-hoc business analytics queries on petabytes of well-structured data that arrives constantly at high velocity, the best AWS services to use are Kinesis Firehose and Redshift.

Kinesis Firehose is a fully managed, scalable service for ingesting streaming data into AWS. It handles high-volume streams and enables near real-time analytics by buffering the data and delivering it to AWS services including Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service.

Redshift is a fully managed, petabyte-scale data warehouse service in AWS. It is designed to handle large-scale, complex data analytics workloads, making it ideal for ad-hoc business analytics queries on petabytes of structured data. It is optimized for querying and aggregating large datasets using SQL, making it an excellent choice for business intelligence teams that know how to use SQL to query data and perform analysis.
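To make "ad-hoc SQL" concrete, here is a small sketch of the kind of aggregate query a BI team might write. The `sales` table, its columns, and the sample rows are hypothetical. The query runs against an in-memory SQLite database purely as a local stand-in so the example is self-contained; against Redshift the same standard SQL would be submitted through a SQL client or driver.

```python
import sqlite3

# A typical ad-hoc aggregate: order count and revenue per region.
# Table and column names are hypothetical.
AD_HOC_QUERY = """
SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
FROM sales
GROUP BY region
ORDER BY revenue DESC
"""


def run_ad_hoc(conn):
    """Run the query on any DB-API connection and return all rows."""
    return conn.execute(AD_HOC_QUERY).fetchall()


def demo():
    """Build a tiny local table (stand-in for a warehouse table) and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("eu", 10.0), ("us", 25.0), ("eu", 5.0)],
    )
    return run_ad_hoc(conn)
```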

Therefore, the correct answer is B. Kinesis Firehose + Redshift.

Option A, Kinesis Firehose + RDS, is not the best choice because RDS is a relational database service designed for transactional workloads and is not optimized for ad-hoc analytics queries on large volumes of data.

Option C, EMR using Hive, is also not the best choice. Hive provides HiveQL, a SQL-like language for querying data stored in the Hadoop Distributed File System (HDFS), but it is geared toward batch processing on a cluster you have to run, and EMR alone does not address ingesting the high-velocity stream.

Option D, EMR running Apache Spark, can run ad-hoc analytics on large volumes of data, and Spark SQL can work with structured data. However, compared to Redshift, Spark is more complex to set up and manage, and it is a general-purpose processing engine rather than a managed data warehouse built for SQL querying and aggregation. Therefore, Redshift is the better choice for a SQL-focused team working with structured data.
