Processing Click Stream Data with AWS Kinesis for HikeHills.com

Apply Transformation, Store in Buckets, and Backup Click Stream Data


Question

HikeHills.com (HH) is an online specialty retailer that sells clothing and outdoor recreation gear for trekking, camping, road biking, mountain biking, rock climbing, ice mountaineering, skiing, avalanche protection, snowboarding, fly fishing, kayaking, rafting, road and trail running, and many more. HH runs its entire online infrastructure on Java-based web applications running on AWS.

HH captures click stream data and uses a custom-built recommendation engine to recommend products, which improves sales and helps the company understand customer preferences. HH already uses the AWS Kinesis Producer Library (KPL) to collect events and transaction logs and to process the stream.

The syslog size is around 12 bytes. HH has the following requirements for processing the data that is being ingested:

Apply transformation of syslog data to JSON and CSV format and store it into different buckets to support different processing needs
Capture transformation failures into the same S3 bucket to address audit
Backup the syslog streaming data into an S3 bucket

How can this be achieved? Select 3 options.

Answers

A. Data transformation from syslog to JSON and CSV can be performed through Lambda blueprints

B. Data transformation from syslog to JSON is performed through Lambda, but transformation to CSV is performed implicitly by Kinesis Firehose

C. Data transformation from syslog to CSV is performed through Lambda, but transformation to JSON is performed implicitly by Kinesis Firehose

D. When S3 is selected as the destination, and Source record S3 backup is enabled, untransformed incoming data can be delivered to a separate S3 bucket

E. S3 backups can be managed with bucket policies

F. Data Transformation failures are delivered to processing-failed folder

G. Data Transformation failures are delivered to transform-failed folder.


Explanations


Answer: A, D, F.

Option A is correct - Kinesis Data Firehose provides Lambda blueprints, including syslog to JSON and syslog to CSV, that you can use to create a Lambda function for data transformation

https://docs.aws.amazon.com/firehose/latest/dev/data-

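Option B is incorrect - Firehose does not convert syslog to CSV implicitly. Kinesis Data Firehose provides Lambda blueprints that you can use to create a Lambda function for data transformation, and both the JSON and the CSV conversion are done that way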

https://docs.aws.amazon.com/firehose/latest/dev/data-

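Option C is incorrect - Firehose does not convert syslog to JSON implicitly. Kinesis Data Firehose provides Lambda blueprints that you can use to create a Lambda function for data transformation, and both the JSON and the CSV conversion are done that way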

https://docs.aws.amazon.com/firehose/latest/dev/data-

Option D is correct - When S3 is selected as the destination and Source record S3 backup is enabled, untransformed incoming data can be delivered to a separate S3 bucket

https://docs.aws.amazon.com/firehose/latest/dev/create-

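Option E is incorrect - Bucket policies control access to a bucket; they do not create backups. The backup is configured on the delivery stream: when S3 is selected as the destination and Source record S3 backup is enabled, untransformed incoming data can be delivered to a separate S3 bucket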

https://docs.aws.amazon.com/firehose/latest/dev/create-

Option F is correct - If data transformation fails, the unsuccessfully processed records are delivered to your S3 bucket in the processing-failed folder

https://docs.aws.amazon.com/firehose/latest/dev/data-

Option G is incorrect - If data transformation fails, the unsuccessfully processed records are delivered to your S3 bucket in the processing-failed folder, not a transform-failed folder

https://docs.aws.amazon.com/firehose/latest/dev/data-

The question requires selecting three options that explain how to achieve the following requirements for processing data that is being ingested:

Apply transformation of syslog data to JSON and CSV format and store it into different buckets to support different processing needs
Capture transformation failures into the same S3 bucket to address audit
Backup the syslog streaming data into an S3 bucket

Let's review the options provided:

A. Data transformation from syslog to JSON and CSV can be performed through Lambda blueprints

Lambda is a compute service offered by AWS that allows the execution of code in response to events. Lambda blueprints are pre-built code templates that can be customized to create a function that can transform data.

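Option A is correct. Firehose offers Lambda blueprints for converting syslog to JSON and syslog to CSV. Firehose invokes the Lambda function on each buffered batch of incoming records and delivers the transformed records to the S3 destination. Because a delivery stream writes to a single S3 destination, storing the JSON and CSV output in different buckets would typically mean one delivery stream per format.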
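To make the transformation step concrete, here is a minimal sketch of a Firehose transformation function that turns syslog lines into JSON. It is not the AWS blueprint itself. The regular expression covers only a simplified BSD-style syslog line, and the field names (pri, timestamp, host, tag, message) are assumptions chosen for illustration. The event and response shapes (recordId, result, base64-encoded data) follow the Firehose data transformation contract.

```python
import base64
import json
import re

# Assumed, simplified syslog layout: "<PRI>Mon DD HH:MM:SS host tag: message".
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\s]+): "
    r"(?P<message>.*)$"
)


def lambda_handler(event, context):
    """Convert each incoming syslog record to one line of JSON."""
    output = []
    for record in event["records"]:
        # Firehose hands the payload to Lambda base64-encoded.
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        match = SYSLOG_RE.match(line)
        if match is None:
            # Every recordId must be returned; flag lines we cannot parse.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        # Newline-delimit the JSON so objects in S3 stay one record per line.
        doc = json.dumps(match.groupdict()) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(doc.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

A CSV variant would differ only in how the matched fields are serialized, which is why the two formats can be handled by two small functions attached to separate delivery streams.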

B. Data transformation from syslog to JSON is performed through Lambda, but transformation to CSV is performed implicitly by Kinesis Firehose

Kinesis Firehose is a fully managed service that allows the delivery of real-time streaming data to destinations such as S3, Elasticsearch, or Redshift. It can also perform transformations on data using pre-built or custom-built data transformation functions.

Option B is incorrect. Firehose does not convert syslog to CSV on its own; it only applies the transformation implemented in the Lambda function attached to the delivery stream. Therefore, the statement that transformation to CSV is performed implicitly by Kinesis Firehose is incorrect.

C. Data transformation from syslog to CSV is performed through Lambda, but transformation to JSON is performed implicitly by Kinesis Firehose

Option C is incorrect for the same reason. Firehose does not convert syslog to JSON on its own; that conversion is also done by a Lambda function. Therefore, the statement that transformation to JSON is performed implicitly by Kinesis Firehose is incorrect.

D. When S3 is selected as the destination, and Source record S3 backup is enabled, untransformed incoming data can be delivered to a separate S3 bucket

S3 is an object storage service provided by AWS that allows the storage and retrieval of data from anywhere on the web. Source record S3 backup is a setting on the Firehose delivery stream, not on S3 itself. When data transformation is enabled and this setting is turned on, Firehose delivers the untransformed incoming records to a backup S3 bucket while it delivers the transformed records to the destination.

Option D is correct, as the untransformed incoming data can be delivered to a separate S3 bucket using the Source record S3 backup feature. This allows the original syslog data to be stored separately from the transformed data.
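As an illustration of where the backup setting lives, the sketch below builds the S3 destination settings for a delivery stream with both a Lambda processor and source record backup enabled. The helper name build_s3_destination and all ARNs are hypothetical; the dictionary keys are the ones used by the Firehose CreateDeliveryStream API for an extended S3 destination.

```python
def build_s3_destination(role_arn, dest_bucket_arn, backup_bucket_arn, lambda_arn):
    """Return extended S3 destination settings with source record backup on."""
    return {
        "RoleARN": role_arn,
        "BucketARN": dest_bucket_arn,  # transformed records land here
        # Source record backup: keep the untransformed input as well.
        "S3BackupMode": "Enabled",
        "S3BackupConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": backup_bucket_arn,  # raw syslog records land here
        },
        # Attach the transformation Lambda function to the stream.
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "Lambda",
                    "Parameters": [
                        {"ParameterName": "LambdaArn", "ParameterValue": lambda_arn},
                    ],
                }
            ],
        },
    }


# Hypothetical ARNs, for illustration only.
config = build_s3_destination(
    role_arn="arn:aws:iam::123456789012:role/hh-firehose-role",
    dest_bucket_arn="arn:aws:s3:::hh-clickstream-json",
    backup_bucket_arn="arn:aws:s3:::hh-clickstream-raw",
    lambda_arn="arn:aws:lambda:us-east-1:123456789012:function:hh-syslog-to-json",
)
```

With boto3, a dictionary like this would be passed as the ExtendedS3DestinationConfiguration argument of the Firehose client's create_delivery_stream call.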

E. S3 backups can be managed with bucket policies

S3 bucket policies are used to manage access to S3 buckets and their objects. They are used to control who can access the data and what actions can be performed on it.

Option E is incorrect, as S3 bucket policies are not used to manage backups of data. They are used to control access to S3 buckets and objects.

F. Data transformation failures are delivered to processing-failed folder

Option F is correct. When a record's transformation fails (the Lambda function returns the record with the status ProcessingFailed, or the Lambda invocation itself fails), Firehose treats the record as unsuccessfully processed and delivers it to the S3 bucket in the processing-failed folder. The failures therefore stay in the same bucket, which satisfies the audit requirement.
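How a transformation function reports a failure can be sketched as follows. This is an illustrative syslog-to-CSV handler, not the AWS blueprint: the parsing rule (month, day, time, host, then "tag: message") and the column order are assumptions. The part that matters is the result field, since Firehose treats any record returned with ProcessingFailed as unsuccessfully processed.

```python
import base64
import csv
import io


def to_csv_row(line):
    """Turn 'Mon DD HH:MM:SS host tag: message' into one CSV row."""
    parts = line.split(None, 4)
    if len(parts) < 5 or ": " not in parts[4]:
        raise ValueError("not a recognisable syslog line")
    month, day, time_of_day, host, rest = parts
    tag, message = rest.split(": ", 1)
    buf = io.StringIO()
    # csv.writer quotes fields that contain commas, so messages stay intact.
    csv.writer(buf, lineterminator="\n").writerow(
        [f"{month} {day} {time_of_day}", host, tag, message]
    )
    return buf.getvalue()


def lambda_handler(event, context):
    results = []
    for record in event["records"]:
        try:
            line = base64.b64decode(record["data"]).decode("utf-8").strip()
            row = to_csv_row(line)
        except ValueError:
            # Hand the original payload back, flagged as failed.
            results.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        results.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(row.encode("utf-8")).decode("utf-8"),
        })
    return {"records": results}
```

Returning the original data unchanged for failed records keeps the raw input available to whoever audits the failures later.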

G. Data transformation failures are delivered to transform-failed folder.

Option G is incorrect. Firehose does not write failed records to a folder named transform-failed; unsuccessfully processed records are delivered to the processing-failed folder.

In conclusion, the three correct options are A, D, and F.
