Enable Web-Site Traffic Analytics and Log File Standardization for Tick-Bank | AWS Certified Big Data - Specialty Exam

Question

Tick-Bank is a privately held Internet retailer of both physical and digital products, founded in 2008.

The company has more than six-million clients worldwide.

Tick-Bank aims to serve as a connection between digital content makers and affiliate dealers, who then promote the products to clients. To serve various business functions, Tick-Bank runs more than 40 Java-based web applications on Windows-based EC2 machines in AWS, managed by the internal IT Java team.

Tick-Bank wants to enable web-site traffic analytics in order to understand user navigation behavior, preferences, and other click-related information.

Tick-Bank also wants to improve the operations that ingest monitoring logs. Tick-Bank estimates that the logs captured by the Kinesis Agent and processed into streams amount to tens of GB of data every day.

With more web applications becoming part of traffic analytics, Tick-Bank wants to reduce the overall storage cost of these log files and to standardize their data format to Apache ORC before storing them in S3. How can this be achieved?

Answers

Explanations

A. B. C. D.

Answer: C.

Kinesis Data Streams cannot apply data transformation or record format conversion. Firehose can transform log records to JSON using Lambda blueprints and then apply record format conversion to turn the JSON into a columnar binary format such as Parquet or ORC.

https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html
https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html

Tick-Bank can standardize its log data format and reduce storage costs by using Kinesis Firehose.

Kinesis Firehose is a fully managed AWS service for ingesting streaming data in near real time and delivering it to destinations such as S3, Redshift, Elasticsearch, and Splunk. It can also transform the data before delivery.

In this case, Kinesis Firehose can convert the log data to Apache ORC and then load it into S3. ORC is a columnar storage format designed for big data workloads, and it reduces storage costs by storing data in a compressed, efficient layout.

Kinesis Firehose provides two mechanisms for changing data in flight: data transformation and record format conversion.

Data transformation invokes an AWS Lambda function on the incoming records before they are loaded into the destination. The function is custom code that can modify, filter, or enrich the data, written in any language Lambda supports, such as Python, Node.js, or Java.
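As an illustration of the data transformation step, here is a minimal sketch of a Lambda handler that turns a plain-text log line into JSON. The log layout and the field names (`ip`, `timestamp`, `method`, `path`, `status`) are assumptions made for this example and are not taken from Tick-Bank's actual logs. The `recordId`, `result`, and base64-encoded `data` fields are the ones Firehose expects back from a transformation function.

```python
import base64
import json
import re

# Hypothetical log layout, assumed for this sketch:
#   <ip> <ISO timestamp> <METHOD> <path> <status>
LOG_PATTERN = re.compile(
    r"^(?P<ip>\S+) (?P<timestamp>\S+) (?P<method>[A-Z]+) "
    r"(?P<path>\S+) (?P<status>\d{3})$"
)


def lambda_handler(event, context):
    """Convert each Firehose record from a log line to a JSON document."""
    output = []
    for record in event["records"]:
        # Firehose delivers the payload base64-encoded.
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        match = LOG_PATTERN.match(line)
        if match is None:
            # Return the record unchanged and flag it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        fields = match.groupdict()
        fields["status"] = int(fields["status"])
        payload = json.dumps(fields) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Every incoming `recordId` is returned exactly once, marked either `Ok` or `ProcessingFailed`, so Firehose can tell which records were converted.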

Record format conversion, on the other hand, converts incoming JSON records into a columnar format, either Apache Parquet or Apache ORC, before loading them into the destination. Firehose takes the schema from a table defined in AWS Glue. Input in any other format, such as CSV or raw log lines, must first be turned into JSON, for example by a data transformation Lambda function.
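The ORC conversion can be sketched as the destination settings one might pass to boto3's `create_delivery_stream` call as `ExtendedS3DestinationConfiguration`. This is a sketch under assumptions: the helper name `build_orc_destination_config`, every ARN, the Glue database and table names, and the buffering values are placeholders invented for the example. The function only builds a dictionary and does not call AWS.

```python
def build_orc_destination_config(bucket_arn, role_arn, lambda_arn,
                                 glue_database, glue_table, region):
    """Build S3 destination settings that convert JSON records to ORC.

    Hypothetical helper: all argument values are supplied by the caller.
    """
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        # ORC files carry their own compression, so S3-level
        # compression is left off here.
        "CompressionFormat": "UNCOMPRESSED",
        # Assumed values; larger batches give larger columnar files.
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},
        # Step 1: a Lambda function turns log lines into JSON.
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                "Type": "Lambda",
                "Parameters": [
                    {"ParameterName": "LambdaArn",
                     "ParameterValue": lambda_arn},
                ],
            }],
        },
        # Step 2: Firehose converts the JSON to ORC using a Glue schema.
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "SchemaConfiguration": {
                "RoleARN": role_arn,
                "DatabaseName": glue_database,
                "TableName": glue_table,
                "Region": region,
                "VersionId": "LATEST",
            },
            "InputFormatConfiguration": {
                "Deserializer": {"OpenXJsonSerDe": {}},
            },
            "OutputFormatConfiguration": {
                "Serializer": {"OrcSerDe": {"Compression": "SNAPPY"}},
            },
        },
    }
```

The two nested sections mirror the two mechanisms: `ProcessingConfiguration` attaches the Lambda transformation, and `DataFormatConversionConfiguration` enables the JSON-to-ORC conversion against the Glue table.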

Therefore, the correct answer is C. Kinesis Firehose can change data in flight using both data transformation and record format conversion. Tick-Bank can use a data transformation Lambda function to turn log lines into JSON, enriching them if needed, and then use record format conversion to store them in S3 as compressed Apache ORC, which reduces storage costs.