Tick-Bank: Kinesis Stream Implementation for Redshift Analytics | AWS Certified Big Data Specialty

Kinesis Stream Implementation for Redshift Analytics

Question

Tick-Bank is a privately held Internet retailer of both physical and digital products, founded in 2008.

The company has more than six-million clients worldwide.

Tick-Bank aims to serve as a connection between digital content makers and affiliate dealers, who then promote that content to clients.

Tick-Bank's technology aids in payments, tax calculations and a variety of customer service tasks.

Tick-Bank helps entrepreneurs build visibility and revenue-making opportunities. Tick-Bank runs multiple Java-based web applications on Windows-based EC2 machines in AWS, managed by an internal IT Java team, to serve various business functions.

Tick-Bank is looking to enable website traffic analytics, thereby understanding user navigational behavior, preferences, and other click-related information.

The amount of data captured per click is in the tens of bytes.

Tick-Bank has the following in mind for the solution. Tick-Bank uses the Kinesis Producer Library (KPL) to write the data to the stream and the Kinesis Client Library (KCL) to consume the records.

Since the amount of data generated per click is very small but the number of clicks is massive, Tick-Bank is planning to use sharding.

The data is loaded into a data warehouse (DWH) built on Redshift, which captures the changes and builds analytics every 5 minutes.

Provide the detailed specifications of the Kinesis stream implementation that address the 5-minute latency into Redshift cost-effectively.

Select 3 options.

Answers

Explanations

A. B. C. D.

Answer: A,C,D.

Option A is correct. Batching refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item. Batching of records is part of the implementation.

The KPL supports two types of batching:

Aggregation - Storing multiple records within a single Kinesis Data Streams record.

Collection - Using the API operation PutRecords to send multiple Kinesis Data Streams records to one or more shards in your Kinesis data stream.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
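
The general idea of batching can be sketched in a few lines of Python. This is a toy model and not the KPL itself: the `Batcher` class, its `max_count` and `max_age_s` parameters, and the `flush` callback are all hypothetical names chosen for illustration. The buffer is flushed when it holds enough items or when its oldest item has waited too long, which is loosely similar in spirit to how a producer bounds its buffering delay.

```python
import time
from typing import Callable, List


class Batcher:
    """Toy buffer that hands items to `flush` in batches (not the KPL)."""

    def __init__(self, flush: Callable[[List[bytes]], None],
                 max_count: int = 100, max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.flush_fn = flush          # called once per batch
        self.max_count = max_count     # flush when this many items are buffered
        self.max_age_s = max_age_s     # or when the oldest item is this old
        self.clock = clock
        self.buffer: List[bytes] = []
        self.first_added_at = 0.0

    def add(self, item: bytes) -> None:
        if not self.buffer:
            self.first_added_at = self.clock()
        self.buffer.append(item)
        too_full = len(self.buffer) >= self.max_count
        too_old = self.clock() - self.first_added_at >= self.max_age_s
        if too_full or too_old:
            self.flush()

    def flush(self) -> None:
        # A single action (one flush call) covers every buffered item.
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []


batches: List[List[bytes]] = []
batcher = Batcher(batches.append, max_count=3)
for i in range(7):
    batcher.add(f"click-{i}".encode())
batcher.flush()  # send the final partial batch
print([len(b) for b in batches])  # [3, 3, 1]
```

In this sketch the age check only runs when a new item arrives, so a real producer would also need a timer to flush an idle buffer.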

Option B is incorrect.

Option C is correct.

Aggregation refers to the storage of multiple records in a Kinesis Data Streams record.

Aggregation allows customers to increase the number of records sent per API call, which effectively increases producer throughput.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
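
To see why aggregation matters for cost when each click is only tens of bytes, a rough shard estimate helps. The sketch below assumes per-shard write limits of 1,000 records per second and roughly 1 MB per second; treat these as assumptions to verify against the current Kinesis Data Streams quotas. The traffic figures (50,000 clicks per second at 50 bytes each) and the function name `shards_needed` are hypothetical, and the estimate ignores aggregation framing and partition-key overhead.

```python
import math

# Assumed per-shard write limits; check the current service quotas.
SHARD_MAX_RECORDS_PER_S = 1_000
SHARD_MAX_BYTES_PER_S = 1_000_000


def shards_needed(user_records_per_s: int, bytes_per_user_record: int,
                  user_records_per_kinesis_record: int = 1) -> int:
    """Estimate shards from whichever limit (record count or bytes) binds first."""
    kinesis_records_per_s = math.ceil(
        user_records_per_s / user_records_per_kinesis_record)
    bytes_per_s = user_records_per_s * bytes_per_user_record
    by_count = math.ceil(kinesis_records_per_s / SHARD_MAX_RECORDS_PER_S)
    by_bytes = math.ceil(bytes_per_s / SHARD_MAX_BYTES_PER_S)
    return max(1, by_count, by_bytes)


print(shards_needed(50_000, 50))       # 50 (record-count limit binds)
print(shards_needed(50_000, 50, 100))  # 3  (byte limit binds after aggregation)
```

With tiny records, the record-count limit is hit long before the byte limit, so packing many clicks into each Kinesis Data Streams record lets the same traffic fit in far fewer shards.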

Option D is correct.

Collection refers to batching multiple Kinesis Data Streams records and sending them in a single HTTP request with a call to the API operation PutRecords, instead of sending each Kinesis Data Streams record in its own HTTP request.

This increases throughput compared to using no collection because it reduces the overhead of making many separate HTTP requests.

In fact, PutRecords itself was specifically designed for this purpose.

Collection differs from aggregation in that it works with groups of Kinesis Data Streams records.

The Kinesis Data Streams records being collected can still contain multiple records from the user.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
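
The way collection stacks on top of aggregation can be illustrated with plain lists. This is a sketch and not KPL code: the `chunk` helper is hypothetical, the figure of 100 clicks per aggregated record is an arbitrary choice, and the limit of 500 records per PutRecords request is an assumption to confirm against the API reference.

```python
from typing import List, Sequence

MAX_RECORDS_PER_PUT = 500  # assumed PutRecords per-request record limit


def chunk(items: Sequence, size: int) -> List[list]:
    """Split items into consecutive groups of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


clicks = [f"click-{i}".encode() for i in range(120_000)]

# Aggregation: many user records inside one Kinesis Data Streams record.
kinesis_records = chunk(clicks, 100)

# Collection: many Kinesis Data Streams records inside one HTTP request.
requests = chunk(kinesis_records, MAX_RECORDS_PER_PUT)

print(len(clicks), len(kinesis_records), len(requests))  # 120000 1200 3
```

In this example, 120,000 clicks become 1,200 stream records and only 3 HTTP requests, compared with 120,000 requests if each click were sent on its own.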

Tick-Bank is a company that runs multiple Java-based web applications on Windows-based EC2 machines in AWS, managed by an internal IT Java team. The company aims to enable website traffic analytics to understand user navigational behavior, preferences, and other click-related information. The data generated per click is small, but the number of clicks is massive. Tick-Bank uses the Kinesis Producer Library (KPL) to write this data to the stream and the Kinesis Client Library (KCL) to consume the records.

The data is then loaded into a data warehouse built on Amazon Redshift, which captures the changes and builds analytics every 5 minutes. To address the latency of 5 minutes cost-effectively, Tick-Bank can implement the following options:

  1. Batching of records: Instead of sending every click to the stream on its own, the producer performs a single action on many records at once. Because each click is only tens of bytes, batching cuts the per-record overhead and lets each shard carry far more clicks, so fewer shards are needed.

  2. Aggregation: The KPL stores multiple click records within a single Kinesis Data Streams record. This increases the number of user records sent per API call, which effectively increases producer throughput.

  3. Collection: The KPL uses the PutRecords API operation to send multiple Kinesis Data Streams records in a single HTTP request instead of one request per record. This reduces the overhead of making many separate HTTP requests.

Overall, by implementing batching of records through aggregation and collection (options A, C, and D), Tick-Bank can address the 5-minute latency into Redshift cost-effectively.