PayMe Streaming Platform: Addressing Critical Issues | BDS-C00 Exam Answer

How to Address Critical Issues with PayMe Streaming Platform

Question

PayMe runs OneTap, a market-leading payment processing solution that works with over 40 payment methods, delivers best-in-class conversion rates, and connects to any website, application, or related third-party system.

PayMe uses a multi-shard Kinesis data stream to ingest data (5,800-7,200 user records/sec, with payloads between 1 KB and 1.2 KB), which is later processed by downstream applications.

The KPL already batches multiple stream records and sends them in a single HTTP request, and end-to-end monitoring is enabled for the streams.

Using KPL metrics, the PayMe team observed the following critical issues:

  1. TCO for maintaining the streaming platform is too high, even after implementing batching.

  2. Records are going missing, raising concerns about the durability of the platform.

  3. Performance of the streaming platform does not meet SLAs.

Which of the following options would address the above issues? Select 3 options.

Answers

Explanations


A. Implement collection in the KPL.

B. Implement aggregation in the KPL.

C. Rely on the KPL retry mechanism to resend failed records.

D. Configure the record time-to-live (record_ttl) setting.

E. Configure rate limiting in the KPL.

Answer: B, C, and D.

Option A is incorrect - Collection is already in place: the KPL is already batching multiple stream records and sending them in a single HTTP request.

Collection reduces the overhead of making many separate HTTP requests for a multi-shard stream.

Batching refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item.

Collection refers to batching multiple Kinesis Data Streams records and sending them in a single HTTP request with a call to the API operation PutRecords, instead of sending each Kinesis Data Streams record in its own HTTP request.

This increases throughput compared to using no collection because it reduces the overhead of making many separate HTTP requests.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
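For illustration, here is a minimal sketch of how collection might be tuned with the KPL Java library (KinesisProducerConfiguration); the region and the numeric limits are assumptions for the example, not values taken from the question.

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class CollectionConfigSketch {
    public static void main(String[] args) {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration();
        config.setRegion("us-east-1");                 // illustrative region

        // Collection: the KPL packs up to CollectionMaxCount Kinesis Data
        // Streams records into a single PutRecords HTTP request (500 is the
        // PutRecords per-call maximum), cutting per-request overhead.
        config.setCollectionMaxCount(500);
        config.setCollectionMaxSize(5 * 1024 * 1024);  // ~5 MB per PutRecords call

        KinesisProducer producer = new KinesisProducer(config);
        // ... addUserRecord(...) calls; the KPL fills PutRecords batches
        // transparently instead of issuing one HTTP request per record.
        producer.destroy();
    }
}
```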

Option B is correct - Aggregation helps to improve per-shard throughput.

This also optimizes the overall TCO of the stream.

Batching refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item.

Aggregation refers to the storage of multiple records in a Kinesis Data Streams record.

Aggregation allows customers to increase the number of records sent per API call, which effectively increases producer throughput.

Kinesis Data Streams shards support up to 1,000 Kinesis Data Streams records per second, or 1 MB of throughput per second.

The Kinesis Data Streams records per second limit binds customers with records smaller than 1 KB.

Record aggregation allows customers to combine multiple records into a single Kinesis Data Streams record.

This allows customers to improve their per shard throughput.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
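As a rough sketch (assuming the standard KinesisProducerConfiguration setters of the KPL Java library), aggregation could be enabled and capped as below; the values shown are the documented KPL defaults, used here purely for illustration.

```java
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class AggregationConfigSketch {
    public static void main(String[] args) {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration();

        // Aggregation: pack many ~1 KB user records into one larger Kinesis
        // Data Streams record, so the 1,000 records/sec/shard limit stops
        // binding and each shard's 1 MB/sec of throughput is used more fully.
        config.setAggregationEnabled(true);

        // Optional caps on how much is packed into a single aggregated record
        // (the values below are the KPL defaults, shown for illustration).
        config.setAggregationMaxCount(4294967295L);
        config.setAggregationMaxSize(51200); // bytes, i.e. ~50 KB
    }
}
```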

Option C is correct - The KPL's automatic retry mechanism addresses the missing-records (durability) concern.

When Kinesis Producer Library (KPL) user records are added to a stream, each record is given a time stamp and added to a buffer with a deadline set by the RecordMaxBufferedTime configuration parameter.

This time stamp/deadline combination sets the buffer priority.

Records are flushed from the buffer based on the following criteria:

Buffer priority.

Aggregation configuration.

Collection configuration.

Records flushed are then sent to your Kinesis data stream as Amazon Kinesis Data Streams records.

The PutRecords operation sends requests to your stream that occasionally exhibit full or partial failures.

Records that fail are automatically added back to the KPL buffer.

The new deadline is set based on the minimum of these two values:

Half the current RecordMaxBufferedTime configuration.

The record's time-to-live value.

This strategy allows retried KPL user records to be included in subsequent Kinesis Data Streams API calls, to improve throughput and reduce complexity while enforcing the Kinesis Data Streams record's time-to-live value.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-producer-adv-retries-rate-limiting.html
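A minimal, hedged sketch of how a producer might observe these automatic retries with the KPL Java library follows; the stream name payme-onetap-stream, the partition key, and the buffered-time value are hypothetical.

```java
import com.amazonaws.services.kinesis.producer.Attempt;
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RetryBufferSketch {
    public static void main(String[] args) {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration();
        // How long a user record may wait in the KPL buffer before a flush is
        // forced; failed sub-records are re-buffered with a shorter deadline.
        config.setRecordMaxBufferedTime(100); // milliseconds (illustrative)

        KinesisProducer producer = new KinesisProducer(config);
        ByteBuffer payload =
                ByteBuffer.wrap("{\"txn\":\"demo\"}".getBytes(StandardCharsets.UTF_8));

        // "payme-onetap-stream" is a hypothetical stream name.
        ListenableFuture<UserRecordResult> future =
                producer.addUserRecord("payme-onetap-stream", "partition-key-1", payload);

        Futures.addCallback(future, new FutureCallback<UserRecordResult>() {
            @Override
            public void onSuccess(UserRecordResult result) {
                // Each Attempt records one PutRecords try, including retries
                // the KPL performed automatically after partial failures.
                for (Attempt a : result.getAttempts()) {
                    System.out.printf("attempt: success=%s error=%s%n",
                            a.isSuccessful(), a.getErrorCode());
                }
            }

            @Override
            public void onFailure(Throwable t) {
                // e.g. the record's time-to-live expired before it could be put.
                System.err.println("record failed: " + t.getMessage());
            }
        }, MoreExecutors.directExecutor());

        producer.flushSync();
        producer.destroy();
    }
}
```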

Option D is correct - The record time-to-live setting addresses the durability of the stream.

Records that do not get successfully put within the limit are failed. This setting is useful if your application cannot or does not wish to tolerate late records.

Records will still incur network latency after they leave the KPL, so take that into consideration when choosing a value for this setting.

If you do not wish to lose records and prefer to retry indefinitely, set record_ttl to a large value like INT_MAX.

This has the potential to cause head-of-line blocking if network issues or throttling occur.

You can respond to such situations by using the metrics reporting functions of the KPL.

You may also set fail_if_throttled to true to prevent automatic retries in case of throttling.

https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties
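The snippet below is an illustrative sketch of the corresponding Java settings (record_ttl maps to setRecordTtl, fail_if_throttled to setFailIfThrottled, assuming the standard KPL configuration class); the values are examples only.

```java
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class RecordTtlSketch {
    public static void main(String[] args) {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration();

        // record_ttl: how long (ms) a user record may spend in the KPL,
        // including retries, before it is failed back to the application.
        config.setRecordTtl(30000);           // 30 s is the KPL default; tune per SLA

        // To retry indefinitely instead of expiring records (risking
        // head-of-line blocking), use a very large value such as Integer.MAX_VALUE:
        // config.setRecordTtl(Integer.MAX_VALUE);

        // Optionally surface throttling to the application instead of retrying.
        config.setFailIfThrottled(false);     // default: the KPL retries throttled records
    }
}
```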

Option E is incorrect - Rate limiting only helps to reduce spamming caused by excessive retries.

The KPL includes a rate limiting feature, which limits per-shard throughput sent from a single producer.

Rate limiting is implemented using a token bucket algorithm with separate buckets for both Kinesis Data Streams records and bytes.

Each successful write to a Kinesis data stream adds a token (or multiple tokens) to each bucket, up to a certain threshold.

This threshold is configurable but by default is set 50 percent higher than the actual shard limit, to allow shard saturation from a single producer.

You can lower this limit to reduce spamming due to excessive retries.

However, the best practice is for each producer to retry aggressively for maximum throughput, and to handle any resulting excessive throttling by expanding the capacity of the stream and implementing an appropriate partition key strategy.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-producer-adv-retries-rate-limiting.html
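For completeness, here is a small sketch of how the rate limit could be lowered in the KPL Java configuration; the value of 100 is an arbitrary illustration, not a recommendation.

```java
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class RateLimitSketch {
    public static void main(String[] args) {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration();

        // RateLimit is the percentage of the per-shard limit a single producer
        // may send; the KPL default is 150 (i.e. 50% above the shard limit).
        // Lowering it reduces spamming from excessive retries, but does not
        // by itself address TCO, durability, or SLA issues.
        config.setRateLimit(100); // illustrative: cap at 100% of the shard limit
    }
}
```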

PayMe runs OneTap, a payment processing solution that uses a multi-shard stream to ingest data. However, the PayMe team has identified critical issues with their streaming platform, including high total cost of ownership (TCO), missing records, and performance issues that do not meet service level agreements (SLAs).

To address these issues, the team should consider the following options:

  1. Collection - This option involves batching multiple Kinesis Data Streams records and sending them in a single PutRecords HTTP request. This reduces the number of requests made against the stream and the associated overhead; however, the KPL is already doing this for PayMe, so it does not resolve the outstanding issues.

  2. Aggregation - This option involves packing multiple user records into a single Kinesis Data Streams record before sending it to the stream. By aggregating records, the team can raise per-shard throughput, which improves performance and reduces TCO.

  3. KPL Retry Mechanism - This option involves relying on the KPL retry mechanism to ensure that records are not lost during ingestion. The KPL automatically re-buffers and resends records that fail, reducing the chance of missing records and improving the overall durability of the platform.

  4. Time to Live - This option involves tuning the record time-to-live (record_ttl) setting, which bounds how long a record may spend in the KPL, including retries, before it is failed back to the application. Tuning this value (or raising it to a very large number to retry indefinitely) lets the team balance latency against the risk of dropping records.

  5. Rate Limiting - This option involves limiting the per-shard rate at which a single producer sends records to the stream. This mainly reduces spamming caused by excessive retries; on its own it does not address the TCO, durability, or SLA issues.

Overall, to address the critical issues identified by the PayMe team, they should implement aggregation, rely on the KPL retry mechanism, and tune the record time-to-live setting (options B, C, and D). A combined configuration sketch follows.
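Putting the three selected options together, a hedged end-to-end sketch (again assuming the KPL Java library; the numeric values are illustrative, not values from the question) might look like this:

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class OneTapProducerSketch {
    public static void main(String[] args) {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration();

        // Option B: aggregation - pack the ~1 KB OneTap user records into
        // larger Kinesis Data Streams records to lift per-shard throughput
        // and lower TCO.
        config.setAggregationEnabled(true);

        // Option C: buffering/retry - bound how long records wait before a
        // flush; the KPL re-buffers and retries failed sub-records itself.
        config.setRecordMaxBufferedTime(100);  // milliseconds, illustrative

        // Option D: record time-to-live - fail records that cannot be put in
        // time, or raise this substantially if late records must never be
        // dropped (at the risk of head-of-line blocking).
        config.setRecordTtl(60000);            // 60 seconds, illustrative

        KinesisProducer producer = new KinesisProducer(config);
        // ... addUserRecord(...) for each payment event ...
        producer.flushSync();
        producer.destroy();
    }
}
```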