Capture and Ingest Log Analytics for Real-time Analysis - TruFin Case Study

Question

TruFin is a financial services start-up running more than 50 web applications that support various segments of the business, including loans, investments, deposits, fund transfers, and securities.

TruFin wants to capture log analytics to analyze dimensions such as customer navigational behaviour, website usage, clickstream, and security threats in real time, in order to fulfil various business requirements.

Different applications use different log file formats (Apache Common Log, Apache Combined Log, Apache Error Log, and RFC 3164 Syslog) and store their logs in different directory structures. TruFin wants to monitor and ingest these files and standardize them into a common format before feeding them into different Kinesis streams. Select 1 option.

Answers

Explanations


Answer: F.

Option A is incorrect - the KPL library cannot be used to monitor and capture files.

Besides, aggregation helps to improve per-shard throughput.

It also optimizes the overall TCO of the stream.

Batching refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item.

Aggregation refers to the storage of multiple records in a Kinesis Data Streams record.

Aggregation allows customers to increase the number of records sent per API call, which effectively increases producer throughput.

Kinesis Data Streams shards support up to 1,000 records per second, or 1 MB per second of throughput.

For customers whose records are smaller than 1 KB, the records-per-second limit is the binding constraint.

Record aggregation allows customers to combine multiple records into a single Kinesis Data Streams record.

This allows customers to improve their per shard throughput.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
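The effect of aggregation can be sketched in Python: many sub-1 KB log records are packed into a single Kinesis Data Streams record, so the 1,000 records-per-second shard limit no longer binds. This sketch uses simple length-prefix framing for illustration only; the real KPL uses its own protobuf-based aggregation format.

```python
import struct

def aggregate(records):
    # Pack many small user records into one blob using 4-byte
    # big-endian length prefixes (illustrative framing only; the
    # real KPL uses a protobuf-based aggregated-record format).
    out = bytearray()
    for rec in records:
        out += struct.pack(">I", len(rec))
        out += rec
    return bytes(out)

def deaggregate(blob):
    # Reverse the framing: read each length prefix, then the record.
    records, i = [], 0
    while i < len(blob):
        (n,) = struct.unpack_from(">I", blob, i)
        i += 4
        records.append(blob[i:i + n])
        i += n
    return records

# 1,000 small log lines now consume a single Kinesis record instead
# of 1,000, so the per-shard records-per-second limit stops binding.
small = [f"log line {i}".encode() for i in range(1000)]
blob = aggregate(small)
assert deaggregate(blob) == small
```

A consumer-side deaggregation step (as the KCL performs automatically for KPL-aggregated records) is what makes this transparent to downstream applications.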

Option B is incorrect - the Streams API cannot be used to monitor, capture, and standardize file changes.

The Streams API's PutRecords operation sends multiple records to Kinesis Data Streams in a single request.

By using PutRecords, producers can achieve higher throughput when sending data to their Kinesis data stream.

Each PutRecords request can support up to 500 records.

Each record in the request can be as large as 1 MB, up to a limit of 5 MB for the entire request, including partition keys.

The platform also lets producers switch programmatically between submitting a single record and submitting multiple records in one HTTP request.

https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-sdk.html
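A minimal Python sketch of client-side batching under these PutRecords limits follows; the stream name and record contents are illustrative assumptions, and the actual send call is shown only in a comment.

```python
def batch_for_put_records(records, max_count=500, max_bytes=5 * 1024 * 1024):
    """Split (partition_key, data) pairs into batches that respect the
    PutRecords limits: at most 500 records and 5 MB per request,
    counting partition keys toward the request size."""
    batches, current, size = [], [], 0
    for key, data in records:
        rec_size = len(key.encode()) + len(data)
        # Flush the current batch if adding this record would
        # exceed either the count or the size limit.
        if current and (len(current) == max_count or size + rec_size > max_bytes):
            batches.append(current)
            current, size = [], 0
        current.append({"PartitionKey": key, "Data": data})
        size += rec_size
    if current:
        batches.append(current)
    return batches

# Each batch could then be sent with (not executed here):
#   boto3.client("kinesis").put_records(StreamName="logs", Records=batch)
recs = [(f"pk-{i}", b"x" * 100) for i in range(1200)]
batches = batch_for_put_records(recs)
assert len(batches) == 3
assert all(len(b) <= 500 for b in batches)
```

Note that PutRecords can also return partial failures per record, so a production sender would inspect `FailedRecordCount` and retry the failed subset.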

Option C is incorrect - the KPL library cannot be used to monitor, capture, and standardize file changes.

Besides, collection reduces the overhead of making many separate HTTP requests for a multi-shard stream.

Collection refers to batching multiple Kinesis Data Streams records and sending them in a single HTTP request with a call to the API operation PutRecords, instead of sending each Kinesis Data Streams record in its own HTTP request.

This increases throughput compared to using no collection because it reduces the overhead of making many separate HTTP requests.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html

Option D is incorrect - Kinesis Firehose cannot be used to monitor, capture, and standardize file changes.

Besides, Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk.

Kinesis Data Firehose is part of the Kinesis streaming data platform, along with Kinesis Data Streams, Kinesis Video Streams, and Amazon Kinesis Data Analytics.

With Kinesis Data Firehose, you don't need to write applications or manage resources.

You configure your data producers to send data to Kinesis Data Firehose, and it automatically delivers the data to the destination that you specified.

You can also configure Kinesis Data Firehose to transform your data before delivering it.

Kinesis Data Firehose can invoke your Lambda function to transform incoming source data and deliver the transformed data to destinations.

You can enable Kinesis Data Firehose data transformation when you create your delivery stream.

Kinesis Data Firehose provides the following Lambda blueprints that you can use to create a Lambda function for data transformation.

General Firehose Processing - Contains the data transformation and status model described in the previous section.

Use this blueprint for any custom transformation logic.

Apache Log to JSON - Parses and converts Apache log lines to JSON objects, using predefined JSON field names.

Apache Log to CSV - Parses and converts Apache log lines to CSV format.

Syslog to JSON - Parses and converts Syslog lines to JSON objects, using predefined JSON field names.

Syslog to CSV - Parses and converts Syslog lines to CSV format.

Kinesis Data Firehose Process Record Streams as source - Accesses the Kinesis Data Streams records in the input and returns them with a processing status.

Kinesis Data Firehose CloudWatch Logs Processor - Parses and extracts individual log events from records sent by CloudWatch Logs subscription filters.

https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html
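In the spirit of the "Apache Log to JSON" blueprint, the parsing step can be sketched in Python; the JSON field names below are illustrative assumptions, not necessarily those the actual blueprint emits.

```python
import json
import re

# Apache Common Log format: host, identity, authuser, timestamp,
# request line, status code, and response size.
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+)'
)

def apache_common_to_json(line):
    # Return the line as a JSON object, or None if it does not parse.
    m = COMMON_LOG.match(line)
    return json.dumps(m.groupdict()) if m else None

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
doc = json.loads(apache_common_to_json(line))
assert doc["host"] == "127.0.0.1" and doc["response"] == "200"
```

Inside an actual Firehose transformation Lambda, each incoming record's data is base64-encoded, so the function would decode, transform, re-encode, and return a `result` status per record.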

Option E is incorrect - Kinesis Firehose cannot be used to monitor, capture, and standardize file changes.

As described under Option D, Amazon Kinesis Data Firehose is a fully managed delivery service that can invoke a Lambda function to transform incoming data before delivering it.

Amazon Kinesis Data Firehose can convert the format of your input data from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3.

Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON.

https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html

Option F is correct - Kinesis Agent is a stand-alone Java software application that offers an easy way to collect and send data to Kinesis Data Streams.

The agent continuously monitors a set of files and sends new data to your stream.

The agent handles file rotation, checkpointing, and retry upon failures.

It delivers all of your data in a reliable, timely, and simple manner.

It also emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process.

You can configure the agent to monitor multiple file directories and send data to multiple streams.

The agent can pre-process the records parsed from monitored files before sending them to your stream.

https://docs.aws.amazon.com/streams/latest/dev/writing-with-agents.html#sim-writes
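As a sketch, an agent configuration (typically `/etc/aws-kinesis/agent.json`) that monitors two directories, converts Apache Common Log and Syslog lines to JSON with the agent's LOGTOJSON pre-processing option, and feeds two different streams might look like the following; the file paths and stream names here are assumptions.

```json
{
  "flows": [
    {
      "filePattern": "/var/log/apache/access.log*",
      "kinesisStream": "web-access-stream",
      "dataProcessingOptions": [
        { "optionName": "LOGTOJSON", "logFormat": "COMMONAPACHELOG" }
      ]
    },
    {
      "filePattern": "/var/log/app/syslog*",
      "kinesisStream": "syslog-stream",
      "dataProcessingOptions": [
        { "optionName": "LOGTOJSON", "logFormat": "SYSLOG" }
      ]
    }
  ]
}
```

This mapping of file patterns to streams, with per-flow format conversion, is exactly the monitor-standardize-ingest behaviour the question asks for.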

TruFin runs multiple web applications that produce log files in different formats, stored in different directories. The company wants to analyze these logs to gain real-time insight into customer navigational behavior, website usage, clickstream, and security threats.

To achieve this, TruFin needs to ingest log data from multiple sources, standardize it into a common format, and feed it into different Kinesis streams. Kinesis is a fully managed service provided by AWS for real-time data processing.

There are several options available for ingesting and processing data in Kinesis, but the most suitable option for TruFin would be to use Kinesis Firehose, with Lambda blueprints to handle data transformation, or Kinesis Agent, with pre-processing of data.

Kinesis Firehose is a fully managed service that can capture and load streaming data into AWS services such as Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. It can also transform data with AWS Lambda before loading it into these services.

In this case, TruFin can use Kinesis Firehose to handle record format conversion and to standardize log data into a common format. The company can use AWS Lambda blueprints to transform the data before loading it into the Kinesis stream.

Alternatively, TruFin can use Kinesis Agent to pre-process the data before sending it to Kinesis. Kinesis Agent is a lightweight agent that can monitor log files and send the data to Kinesis streams in real-time. It can also pre-process the data by parsing and transforming it before sending it to Kinesis.

However, Kinesis Agent is limited to certain log file formats, so TruFin would need to ensure that its log files are compatible with Kinesis Agent before using this approach.

In conclusion, the most suitable options for TruFin to capture and standardize log data into a common format before feeding it into different Kinesis streams are Kinesis Firehose with Lambda blueprints or Kinesis Agent with pre-processing of data.