A financial company uses Kinesis Data Streams to store system logs in real time from a busy application.
The data in the stream is then sent to a Kinesis Firehose delivery stream, which delivers it to the final S3 bucket destination.
The input data format is RFC 3164 Syslog; however, the format must be converted to JSON in Kinesis Firehose before the data is delivered.
How should you implement this?
Correct Answer - A.
Refer to https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html for how to transform the data format in Kinesis Data Firehose.
Option A is CORRECT: Kinesis Data Firehose provides a Lambda blueprint for converting Syslog to JSON.
Option B is incorrect: AWS Glue can create a schema in the AWS Glue Data Catalog; however, it cannot convert Syslog to JSON in Kinesis Firehose.
Option C is incorrect: Apache Hive JSON SerDe can be used to serialize/deserialize JSON data.
It is not used to convert Syslog to JSON.
Option D is incorrect: Kinesis Data Streams should not be used to convert the data format.
The conversion can be done in Kinesis Firehose.
Data transformation is also a very common use case for Kinesis Firehose.
The correct answer is A. Create a Lambda function for data transformation using a blueprint. Kinesis Data Firehose can invoke the Lambda function to transform incoming source data.
Explanation: The scenario involves collecting system logs in real time with Kinesis Data Streams and then delivering the data to an S3 bucket through Kinesis Firehose. The input data is in RFC 3164 Syslog format, but it must be converted to JSON before it is delivered to the S3 bucket.
Kinesis Data Firehose is a fully managed service that can capture, transform, and load streaming data into data stores and analytics tools. It allows you to transform data before it's stored in destinations such as S3, Redshift, and Elasticsearch.
There are several ways to transform data using Kinesis Data Firehose, but the most appropriate way in this scenario is to use a Lambda function for data transformation. This is because Lambda is a serverless compute service that allows you to run code in response to events and it can be easily integrated with Kinesis Data Firehose to transform incoming source data.
To implement this solution, you would create a Lambda function that transforms the incoming Syslog data into JSON format. You can use one of the available Lambda blueprints to quickly get started with a function that transforms data. You can also write your own custom function if necessary.
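The transformation function described above can be sketched in Python as follows. This is a minimal illustration, not the AWS blueprint itself: the regular expression, the helper name syslog_to_dict, and the output field names are assumptions chosen for the example. The event and response shapes (records, recordId, base64-encoded data, and a result of Ok or ProcessingFailed) follow the contract Firehose uses when it invokes a transformation Lambda.

```python
import base64
import json
import re

# Assumed pattern for an RFC 3164 line:
#   <PRI>Mmm dd hh:mm:ss HOSTNAME TAG[PID]: MESSAGE
# Real-world syslog varies, so treat this as illustrative only.
SYSLOG_RE = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<hostname>\S+)\s"
    r"(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s?"
    r"(?P<message>.*)$"
)


def syslog_to_dict(line):
    """Parse one syslog line into a dict, or return None if it does not match."""
    match = SYSLOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # PRI encodes facility * 8 + severity.
    priority = int(fields.pop("priority"))
    fields["facility"] = priority // 8
    fields["severity"] = priority % 8
    return fields


def lambda_handler(event, context):
    """Firehose transformation handler: Syslog in, newline-delimited JSON out."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8")
        parsed = syslog_to_dict(line)
        if parsed is None:
            # Return the original data and flag the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        payload = json.dumps(parsed) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Every record that comes in must be returned with the same recordId, which is why unparseable lines are marked ProcessingFailed rather than silently skipped.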
Once the Lambda function is created, you can configure Kinesis Data Firehose to use the function as a data transformation option. Kinesis Data Firehose can then invoke the Lambda function to transform the incoming Syslog data into JSON format before delivering it to the S3 bucket.
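As a rough sketch of that configuration step, the snippet below builds the request for the boto3 Firehose create_delivery_stream call with a Lambda processor attached to the S3 destination. The function names and every ARN passed in are hypothetical placeholders; the dictionary keys are the ones the Firehose API defines for a delivery stream that reads from a Kinesis data stream.

```python
def build_delivery_stream_request(stream_name, source_stream_arn,
                                  bucket_arn, role_arn, lambda_arn):
    """Build keyword arguments for firehose.create_delivery_stream.

    All names and ARNs are supplied by the caller; none are real resources.
    """
    return {
        "DeliveryStreamName": stream_name,
        # The delivery stream reads from an existing Kinesis data stream.
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": source_stream_arn,
            "RoleARN": role_arn,
        },
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            # Firehose invokes this Lambda on each batch before writing to S3.
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [
                    {
                        "Type": "Lambda",
                        "Parameters": [
                            {
                                "ParameterName": "LambdaArn",
                                "ParameterValue": lambda_arn,
                            },
                        ],
                    }
                ],
            },
        },
    }


def create_delivery_stream(request):
    """Send the request to AWS; requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("firehose").create_delivery_stream(**request)
```

Separating the request builder from the API call keeps the configuration easy to inspect or unit test without touching an AWS account. The same settings can also be applied in the console when creating the delivery stream.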
Option B is incorrect because AWS Glue does not convert Syslog to JSON inside a Firehose delivery stream. Glue is primarily an ETL (Extract, Transform, Load) service. Its Data Catalog can supply the schema that Firehose uses for record format conversion, but that feature expects JSON input and produces Apache Parquet or Apache ORC.
Option C is incorrect because the Apache Hive JSON SerDe only serializes and deserializes data that is already JSON. Firehose uses it to read JSON records during record format conversion; it cannot parse Syslog lines.
Option D is incorrect because Kinesis Data Streams only stores and delivers records; it does not transform them. The format conversion belongs in Kinesis Data Firehose, which invokes a Lambda function to transform the records before delivery.