AWS Certified Machine Learning - Specialty Exam: Troubleshooting Kinesis Data Firehose Data Transformation Failure

Question

You work for a car rental firm in their car tracking department.

Your team is responsible for building machine learning solutions to track the company's fleet of cars.

Each car is equipped with a GPS vehicle tracking device that emits IoT data.

You are building a data transformation solution that takes the GPS IoT data and transforms it before storing it in S3 for use in your machine learning models. You have decided to use Kinesis Data Firehose data transformation to pre-process your IoT data before storing it in S3.

You have written the Lambda function that pre-processes the data, and you are now testing your data transformation process flow.

When running your tests, you see that Kinesis Data Firehose rejects every record as a data transformation failure.

What could be the reason for the failure?

Answers

Explanations

A. B. C. D.

Answer: B.

Option A is incorrect.

The status of each transformed record produced by your Lambda function can be Ok (the record was transformed successfully), Dropped (the record was dropped intentionally by your transformation logic), or ProcessingFailed (the record could not be transformed).

A status of Ok or Dropped indicates to Kinesis Data Firehose that the record was successfully processed.

A status of ProcessingFailed indicates a failed transformation.

Your lambda function has set each record's status to either Ok or Dropped, so this option is incorrect.

Option B is correct.

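Transformed records received by Kinesis Data Firehose from Lambda must contain the recordId, result, and data parameters, where data holds the base64-encoded transformed payload. If any of these is missing, Kinesis Data Firehose rejects the record and treats it as a data transformation failure.

Your transformed records contain only the recordId and result parameters, so every record is rejected.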
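As an illustration of the required response shape, here is a minimal sketch of a transformation handler that returns all three parameters for every record. The coordinate rounding and the field names lat and lon are assumptions made for the example; they are not taken from the question.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and return recordId, result, and data."""
    output = []
    for record in event["records"]:
        # Firehose passes each payload to Lambda base64-encoded.
        payload = json.loads(base64.b64decode(record["data"]))

        # Hypothetical transformation: round the GPS coordinates.
        # "lat" and "lon" are assumed field names, not part of the question.
        payload["lat"] = round(payload["lat"], 4)
        payload["lon"] = round(payload["lon"], 4)

        transformed = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],  # must echo the incoming recordId
            "result": "Ok",                  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(transformed).decode("utf-8"),
        })
    return {"records": output}
```

Leaving out the data entry in the dictionary above would reproduce the situation described in the question.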

Option C is incorrect.

You can use Lambda blueprints from either the AWS Serverless Application Repository or the AWS Lambda console to create your transformation Lambda function, so the source of the blueprint does not explain the failure.

Option D is incorrect.

You can use Lambda blueprints from either the AWS Serverless Application Repository or the AWS Lambda console to create your transformation Lambda function, so the source of the blueprint does not explain the failure.

Reference:

Please see the Amazon Kinesis Data Firehose developer guide titled Amazon Kinesis Data Firehose Data Transformation.

The reason for the data transformation failure in this scenario is option B: the transformed records returned by the Lambda function contain the recordId and result parameters but not the data parameter.

Kinesis Data Firehose is a fully managed service that enables real-time delivery of streaming data to destinations such as S3, Redshift, Elasticsearch, and Splunk. It can also be used to transform data before delivering it to the destination. To perform the transformation, you can use a Lambda function that processes the incoming data and returns the transformed data to Kinesis Data Firehose.

When using a Lambda function with Kinesis Data Firehose for data transformation, the Lambda function must return, for every record, the record ID (recordId), the result of the transformation (result), and the transformed payload (data), which must be base64-encoded. The result must be set to "Ok", "Dropped", or "ProcessingFailed". "Ok" and "Dropped" tell Kinesis Data Firehose that the record was processed successfully; "ProcessingFailed" indicates a failed transformation.

If the Lambda function returns a transformed record that is missing any of these three parameters, Kinesis Data Firehose rejects the record and treats it as a data transformation failure. This means that the record is not delivered to the destination as transformed data and is not available for use in machine learning models.
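When testing a transformation function locally, a small check of the response shape can catch this kind of mistake before deployment. The sketch below uses a hypothetical helper, find_problems, that only mirrors the rules described here; it is not how Kinesis Data Firehose itself validates records.

```python
REQUIRED_KEYS = ("recordId", "result", "data")
VALID_RESULTS = ("Ok", "Dropped", "ProcessingFailed")


def find_problems(response):
    """Return a list of human-readable problems in a transformation response."""
    problems = []
    for i, record in enumerate(response.get("records", [])):
        # Every returned record needs all three parameters.
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
        # The result, when present, must be one of the three known statuses.
        result = record.get("result")
        if result is not None and result not in VALID_RESULTS:
            problems.append(f"record {i}: unexpected result {result!r}")
    return problems
```

For a response shaped like the one in the question, with only recordId and result, the helper reports the missing data parameter for each record.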

Option A describes records whose result is set to "Ok" or "Dropped". Kinesis Data Firehose treats both as successfully processed, so option A does not explain the failure. Options C and D relate to the source of the Lambda function blueprint and would not cause the data transformation failure in this scenario.

Therefore, the reason for the data transformation failure is option B, where the transformed records contain only the recordId and result parameters and omit the data parameter.