Clickstream Data Pre-processing with Lambda Functions | AWS Certified Big Data - Specialty Exam

Supported Features in Lambda Functions for Data Pre-processing

Question

Clickstream data is captured through the Streams API, streamed with Kinesis Data Streams, and loaded into Kinesis Data Analytics for further processing before being written to Redshift.

The team wants to pre-process the data with Lambda functions before the Kinesis Data Analytics application's SQL code executes. Which of the following features are supported in Lambda functions for pre-processing data? Select 3 options.

Answers

Explanations


Answer: A, B, and C.

Options A, B, and C are correct. If the data in your stream needs format conversion, transformation, enrichment, or filtering, you can pre-process it using an AWS Lambda function.

You can do this before your application SQL code executes or before your application creates a schema from your data stream.
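A minimal sketch of the pre-processing contract, in Python. It assumes the event and response shapes described in the AWS pre-processing documentation: each input record carries a recordId and base64-encoded data, and the function returns each recordId with a result of Ok, Dropped, or ProcessingFailed. This version changes nothing; it only marks where conversion, enrichment, or filtering would go.

```python
import base64


def lambda_handler(event, context):
    """Pass-through pre-processor that returns every record unchanged.

    Assumes event["records"] is a list of dicts with a "recordId" and a
    base64-encoded "data" field, as in Kinesis Data Analytics (SQL)
    pre-processing.
    """
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"])
        # Format conversion, enrichment, or filtering would happen here.
        output.append({
            "recordId": record["recordId"],  # echo the input recordId
            "result": "Ok",                  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(payload).decode("utf-8"),
        })
    return {"records": output}
```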

Using a Lambda function for pre-processing records is useful in the following scenarios:

Transforming records from other formats (such as KPL or GZIP) into formats that Kinesis Data Analytics can analyze.

Kinesis Data Analytics currently supports JSON or CSV data formats.

Expanding data into a format that is more accessible for operations such as aggregation or anomaly detection.

For instance, if several data values are stored together in a string, you can expand the data into separate columns.

Data enrichment with other AWS services, such as extrapolation or error correction.

Applying complex string transformation to record fields.

Data filtering for cleaning up the data.
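The first two scenarios can be sketched in a single function. The record layout is hypothetical: each record is assumed to be GZIP-compressed, pipe-delimited text (user_id|page|event_time). The function decompresses the record, expands the packed string into named fields, and emits JSON, one of the formats Kinesis Data Analytics can analyze. Records it cannot parse are marked ProcessingFailed.

```python
import base64
import gzip
import json

# Hypothetical record layout (an assumption, not from the exam question):
# GZIP-compressed text such as "u-42|/products/7|2019-05-01T10:00:00Z".
FIELD_NAMES = ("user_id", "page", "event_time")


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        try:
            # Format conversion: base64 -> GZIP bytes -> UTF-8 text.
            raw = gzip.decompress(base64.b64decode(record["data"]))
            text = raw.decode("utf-8")
            # Expansion: split the packed string into separate named columns.
            values = text.strip().split("|")
            if len(values) != len(FIELD_NAMES):
                raise ValueError("unexpected number of fields")
            row = dict(zip(FIELD_NAMES, values))
        except (OSError, EOFError, ValueError):
            # Not GZIP, truncated, or malformed: report the failure and
            # return the original payload unchanged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        # Re-encode as JSON so the application can infer a schema from it.
        encoded = base64.b64encode(json.dumps(row).encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"], "result": "Ok", "data": encoded})
    return {"records": output}
```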

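The option that mentions Parquet and Avro format conversion is incorrect: converting records into those formats is not supported as part of pre-processing, because Kinesis Data Analytics reads only JSON or CSV input.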

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/lambda-preprocessing.html

Lambda functions can pre-process data before it reaches the Kinesis Data Analytics application's SQL code. Each option is evaluated below:

A. Data Enrichment: Lambda functions can add data or metadata to incoming records, for example information looked up in another AWS service, to give each record more context. This helps build more robust pipelines and improves data quality.

B. Complex string transformation: Lambda functions can be used to perform complex string transformations on incoming records. For example, you can use a Lambda function to split a string into multiple fields or to extract specific parts of a string.

C. Data Filtering: Lambda functions can filter incoming records based on specific criteria, for example to remove unwanted or malformed records from the stream. A record is filtered out by returning it with a result of Dropped, so it never reaches the application's SQL code.

D. Data Transformation into Avro and Parquet formats: This option describes converting incoming records into Avro or Parquet, formats typically chosen to optimize storage and query performance. Because Kinesis Data Analytics reads only JSON or CSV input, this is not a supported pre-processing use.

E. Record Format Conversion: Lambda functions can be used to convert incoming records from one format to another. For example, you can use a Lambda function to convert records from JSON format to CSV format.

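Out of the given options, A, B, and C are supported in Lambda functions for pre-processing data. Transforming records into Avro or Parquet is not supported, because Kinesis Data Analytics reads only JSON or CSV. The expected answer also excludes record format conversion, although the AWS documentation quoted above does list converting records from other formats, such as KPL or GZIP, into JSON or CSV as a pre-processing scenario.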
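Options A, B, and C can be combined in one sketch for JSON clickstream records. The field names (user_id, url, user_agent), the PAGE_CATEGORIES lookup table, and the bot markers are all hypothetical. A production function might fetch enrichment data from another AWS service rather than hard-coding it.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical enrichment table and bot markers, hard-coded for illustration.
PAGE_CATEGORIES = {"/products": "catalog", "/cart": "checkout"}
BOT_MARKERS = ("bot", "crawler", "spider")


def transform(click):
    """Return an enriched row, or None if the click should be filtered out."""
    # C. Filtering: discard clicks whose user agent looks like a bot.
    agent = click.get("user_agent", "").lower()
    if any(marker in agent for marker in BOT_MARKERS):
        return None
    # B. String transformation: split the URL into a normalized path and a campaign tag.
    parts = urlsplit(click["url"])
    path = parts.path.rstrip("/") or "/"
    campaign = parse_qs(parts.query).get("utm_campaign", [""])[0]
    # A. Enrichment: attach a category looked up from the first path segment.
    prefix = "/" + path.strip("/").split("/")[0]
    category = PAGE_CATEGORIES.get(prefix, "other")
    return {"user_id": click["user_id"], "path": path,
            "campaign": campaign, "category": category}


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        try:
            row = transform(json.loads(base64.b64decode(record["data"])))
        except (ValueError, KeyError, AttributeError, TypeError):
            # Malformed JSON or missing fields.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed", "data": record["data"]})
            continue
        if row is None:
            # Intentionally filtered out; the recordId is still reported.
            output.append({"recordId": record["recordId"],
                           "result": "Dropped", "data": record["data"]})
            continue
        encoded = base64.b64encode(json.dumps(row).encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"], "result": "Ok", "data": encoded})
    return {"records": output}
```

Filtering uses the Dropped result rather than silently omitting the record, because the response is expected to account for every input recordId.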