RealEst has been a trusted resource for home buyers, sellers, and dreamers in the US, offering a comprehensive database of for-sale properties as well as the information, tools, and professional expertise to help people move confidently through every step of their home journey.
RealEst is now expanding into other regions such as Asia, Europe, and the Middle East.
Given continuous demand and additional business query needs, RealEst understands that its ad campaigns need to be based on real-time feeds so that recommendations reach customers in minutes rather than days or weeks. RealEst wants to track ad impressions for analysis; the impressions need to be converted into binary formats like Parquet, compressed, and stored in an S3 data lake built on EMR.
The ad impressions need to be buffered, aggregated, and loaded into Redshift for analysis.
The IT team needs to track failure records and provide root cause analysis and a fix. Please suggest the recommended streaming platform and capabilities, taking into account ease of management and integration with new producers and consumers.
Select 3 options.
A. B. C. D. E. F.
Answer: B, C, and E.
Option A is incorrect - Kinesis Data Streams is not the right platform to fulfil the requirements, since Kinesis Data Streams does not provide data transformation or record format conversion.
In addition, the customer is looking for a near-real-time response within minutes.
You can use Kinesis Data Streams to collect and process large streams of data records in real time.
You can create data-processing applications, known as Kinesis Data Streams applications.
A typical Kinesis Data Streams application reads data from a data stream as data records.
These applications can use the Kinesis Client Library, and they can run on Amazon EC2 instances.
Use Kinesis Data Streams for rapid and continuous data intake and aggregation.
The type of data used can include IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data.
Because the response time for the data intake and processing is in real time, the processing is typically lightweight.
The following are typical scenarios for using Kinesis Data Streams:
Accelerated log and data feed intake and processing.
Real-time metrics and reporting.
Real-time data analytics.
Complex stream processing.
https://docs.aws.amazon.com/streams/latest/dev/introduction.html
Option B is correct - Amazon Kinesis Data Firehose can convert the format of your input data from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3.
Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON.
If you want to convert an input format other than JSON, such as comma-separated values (CSV) or structured text, you can use AWS Lambda to transform it to JSON first.
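As a rough illustration of the record format conversion described above, the sketch below builds the `DataFormatConversionConfiguration` fragment that boto3's `create_delivery_stream` accepts inside `ExtendedS3DestinationConfiguration`. The role ARN, Glue database, table name, and region are placeholder assumptions, not values from the scenario. Firehose reads the target schema from an AWS Glue table, so one is assumed to exist. The code only builds a dictionary and does not call AWS.

```python
import json


def build_parquet_conversion_config(role_arn, glue_database, glue_table, region):
    """Build a JSON-to-Parquet conversion fragment for a Firehose S3 destination.

    All argument values are supplied by the caller; the ones used below
    are hypothetical placeholders.
    """
    return {
        "Enabled": True,
        # Firehose looks up the output schema in the AWS Glue Data Catalog.
        "SchemaConfiguration": {
            "RoleARN": role_arn,
            "DatabaseName": glue_database,
            "TableName": glue_table,
            "Region": region,
            "VersionId": "LATEST",
        },
        # Incoming records are assumed to be JSON.
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        # Write Parquet and compress it with Snappy.
        "OutputFormatConfiguration": {
            "Serializer": {"ParquetSerDe": {"Compression": "SNAPPY"}}
        },
    }


config = build_parquet_conversion_config(
    role_arn="arn:aws:iam::123456789012:role/example-firehose-role",  # placeholder
    glue_database="example_ads_db",  # placeholder
    glue_table="example_ad_impressions",  # placeholder
    region="us-east-1",  # placeholder
)
print(json.dumps(config, indent=2))
```

In a real delivery stream this fragment would sit alongside the bucket, role, and buffering settings of the S3 destination; those are left out here to keep the sketch focused on the conversion step.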
https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html
Option C is correct - Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk.
Kinesis Data Firehose can invoke your Lambda function to transform incoming source data and deliver the transformed data to destinations.
You can enable Kinesis Data Firehose data transformation when you create your delivery stream.
Amazon Kinesis Data Firehose can convert the format of your input data from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3.
Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON.
If you want to convert an input format other than JSON, such as comma-separated values (CSV) or structured text, you can use AWS Lambda to transform it to JSON first.
https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html
Option D is incorrect - With Amazon Kinesis Data Analytics for SQL Applications, you can process and analyze streaming data using standard SQL.
The service enables you to quickly author and run powerful SQL code against streaming sources to perform time series analytics, feed real-time dashboards, and create real-time metrics.
Kinesis Data Analytics requires a streaming source, which can be Kinesis Data Streams or Kinesis Data Firehose.
https://docs.aws.amazon.com/kinesisanalytics/latest/dev/what-is.html
Option E is correct - Kinesis Data Firehose can invoke your Lambda function to transform incoming source data and deliver the transformed data to destinations.
You can enable Kinesis Data Firehose data transformation when you create your delivery stream.
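To make the Lambda transformation step more concrete, here is a minimal sketch of a transformation handler that turns CSV lines into JSON. It follows the Firehose transformation contract (each returned record carries `recordId`, `result`, and base64-encoded `data`). The column names in `FIELDS` are hypothetical, since the scenario does not define the impression schema. Records that cannot be parsed are marked `ProcessingFailed` so they can be inspected later, which is one way the failure-tracking requirement might be approached.

```python
import base64
import csv
import json

# Hypothetical column layout for an ad impression; not taken from the scenario.
FIELDS = ["impression_id", "campaign_id", "timestamp"]


def lambda_handler(event, context):
    """Convert each CSV record in a Firehose batch to a JSON line."""
    output = []
    for record in event["records"]:
        try:
            line = base64.b64decode(record["data"]).decode("utf-8").strip()
            values = next(csv.reader([line]))
            if len(values) != len(FIELDS):
                raise ValueError("unexpected column count")
            payload = json.dumps(dict(zip(FIELDS, values))) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
            })
        except ValueError:
            # Covers bad base64, bad UTF-8, and wrong column counts.
            # The original payload is returned untouched for later analysis.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}


def _encode(text):
    """Helper for the local demo: base64-encode a string as Firehose would."""
    return base64.b64encode(text.encode("utf-8")).decode("utf-8")


sample_event = {
    "records": [
        {"recordId": "1", "data": _encode("imp-001,camp-42,2024-01-01T00:00:00Z")},
        {"recordId": "2", "data": _encode("malformed-line")},
    ]
}
result = lambda_handler(sample_event, None)
for item in result["records"]:
    print(item["recordId"], item["result"])
```

The sample event at the bottom only exercises the handler locally; in a delivery stream, Firehose builds the event and supplies the `recordId` values itself.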
https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html
Option F is incorrect - Batching of records improves throughput, but throughput is not the concern here.
The problem is not about data ingestion.
In addition, aggregation helps to improve per-shard throughput, which in turn lowers the overall TCO of the stream.
Batching refers to performing a single action on multiple items instead of repeatedly performing the action on each individual item.
Aggregation refers to the storage of multiple records in a Kinesis Data Streams record.
Aggregation allows customers to increase the number of records sent per API call, which effectively increases producer throughput.
Kinesis Data Streams shards support up to 1,000 Kinesis Data Streams records per second, or 1 MB per second of write throughput.
The Kinesis Data Streams records per second limit binds customers with records smaller than 1 KB.
Record aggregation allows customers to combine multiple records into a single Kinesis Data Streams record.
This allows customers to improve their per shard throughput.
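The effect of aggregation on per-shard throughput can be checked with a little arithmetic. The sketch below applies the two shard limits quoted above (1,000 records per second and 1 MB per second) to a given user record size. It assumes 1 MB means 1,048,576 bytes and ignores any framing overhead added by the aggregation format, so the figures are upper bounds rather than measured values.

```python
# Per-shard write limits as quoted in the text.
RECORDS_PER_SECOND_LIMIT = 1000
BYTES_PER_SECOND_LIMIT = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def user_records_per_second(user_record_bytes, records_per_aggregate=1):
    """Upper bound on user records one shard can accept per second.

    records_per_aggregate is how many user records are packed into a
    single Kinesis Data Streams record (1 means no aggregation).
    """
    stream_record_bytes = user_record_bytes * records_per_aggregate
    # A shard is bound by whichever limit is reached first.
    stream_records = min(
        RECORDS_PER_SECOND_LIMIT,
        BYTES_PER_SECOND_LIMIT // stream_record_bytes,
    )
    return stream_records * records_per_aggregate


without_aggregation = user_records_per_second(100)
with_aggregation = user_records_per_second(100, records_per_aggregate=10)
print(without_aggregation, with_aggregation)
```

With 100-byte user records, the unaggregated case is capped at 1,000 user records per second by the record-count limit, while packing ten records into each stream record raises the ceiling to 10,000 under these assumptions.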
https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
RealEst requires a real-time streaming platform to track ad impressions and provide recommendations to customers in minutes. The ad impressions need to be buffered, aggregated, and loaded into Redshift for analysis. The IT team needs to track failure records and provide root cause analysis and a fix. Against these requirements, the three Kinesis services compare as follows in terms of capabilities, ease of management, and integration with new producers and consumers:
Kinesis Data Streams: Kinesis Data Streams is a scalable and durable real-time streaming service that lets you build custom applications to process and analyze streaming data. It can capture ad impressions and integrates with other AWS services such as Kinesis Data Firehose, Kinesis Data Analytics, and Lambda. However, it does not provide data transformation or record format conversion on its own, and its consumer applications must be built and run by the team, which is why Option A is not the recommended platform here.
Kinesis Data Firehose: Kinesis Data Firehose is a fully managed service that makes it easy to load streaming data into data stores and analytics tools. RealEst can use it to buffer incoming ad impressions, convert them to Parquet, compress them, and deliver them to the S3 data lake. Kinesis Data Firehose can also be configured to load data into Redshift for analysis, and it can invoke a Lambda function to transform records before delivery. It scales automatically to match the volume of incoming data and is easy to manage and integrate with other AWS services.
Kinesis Data Analytics for SQL Applications: Kinesis Data Analytics is a fully managed service that processes streaming data using SQL queries and can read from Kinesis Data Streams or Kinesis Data Firehose. It is useful for real-time analytics on the impression stream, but the buffering, format conversion, and loading requirements do not call for it, so Option D is not selected.
In conclusion, the recommended streaming platform for RealEst's use case is Kinesis Data Firehose, together with its record format conversion and Lambda-based data transformation capabilities (Options B, C, and E). This combination offers easy management and integration with new producers and consumers, and it covers the buffering, transformation, and delivery that the scenario asks for.