You have a large number of web servers in an Auto Scaling group behind a load balancer.
On an hourly basis, you want to filter and process the logs to collect data on unique visitors, and then put that data in a durable data store to run reports.
Web servers in the Auto Scaling group are constantly launching and terminating based on your scaling policies.
But you do not want to lose any of the log data from these servers during a stop/termination initiated by a user or by Auto Scaling.
What two approaches will meet these requirements? Choose two answers from the options given below.
Answer - A and C.
You can use the CloudWatch Logs agent installer on an existing EC2 instance to install and configure the CloudWatch Logs agent.
For more information, please visit the below link:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/QuickStartEC2Instance.html

You can publish your own metrics to CloudWatch using the AWS CLI or an API.
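As a rough illustration, here is a minimal boto3 sketch of publishing such a custom metric; the namespace, metric name, and value are assumptions made for this example, not values taken from the question.

```python
import boto3

# Minimal sketch: publish a custom "UniqueVisitors" metric to CloudWatch.
# The namespace, metric name, and value are illustrative assumptions.
cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="WebApp/Traffic",  # assumed custom namespace
    MetricData=[
        {
            "MetricName": "UniqueVisitors",
            "Value": 1342.0,     # placeholder count produced by log analysis
            "Unit": "Count",
        }
    ],
)
```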
For more information, please visit the below link:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html

Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools.
It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.
Most results come back in seconds.
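To make the loading step concrete, the sketch below issues a standard SQL COPY from S3 into Redshift through the Amazon Redshift Data API; this is just one of several ways to run the COPY (Option C of the question uses AWS Data Pipeline to orchestrate it). The cluster, database, user, table, bucket, and IAM role names are illustrative assumptions.

```python
import boto3

# Minimal sketch: load hourly log data from S3 into Redshift with a SQL COPY.
# Cluster, database, user, table, bucket, and IAM role are assumptions.
redshift_data = boto3.client("redshift-data")
redshift_data.execute_statement(
    ClusterIdentifier="reporting-cluster",   # assumed cluster name
    Database="analytics",                    # assumed database
    DbUser="admin",                          # assumed database user
    Sql=(
        "COPY weblogs "
        "FROM 's3://example-log-bucket/hourly/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
        "FORMAT AS JSON 'auto';"
    ),
)
```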
For more information on copying data from S3 to Amazon Redshift, please refer to the below link:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-copydata-redshift.html

The two approaches that meet the requirements are A and C. Let's go through each option to understand why:
Option A: Install an Amazon CloudWatch Logs Agent on every web server during the bootstrap process. Create a CloudWatch log group and define Metric Filters to create custom metrics that track unique visitors from the streaming web server logs. Create a scheduled task on an Amazon EC2 instance that runs every hour to generate a new report based on the CloudWatch custom metrics.
Explanation: With this approach, the Amazon CloudWatch Logs Agent is installed on every web server during the bootstrap process. The logs from these servers are then streamed to a CloudWatch log group where metric filters are defined to track unique visitors from the logs. A scheduled task runs on an Amazon EC2 instance every hour to generate a report based on the CloudWatch custom metrics.
This approach ensures that log data is not lost during a stop/termination initiated by a user or by Auto Scaling, as the logs are continuously streamed to the CloudWatch log group. Additionally, since the data is stored in CloudWatch, it is durable and can be retrieved at any time.
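A hedged sketch of how Option A's pieces could be wired up with boto3 follows; the log group name, filter pattern, and namespace are assumptions for illustration, not values prescribed by the question.

```python
import boto3
from datetime import datetime, timedelta, timezone

logs = boto3.client("logs")

# Define a metric filter on the streamed access logs. The space-delimited
# pattern below assumes a common web log layout; adjust it to your format.
logs.put_metric_filter(
    logGroupName="/webapp/access-logs",   # assumed log group
    filterName="VisitorRequests",
    filterPattern="[ip, identity, user, timestamp, request, status, bytes]",
    metricTransformations=[
        {
            "metricName": "VisitorRequests",
            "metricNamespace": "WebApp/Traffic",
            "metricValue": "1",           # count one per matching log line
        }
    ],
)

# The hourly scheduled task could then read the custom metric back to
# build its report, e.g. summing the last hour of data points.
cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="WebApp/Traffic",
    MetricName="VisitorRequests",
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=3600,
    Statistics=["Sum"],
)
print(stats["Datapoints"])
```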
Option C: On the web servers, create a scheduled task that executes a script to rotate and transmit the logs to an Amazon S3 bucket. Ensure that the operating system shutdown procedure triggers a logs transmission when the Amazon EC2 instance is stopped/terminated. Use AWS Data Pipeline to move log data from the Amazon S3 bucket to Amazon Redshift to process and run reports every hour.
Explanation: With this approach, a scheduled task on the web servers rotates the logs and transmits them to an Amazon S3 bucket every hour. The operating system shutdown procedure also triggers a final transmission when the Amazon EC2 instance is stopped/terminated, ensuring that no log data is lost. AWS Data Pipeline then moves the log data from the Amazon S3 bucket to Amazon Redshift to process and run reports every hour.
This approach also ensures that log data is not lost during a stop/termination initiated by a user or by Auto Scaling. The logs are transmitted to an Amazon S3 bucket, which provides durable storage from which they can be retrieved at any time. Using AWS Data Pipeline to move the log data from the Amazon S3 bucket to Amazon Redshift allows the data to be processed and reports to be generated every hour.
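A minimal sketch of Option C's shipping script is shown below; the same script can be invoked both by the hourly scheduled task and by the OS shutdown procedure. The bucket name and log path are illustrative assumptions.

```python
import socket
import time

import boto3

# Minimal sketch: rotate-and-ship task for Option C. Run hourly via cron (or
# a systemd timer) and again from the OS shutdown sequence so the final log
# segment is uploaded before the instance terminates.
def ship_logs(bucket="example-log-bucket",
              log_path="/var/log/httpd/access_log"):
    s3 = boto3.client("s3")
    # Key objects by host and timestamp so logs from many concurrently
    # scaling instances never overwrite one another.
    key = f"hourly/{socket.gethostname()}/{int(time.time())}-access.log"
    s3.upload_file(log_path, bucket, key)

if __name__ == "__main__":
    ship_logs()
```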
Options B and D are incorrect because:
Option B: On the web servers, create a scheduled task that executes a script to rotate and transmit the logs to Amazon Glacier. Ensure that the operating system shutdown procedure triggers a logs transmission when the Amazon EC2 instance is stopped/terminated. Use Amazon Data Pipeline to process the data in Amazon Glacier and run reports every hour.
Explanation: With this approach, a scheduled task on the web servers rotates the logs and transmits them to Amazon Glacier, and the operating system shutdown procedure triggers a final transmission when the Amazon EC2 instance is stopped/terminated, so no log data is lost. AWS Data Pipeline is then used to process the data in Amazon Glacier and run reports every hour.
However, Amazon Glacier is an archival storage service designed for long-term storage of infrequently accessed data, not for storing and retrieving log data on an hourly basis. Glacier retrievals can take anywhere from minutes to hours, so using AWS Data Pipeline to process data held in Glacier and generate a report every hour is impractical.
Option D: Install an AWS Data Pipeline Logs Agent on every web server during the bootstrap process. Create a log group object in AWS Data Pipeline, and define Metric Filters to move processed log data directly from the web servers to Amazon Redshift and run reports every hour.
Explanation: This approach is not feasible. There is no such thing as an AWS Data Pipeline Logs Agent, and log groups and Metric Filters are features of Amazon CloudWatch Logs, not of AWS Data Pipeline. AWS Data Pipeline also cannot move processed log data directly from the web servers to Amazon Redshift; it orchestrates data movement between data stores such as Amazon S3 and Amazon Redshift.