
Crawling CloudTrail Log Files

Question

Your company has a requirement to crawl all of the log files generated via CloudTrail so that the logs can be analyzed more effectively.

Which of the following would be part of the Implementation plan for this requirement? Choose 2 answers from the options given below.

Answers

A. Use an S3-triggered Lambda function to transform the log files and place them in a single folder.

B. Use AWS Glue to catalog the log file information.

C. Store the log files in Amazon Redshift.

D. Store the log files in Amazon DynamoDB.

Explanation

Answer - A and B.

This use case is described in an AWS Big Data blog post (see the URL below).

CloudTrail delivers its log files into a nested, date-based folder structure in an Amazon S3 bucket.

To crawl these logs correctly, you modify the file contents and folder structure using an Amazon S3-triggered Lambda function that stores the transformed files in a single folder in an S3 bucket.
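As a minimal sketch of such a Lambda function, assuming Python and illustrative bucket and prefix names (none of these names come from the question), the handler below reacts to S3 put events, decompresses each gzipped CloudTrail log file, and rewrites its records as newline-delimited JSON under a single flat prefix:

```python
import gzip
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")

# Hypothetical destination: one flat folder that a Glue crawler can scan.
DEST_BUCKET = "my-cloudtrail-flat-logs"  # assumed bucket name
DEST_PREFIX = "flat/"


def lambda_handler(event, context):
    """Triggered by S3 put events for newly delivered CloudTrail log files."""
    for record in event["Records"]:
        src_bucket = record["s3"]["bucket"]["name"]
        src_key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # CloudTrail delivers gzipped JSON files with a top-level "Records" array.
        obj = s3.get_object(Bucket=src_bucket, Key=src_key)
        events = json.loads(gzip.decompress(obj["Body"].read())).get("Records", [])

        # Flatten to newline-delimited JSON so each line is a single event.
        ndjson = "\n".join(json.dumps(e) for e in events)

        # Write every file into one folder, regardless of its nested source path.
        dest_key = DEST_PREFIX + src_key.split("/")[-1].replace(".json.gz", ".json")
        s3.put_object(Bucket=DEST_BUCKET, Key=dest_key, Body=ndjson.encode("utf-8"))
```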

When the files are in a single folder, AWS Glue scans the data, converts it into Apache Parquet format, and catalogs it to allow for querying and visualization using Amazon Athena and Amazon QuickSight.
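A crawler over that single folder could be set up as in the sketch below, which uses boto3's Glue API; the crawler name, IAM role, database, and S3 path are all assumptions for illustration, and the conversion to Parquet would be handled by a separate Glue ETL job:

```python
import boto3

glue = boto3.client("glue")

# All names below are illustrative assumptions, not values from the question.
glue.create_crawler(
    Name="cloudtrail-flat-logs-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # assumed role
    DatabaseName="cloudtrail_logs",
    Targets={"S3Targets": [{"Path": "s3://my-cloudtrail-flat-logs/flat/"}]},
    # Run hourly so newly transformed log files are picked up and cataloged.
    Schedule="cron(0 * * * ? *)",
)

glue.start_crawler(Name="cloudtrail-flat-logs-crawler")
```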

Options C and D are incorrect since the ideal place to store the logs would be S3.

For more information on CloudTrail log visualization, please refer to the URL below.

https://aws.amazon.com/blogs/big-data/streamline-aws-cloudtrail-log-visualization-using-aws-glue-and-amazon-quicksight/

To implement the requirement of crawling all the log files generated via CloudTrail, we need to consider the following:

  1. Transformation of Log Files: The log files generated by CloudTrail may not be in a format or folder layout suitable for crawling and analysis. They therefore need to be transformed into a form that can be queried easily, and an S3-triggered Lambda function (as sketched earlier) can perform this transformation and consolidate the output in a single folder. Hence, option A is a valid choice.

  2. Cataloguing the Information: After the log files are transformed, the information needs to be cataloged in a centralized repository to enable querying and analysis. AWS Glue, a fully managed ETL (Extract, Transform, Load) service that can crawl data sources, infer schemas, and generate ETL scripts to transform data, is well suited for this. Hence, option B is also a valid choice; an example query against the resulting catalog is sketched after this list.

  3. Storing the Log Files in Redshift: Amazon Redshift is a data warehousing service that can store and analyze large amounts of data. However, CloudTrail already delivers its logs to S3, and once Glue has cataloged them they can be queried in place with Athena, so loading them into Redshift adds cost and complexity this requirement does not call for. Hence, option C is not a valid choice.

  4. Alternative Storage in DynamoDB: Amazon DynamoDB is a NoSQL database service that can store and retrieve any amount of data in a flexible, scalable manner. However, CloudTrail log files are not shaped for key-value access, and S3 remains the ideal, cost-effective store for them. Hence, option D is not a valid choice either.
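To make the querying step in item 2 concrete, the sketch below runs an Athena query against the table that a Glue crawler would create over the flat folder; the database, table name, and results bucket are assumed for illustration:

```python
import boto3

athena = boto3.client("athena")

# The database and table come from the Glue crawler; these names are assumptions.
response = athena.start_query_execution(
    QueryString="""
        SELECT eventname, COUNT(*) AS calls
        FROM flat
        GROUP BY eventname
        ORDER BY calls DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "cloudtrail_logs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # assumed
)

print("Query execution id:", response["QueryExecutionId"])
```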

In summary, the implementation plan for the requirement to crawl all the log files generated via CloudTrail would include using Lambda functions to transform the log files and AWS Glue to catalog the information. The transformed log files stay in S3, where they can be queried with Amazon Athena and visualized with Amazon QuickSight; neither Redshift nor DynamoDB is needed.