Optimizing BigQuery Storage for Application Logs | Professional Cloud Architect Exam

Question

Your applications will be writing their logs to BigQuery for analysis.

Each application should have its own table.

Any logs older than 45 days should be removed.

You want to optimize storage and follow Google-recommended practices.

What should you do?

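Answers

A. Configure the expiration time for your tables at 45 days.

B. Make the tables time-partitioned, and configure the partition expiration at 45 days.

C. Rely on BigQuery's default behavior to prune application logs older than 45 days.

D. Create a script that uses the BigQuery command line tool (bq) to remove records older than 45 days.

Explanations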

Option B is the correct answer: Make the tables time-partitioned, and configure the partition expiration at 45 days.

BigQuery is a fully managed, serverless data warehouse for storing and querying large amounts of data. Because it is serverless, you do not manage infrastructure or scale resources up and down yourself, and it can handle petabyte-scale datasets.

BigQuery is a popular destination for application logs because the logs can then be queried and analyzed with SQL. In this scenario, each application writes to its own table, logs older than 45 days must be removed, and we want to optimize storage while following Google-recommended practices. Let's explore the options provided:

Option A: Configure the expiration time for your tables at 45 days. This option is not suitable because table expiration applies to the table as a whole: when the expiration time is reached, the entire table is deleted, including logs that are less than 45 days old. To keep a rolling 45 days of logs this way, you would need to create a new table for each day, which is impractical and conflicts with the requirement that each application has its own table.

Option B: Make the tables time-partitioned, and configure the partition expiration at 45 days. This option is the correct one. Time partitioning divides a table into smaller segments (partitions) based on a time column or on ingestion time. In this case, each application's table can be partitioned by the date of the log entry, so queries that filter on a time range scan only the relevant partitions. With the partition expiration set to 45 days, BigQuery automatically deletes each partition once it is older than 45 days, while more recent partitions are retained.
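As a rough sketch of what option B can look like in practice, the Python helper below assembles a BigQuery DDL statement for a day-partitioned table with a 45-day partition expiration. The dataset, table, and column names (`app_logs`, `checkout_service`, `log_time`, `severity`, `message`) are hypothetical and not part of the question. The helper only builds the statement text; running it would still require the BigQuery console, the bq tool, or a client library.

```python
def build_log_table_ddl(dataset: str, table: str, expiration_days: int = 45) -> str:
    """Return DDL for a day-partitioned log table whose partitions expire.

    The schema (log_time, severity, message) is a hypothetical example.
    """
    if expiration_days <= 0:
        raise ValueError("expiration_days must be positive")
    return (
        f"CREATE TABLE `{dataset}.{table}` (\n"
        "  log_time TIMESTAMP NOT NULL,\n"
        "  severity STRING,\n"
        "  message STRING\n"
        ")\n"
        # One partition per calendar day of the log timestamp.
        "PARTITION BY DATE(log_time)\n"
        # Each partition is dropped once it is older than expiration_days.
        f"OPTIONS (partition_expiration_days = {expiration_days})"
    )


# Hypothetical names: one table per application inside an "app_logs" dataset.
ddl = build_log_table_ddl("app_logs", "checkout_service")
print(ddl)
```

Calling the helper once per application keeps the one-table-per-application requirement while applying the same retention setting everywhere.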

Option C: Rely on BigQuery's default behavior to prune application logs older than 45 days. This option does not work, because BigQuery does not prune data by default: unless an expiration is configured on the table, its partitions, or the dataset, data is kept indefinitely. Relying on the default would therefore never remove the old logs.

Option D: Create a script that uses the BigQuery command line tool (bq) to remove records older than 45 days. This option is not recommended, because the script is custom tooling that you would have to schedule, monitor, and maintain for every application table. Partition expiration is a built-in, automated mechanism, which reduces the risk of errors and ensures that data is managed consistently over time.
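To make that maintenance burden concrete, here is a sketch of the kind of command such a script might issue on a schedule, once per application table. It assumes the cleanup is done with a DML DELETE run through `bq query`; the question does not say how the script would remove records, and the dataset, table, and column names are hypothetical. The code only builds the command and does not execute it.

```python
import shlex
from typing import List


def build_cleanup_command(dataset: str, table: str, days: int = 45) -> List[str]:
    """Build a bq invocation that deletes rows older than `days` days.

    Assumes a hypothetical TIMESTAMP column named log_time.
    """
    sql = (
        f"DELETE FROM `{dataset}.{table}` "
        f"WHERE log_time < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)"
    )
    # --use_legacy_sql=false selects GoogleSQL, which DML statements require.
    return ["bq", "query", "--use_legacy_sql=false", sql]


cmd = build_cleanup_command("app_logs", "checkout_service")
print(shlex.join(cmd))  # shell-quoted form, suitable for a cron entry
```

Every new application table means another scheduled command to add and watch, which is exactly the overhead that partition expiration avoids.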

In conclusion, time-partitioning tables and configuring the partition expiration at 45 days is the recommended approach for storing application logs in BigQuery. This ensures that logs older than 45 days are automatically removed, while retaining the more recent data for analysis.
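The difference between table expiration and partition expiration can be illustrated with a small, simplified model in Python. It treats partitions as whole calendar days and ignores BigQuery's exact timing rules, so it is only an approximation of the real behavior; the dates are made up for the example.

```python
from datetime import date, timedelta

RETENTION_DAYS = 45


def surviving_partitions(partition_dates, today, retention_days=RETENTION_DAYS):
    """Partition expiration (simplified): keep only partitions newer than the cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in partition_dates if d > cutoff]


def table_survives(created, today, retention_days=RETENTION_DAYS):
    """Table expiration (simplified): the whole table disappears at once."""
    return today < created + timedelta(days=retention_days)


today = date(2024, 3, 1)
# Sixty daily partitions, from today back to 59 days ago.
partitions = [today - timedelta(days=n) for n in range(60)]

kept = surviving_partitions(partitions, today)
print(len(kept), "partitions kept, oldest:", min(kept))

# A table created 60 days ago with a 45-day table expiration is gone entirely,
# recent logs included.
print("table still exists:", table_survives(today - timedelta(days=60), today))
```

In this model the partitioned table keeps a rolling window of recent days, whereas the table-level expiration removes everything in one step.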