You have an application that uses Cloud Spanner as a database backend to keep current state information about users.
Cloud Bigtable logs all events triggered by users.
You export Cloud Spanner data to Cloud Storage during daily backups.
One of your analysts asks you to join data from Cloud Spanner and Cloud Bigtable for specific users.
You want to complete this ad hoc request as efficiently as possible.
What should you do?
A. Create a dataflow job that copies data from Cloud Bigtable and Cloud Storage for specific users.
B. Create a dataflow job that copies data from Cloud Bigtable and Cloud Spanner for specific users.
C. Create a Cloud Dataproc cluster that runs a Spark job to extract data from Cloud Bigtable and Cloud Storage for specific users.
D. Create two separate BigQuery external tables on Cloud Storage and Cloud Bigtable. Use the BigQuery console to join these tables through user fields, and apply appropriate filters.

Correct answer: D.
The best option for joining data from Cloud Spanner and Cloud Bigtable for specific users in an ad hoc manner would be to create two separate BigQuery external tables on Cloud Storage and Cloud Bigtable, and then use the BigQuery console to join these tables through user fields and apply appropriate filters.
Here's a more detailed explanation of why this option is the best:
Option A (Create a dataflow job that copies data from Cloud Bigtable and Cloud Storage for specific users) is not the best choice because it involves copying data through a Dataflow pipeline, which may be inefficient and time-consuming. Additionally, writing, deploying, and running a Dataflow job adds significant overhead, which is poorly suited to an ad hoc request that needs to be completed as quickly as possible.
Option B (Create a dataflow job that copies data from Cloud Bigtable and Cloud Spanner for specific users) is not the best choice because Cloud Spanner and Cloud Bigtable have different data models, and it may be difficult to perform an efficient join between them using Dataflow.
Option C (Create a Cloud Dataproc cluster that runs a Spark job to extract data from Cloud Bigtable and Cloud Storage for specific users) may be a viable option, but it requires provisioning a cluster and writing a Spark job, which demands more setup time and resources than the other options. That overhead makes it a heavyweight choice for an ad hoc request that needs to be completed as quickly as possible.
Option D (Create two separate BigQuery external tables on Cloud Storage and Cloud Bigtable. Use the BigQuery console to join these tables through user fields, and apply appropriate filters) is the best option because it allows you to join the data in a way that is optimized for ad hoc queries, without having to copy a large amount of data into a batch-processing system. Additionally, BigQuery is designed to handle large amounts of data quickly, making it well-suited for this type of query.
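The workflow in option D can be sketched in BigQuery SQL. This is a minimal illustration, not a definitive implementation: the dataset, table, bucket, project, instance, and column names below are all hypothetical, and it assumes the daily Cloud Spanner backup was exported to Cloud Storage as Avro files (the format produced by the standard Spanner-to-GCS Dataflow export template).

```sql
-- External table over the daily Cloud Spanner export in Cloud Storage.
-- Assumes an Avro export; the bucket and path are hypothetical.
CREATE EXTERNAL TABLE mydataset.spanner_users
OPTIONS (
  format = 'AVRO',
  uris = ['gs://my-spanner-backups/users/*.avro']
);

-- External table over the Cloud Bigtable event log.
-- The project/instance/table URI and column family are hypothetical.
CREATE EXTERNAL TABLE mydataset.user_events
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/my-project/instances/my-instance/tables/user-events'],
  bigtable_options = """{
    "readRowkeyAsString": true,
    "columnFamilies": [{"familyId": "events", "onlyReadLatest": true}]
  }"""
);

-- Ad hoc join on the user field, filtered to the specific users requested.
SELECT u.user_id, u.status, e.rowkey
FROM mydataset.spanner_users AS u
JOIN mydataset.user_events AS e
  ON e.rowkey = u.user_id
WHERE u.user_id IN ('user-123', 'user-456');
```

Because both tables are external, no data is copied or loaded: BigQuery reads Cloud Storage and Bigtable at query time, which is exactly why this approach suits a one-off analyst request.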