You have a large 5-TB AVRO file stored in a Cloud Storage bucket.
Your analysts are proficient only in SQL and need access to the data stored in this file.
You want to find a cost-effective way to complete their request as soon as possible.
What should you do?
Correct answer: C.
The most cost-effective and efficient solution to provide SQL access to a large AVRO file stored in Cloud Storage is to create external tables in BigQuery that point to Cloud Storage buckets and run SQL queries on these external tables.
Option A: Loading the data into Cloud Datastore and running SQL queries against it is not feasible in this scenario. Cloud Datastore is a NoSQL document database and does not support SQL, so analysts who only know SQL could not query it directly.
Option B: Creating a BigQuery table and loading the data into BigQuery would work, but it is not the most cost-effective solution: BigQuery charges for both storage and query processing, and loading 5 TB takes time before the first query can run. Dropping the table after the request completes reduces storage costs, but it also creates a delay if the data has to be reloaded for subsequent requests.
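For comparison, here is a minimal sketch of that load-and-drop approach using BigQuery's LOAD DATA statement (the dataset, table, and bucket names are hypothetical); the data must be fully loaded, and billed as storage, before analysts can query it:

```sql
-- Load the AVRO file from Cloud Storage into a native BigQuery table
-- (names below are illustrative placeholders)
LOAD DATA INTO analytics.sales_data
FROM FILES (
  format = 'AVRO',
  uris = ['gs://example-bucket/exports/sales.avro']
);

-- ...analysts run their SQL queries against analytics.sales_data...

-- Drop the table afterwards to stop paying for storage
DROP TABLE analytics.sales_data;
```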
Option C: Creating external tables in BigQuery that point to Cloud Storage buckets is the recommended solution. BigQuery's external tables allow SQL queries to be run on data stored in Cloud Storage without loading the data into BigQuery. This option provides the benefits of BigQuery's scalability, performance, and SQL capabilities, while minimizing costs by avoiding data duplication.
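As an illustration, a minimal sketch of the recommended approach using BigQuery's CREATE EXTERNAL TABLE DDL (the dataset, table, bucket, and column names are hypothetical placeholders); because AVRO files carry their own schema, no schema definition is needed, and analysts can query the external table with standard SQL right away:

```sql
-- Define an external table over the AVRO file in Cloud Storage;
-- no data is copied or loaded into BigQuery storage
CREATE OR REPLACE EXTERNAL TABLE analytics.sales_external
OPTIONS (
  format = 'AVRO',
  uris = ['gs://example-bucket/exports/sales.avro']
);

-- Analysts query it like any other BigQuery table
-- (column names here are illustrative)
SELECT region, SUM(amount) AS total_sales
FROM analytics.sales_external
GROUP BY region;
```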
Option D: Creating a Hadoop cluster and copying the AVRO file into HDFS as a compressed file is not an efficient option in this scenario. It adds infrastructure provisioning and maintenance costs and is not as cost-effective as querying the data in place with BigQuery external tables.
In summary, the best option is to create external tables in BigQuery that point to Cloud Storage buckets and run SQL queries on these external tables. This approach provides efficient access to the data while minimizing costs.