You work for an advertising company and want to understand the effectiveness of your company's latest advertising campaign.
You have streamed 500 MB of campaign data into BigQuery.
You want to query the table, and then manipulate the results of that query with a pandas dataframe in an AI Platform notebook.
What should you do?
A. Use AI Platform Notebooks' BigQuery cell magic to query the data, and ingest the results as a pandas dataframe.
B. Export your table as a CSV file from BigQuery to Google Drive, and use the Google Drive API to ingest the file into your notebook instance.
C. Download your table from BigQuery as a local CSV file, and upload it to your AI Platform notebook instance.
D. Export your table as a CSV file to Cloud Storage, and use gsutil to copy the data into your notebook instance.
The correct answer for this question is A: use AI Platform Notebooks' BigQuery cell magic to query the data, and ingest the results as a pandas dataframe.
Reference: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas
Here is a detailed explanation:
BigQuery is a serverless data warehouse that lets users query and analyze large datasets using SQL. It is a fully managed, cloud-native platform that handles massive amounts of data with high scalability and performance.
AI Platform Notebooks is a fully-managed, cloud-based JupyterLab environment that provides users with access to a virtual machine instance with pre-installed machine learning frameworks and libraries, including pandas, TensorFlow, and scikit-learn. It also provides integration with BigQuery, allowing users to easily query and manipulate large datasets.
To query the table in BigQuery and manipulate the results with a pandas dataframe in an AI Platform notebook, we can use the BigQuery cell magic. The cell magic is provided by the google-cloud-bigquery client library and, once loaded, lets users run SQL queries against BigQuery tables directly from a notebook cell and capture the results as a pandas dataframe.
Here are the steps to follow:
Create a new AI Platform notebook instance in your Google Cloud project.
In your notebook, load the BigQuery cell magic by running the following in a cell:
```python
%load_ext google.cloud.bigquery
```
No separate authentication step is required: AI Platform Notebooks instances authenticate with BigQuery automatically through the instance's service account. (The from google.colab import auth; auth.authenticate_user() step that appears in some tutorials applies only to Colab, not to AI Platform Notebooks.)
Run your query with the %%bigquery cell magic, passing a variable name so that the results are captured as a pandas dataframe:
```sql
%%bigquery df --project your-project-id
SELECT *
FROM `your-project-id.advertising.campaign_data`
LIMIT 100
```
This will create a pandas dataframe named "df" that contains the results of your query.
Note that you will need to replace "your-project-id" with your actual Google Cloud project ID, and update the table and dataset names accordingly.
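From there, the results can be manipulated like any other pandas dataframe. As a minimal sketch (the ad_group, clicks, and impressions columns are hypothetical; substitute the actual columns of your campaign table):
```python
# df is the dataframe produced by the %%bigquery cell above.
df.head()

# Hypothetical example: click-through rate per ad group.
ctr = (
    df.groupby("ad_group")[["clicks", "impressions"]]
    .sum()
    .assign(ctr=lambda g: g["clicks"] / g["impressions"])
    .sort_values("ctr", ascending=False)
)
print(ctr.head(10))
```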
Option B is not the best choice: exporting the table as a CSV file from BigQuery to Google Drive and then ingesting the file into the notebook instance through the Google Drive API adds an unnecessary intermediate step and increases the risk of data leakage.
Option C is also not the best choice as downloading the table from BigQuery as a local CSV file and uploading it to the notebook instance can be time-consuming and inefficient, especially for large datasets.
Option D involves exporting the table as a CSV file to Cloud Storage and then copying the data into the notebook using gsutil. While this option may work, it is not as efficient as using the BigQuery cell magic, and it requires additional steps to copy the data into the notebook instance.
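To make the difference concrete, option D would look roughly like this in the notebook (the bucket and file names are placeholders, and the sketch assumes the table has already been exported to Cloud Storage):
```python
# Option D, sketched: copy the exported CSV from Cloud Storage, then parse it.
# gs://your-bucket/campaign_data.csv is a placeholder path.
!gsutil cp gs://your-bucket/campaign_data.csv .

import pandas as pd
df = pd.read_csv("campaign_data.csv")
```
That is an export job, a copy, and a CSV parse where a single %%bigquery cell would do. Finally, the referenced documentation page also covers an equivalent approach without cell magic, using the BigQuery client library directly; a minimal sketch with placeholder project and table IDs:
```python
from google.cloud import bigquery

# On AI Platform Notebooks, the instance's service account credentials
# are picked up automatically; "your-project-id" is a placeholder.
client = bigquery.Client(project="your-project-id")

query = """
    SELECT *
    FROM `your-project-id.advertising.campaign_data`
    LIMIT 100
"""

# to_dataframe() returns the query results as a pandas dataframe
# (and uses the BigQuery Storage API for faster reads when it is installed).
df = client.query(query).to_dataframe()
```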