You work at a sports association whose members range in age from 8 to 30.
The association collects a large amount of health data, such as sustained injuries.
You are storing this data in BigQuery.
Current legislation requires you to delete such information upon request of the subject.
You want to design a solution that can accommodate such a request.
What should you do?
Answer: B
A. Use a unique identifier for each individual. Upon a deletion request, delete all rows from BigQuery with this identifier.
This is the most direct approach: every row carrying the subject's identifier is removed, so all of that individual's data is deleted when requested. It may be broader than necessary, since a request does not always cover all of the individual's data, and deleting the rows also removes any insight that analysis of them could have provided.
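As an illustration of option A, the deletion can be expressed as one parameterized DELETE statement per table that stores the identifier. This is a minimal sketch, not a definitive implementation: the table paths, the `member_id` column, and the `@subject_id` parameter name are assumptions made for the example. The code only builds the statement text; running it would require a BigQuery client with `subject_id` supplied as a query parameter.

```python
import re

# Table paths and column names cannot be passed as query parameters,
# so they are validated before being placed in the statement text.
_TABLE_RE = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")
_COLUMN_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_delete_statements(tables, id_column="member_id"):
    """Return one parameterized DELETE statement per table.

    The subject's identifier is not interpolated; it is left as the
    named parameter @subject_id, to be bound when the query is run.
    """
    if not _COLUMN_RE.match(id_column):
        raise ValueError(f"unexpected column name: {id_column!r}")
    statements = []
    for table in tables:
        if not _TABLE_RE.match(table):
            raise ValueError(f"unexpected table path: {table!r}")
        statements.append(
            f"DELETE FROM `{table}` WHERE {id_column} = @subject_id"
        )
    return statements


# Hypothetical tables that hold data keyed by the member identifier.
statements = build_delete_statements(
    ["my-project.health.injuries", "my-project.health.checkups"]
)
for sql in statements:
    print(sql)
```

Keeping the identifier as a bound parameter rather than formatting it into the string avoids SQL injection through the request payload.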
B. When ingesting new data in BigQuery, run the data through the Data Loss Prevention (DLP) API to identify any personal information. As part of the DLP scan, save the result to Data Catalog. Upon a deletion request, query Data Catalog to find the column with personal information.
The DLP scan identifies personal information at ingestion time and records the findings in Data Catalog. When a deletion request arrives, Data Catalog shows which columns hold personal information, which makes it easier to locate the data that must be deleted. The trade-off is the extra effort and resources needed to set up and maintain the DLP scan and the Data Catalog entries.
C. Create a BigQuery view over the table that contains all data. Upon a deletion request, exclude the rows that affect the subject's data from this view. Use this view instead of the source table for all analysis tasks.
The view lets analysis continue on the remaining data, but it only hides the subject's rows: the data itself stays in the source table, so this approach does not by itself satisfy a requirement to delete it. The view also needs to be set up and maintained, and it may not be practical if the subject's data is spread across multiple rows or columns.
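To make option C concrete, the sketch below builds a view definition that filters out subjects listed in a separate table of deletion requests. All names here (the view, the source table, the `deletion_requests` table, and the `member_id` column) are hypothetical, and keeping the requests in a table is one possible design rather than something the option prescribes. Note that the statement leaves the source table untouched.

```python
def build_exclusion_view(view, source, requests_table, id_column="member_id"):
    """Return a CREATE OR REPLACE VIEW statement that hides rows whose
    identifier appears in the deletion-requests table."""
    return (
        f"CREATE OR REPLACE VIEW `{view}` AS\n"
        f"SELECT s.* FROM `{source}` AS s\n"
        f"WHERE NOT EXISTS (\n"
        f"  SELECT 1 FROM `{requests_table}` AS r\n"
        f"  WHERE r.{id_column} = s.{id_column}\n"
        f")"
    )


# Hypothetical dataset layout, for illustration only.
view_sql = build_exclusion_view(
    view="my-project.health.injuries_visible",
    source="my-project.health.injuries",
    requests_table="my-project.health.deletion_requests",
)
print(view_sql)
```

`NOT EXISTS` is used instead of `NOT IN` so that a NULL identifier in the requests table does not cause every row to be filtered out.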
D. Use a unique identifier for each individual. Upon a deletion request, overwrite the column with the unique identifier with a salted SHA256 of its value.
Replacing the identifier with a salted SHA-256 hash means the rows can no longer be tied directly to the individual, while the rest of the data remains available for analysis. However, this may not comply with the legal requirement to delete the data, because the data still exists in the database, albeit in an anonymized form.
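The overwrite described in option D relies on a salted hash, which can be sketched with Python's standard library. This is an illustrative example under assumptions: the identifier `member-1234`, the 16-byte salt, and the choice to prepend the salt to the value are all made up for the sketch, not taken from the question.

```python
import hashlib
import secrets


def salted_sha256(value: str, salt: bytes) -> str:
    """Return the hex SHA-256 digest of the salt followed by the value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# A random salt; the identifier below is a hypothetical example.
salt = secrets.token_bytes(16)
pseudonym = salted_sha256("member-1234", salt)
print(pseudonym)
```

The digest is deterministic for a given salt, so anyone who keeps the salt can recompute the mapping from identifier to hash. That is one reason the health records themselves, which remain in the table, may still count as stored personal data.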
In summary, each option has trade-offs, and the best approach depends on the specific requirements and constraints of the situation. Option B may be the most comprehensive and flexible approach, because it makes personal information easy to identify and delete while preserving the rest of the data for analysis.