You work as a machine learning specialist for a banking firm working in the credit card processing division.
Your team builds a credit limit authorization model that needs to use a dataset containing personally identifiable information (PII), such as customer credit card information.
How will your team ensure the PII data remains encrypted and the credit card information is not compromised?
Correct Answer: A.
Option A is correct.
Using KMS to encrypt the data is the best choice of the options given.
KMS is a managed service, and the same KMS key can be used to encrypt your data both in S3 and on SageMaker.
Also, using a Glue ETL job, you can remove the credit card information from the dataset by dropping that column from the dataset in your ETL job.
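The column drop described above can be sketched in plain Python. This is only a local stand-in for illustration: in an actual Glue job you would more likely use one of the built-in transforms (such as DropFields) listed in the Glue developer guide referenced below. The function name, column names, and sample row here are hypothetical.

```python
import csv
import io


def drop_columns(csv_text, columns_to_drop):
    """Return the CSV text with the named columns removed from every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep the original column order, minus the sensitive columns.
    kept = [name for name in reader.fieldnames if name not in columns_to_drop]
    out = io.StringIO()
    # extrasaction="ignore" makes the writer skip the dropped keys silently.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data; the column names are assumptions.
sample = (
    "customer_id,credit_card_number,credit_limit\n"
    "1001,4111111111111111,5000\n"
)
cleaned = drop_columns(sample, {"credit_card_number"})
```

After this step the training data no longer carries the card number at all, which is a stronger guarantee than masking it.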
Option B is incorrect.
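A lifecycle configuration is a shell script that runs when your notebook instance is created or each time it starts.
It is meant for customizing the instance, for example by installing packages, so trying to use a lifecycle configuration to encrypt data is not an option that would work.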
Also, the Principal Component Analysis algorithm reduces the dimensionality of a dataset.
One would not use PCA to obfuscate PII data.
Option C is incorrect.
You can't encrypt data with an IAM policy alone.
You need to combine your IAM policies with a service like KMS.
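As a rough sketch of how an IAM policy and KMS can work together, the identity-based policy below denies S3 uploads that do not request SSE-KMS encryption. The policy does not encrypt anything itself; it only enforces that KMS encryption is requested. The bucket name and the helper function are hypothetical, and a real deployment would also need permissions on the KMS key.

```python
import json


def require_sse_kms_statement(bucket_name):
    """Build a Deny statement for uploads that do not request SSE-KMS."""
    return {
        "Sid": "DenyUploadsWithoutSseKms",
        "Effect": "Deny",
        "Action": "s3:PutObject",
        "Resource": f"arn:aws:s3:::{bucket_name}/*",
        "Condition": {
            # The request is denied unless its encryption header is aws:kms.
            "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
        },
    }


# "example-pii-bucket" is a placeholder name, not one from the question.
policy = {
    "Version": "2012-10-17",
    "Statement": [require_sse_kms_statement("example-pii-bucket")],
}
policy_json = json.dumps(policy, indent=2)
```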
Option D is incorrect.
Writing your own encryption algorithm is counterproductive when AWS already provides a managed service, KMS, that will encrypt your data to industry standards.
Also, DeepAR is a time series forecasting algorithm, not a tool for randomizing PII data, so using it here only complicates this option further.
References:
AWS Glue Developer Guide, "Built-In Transforms" (https://docs.aws.amazon.com/glue/latest/dg/built-in-transforms.html)
Amazon SageMaker Developer Guide, "Principal Component Analysis (PCA) Algorithm" (https://docs.aws.amazon.com/sagemaker/latest/dg/pca.html)
Amazon SageMaker Developer Guide, "Customize a Notebook Instance Using a Lifecycle Configuration Script" (https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html)
AWS Identity and Access Management User Guide, "Data protection in AWS Identity and Access Management" (https://docs.aws.amazon.com/IAM/latest/UserGuide/data-protection.html)
For a machine learning specialist in a banking firm's credit card processing division, it is essential that personally identifiable information (PII) remains encrypted and that customer credit card information is not compromised.
Of the given options, the best approach is A: encrypt the data on S3 and SageMaker using KMS, and obfuscate the credit card information in the customer data with a Glue ETL job.
Here's a detailed explanation for this approach:
Encrypt the data on S3 and SageMaker using KMS: Amazon S3 is the AWS object storage service. You can enable server-side encryption for an S3 bucket with AWS KMS (Key Management Service), which lets you create and manage the cryptographic keys used to encrypt and decrypt data. SageMaker can likewise be given a KMS key to encrypt the storage volumes attached to notebook instances and training jobs. With both in place, the PII data remains encrypted at rest.
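A minimal sketch of requesting SSE-KMS on an S3 upload follows. It assumes the caller passes in a boto3 S3 client (for example, one created with boto3.client("s3")); the ServerSideEncryption and SSEKMSKeyId parameters belong to the S3 PutObject API, while the helper names, bucket, object key, and KMS key ID are placeholders.

```python
def sse_kms_put_args(bucket, key, body, kms_key_id):
    """Build PutObject arguments that request server-side encryption with KMS."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",  # ask S3 to encrypt with KMS
        "SSEKMSKeyId": kms_key_id,          # which KMS key to use
    }


def upload_encrypted(s3_client, bucket, key, body, kms_key_id):
    """Upload an object through the supplied client using SSE-KMS.

    s3_client is assumed to be a boto3 S3 client; it is passed in so this
    sketch has no import-time dependency on boto3.
    """
    return s3_client.put_object(**sse_kms_put_args(bucket, key, body, kms_key_id))
```

Setting default encryption on the bucket is an alternative that does not rely on every caller remembering these parameters.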
Obfuscate the credit card information in the customer data with a Glue ETL job: AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to move data between data stores. In this approach, a Glue ETL job obfuscates the credit card information in the customer data. Obfuscation is the process of masking or hiding information so that it is not easily recognizable. Obfuscating the credit card information before the data reaches the model ensures that it is not compromised.
By using this approach, the PII data will remain encrypted and the credit card information will be obfuscated, ensuring the security and privacy of customer data.
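One simple form of the obfuscation described above is masking all but the last few digits of a card number. The sketch below is an illustration in plain Python, not Glue-specific code; the function name and the choice to keep four digits visible are assumptions, and dropping the column entirely is an equally valid transform.

```python
import re


def mask_card_number(value, visible=4):
    """Replace all but the last `visible` digits of a card number with '*'."""
    # Strip spaces, dashes, and any other non-digit separators first.
    digits = re.sub(r"\D", "", value)
    if len(digits) <= visible:
        # Too short to leave anything visible; mask everything.
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + digits[-visible:]


# Well-known test card number, used here only as sample input.
masked = mask_card_number("4111 1111 1111 1111")
```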
Options B, E, and F are not the best approaches to ensuring PII data remains encrypted and credit card information is not compromised because:
B. Encrypt the data using a SageMaker lifecycle configuration once the data is copied to the SageMaker instance in a VP: A lifecycle configuration is a script for customizing a notebook instance, not an encryption mechanism, and this option does not address the PII data that may be stored in other locations such as S3.
E. Encrypt the data using a custom-coded encryption algorithm and store the data on a SageMaker instance in a VP: Custom-coded encryption algorithms can be risky because they may contain vulnerabilities. It is better to use AWS-managed encryption services like KMS.
F. Fabricate new credit card numbers using the SageMaker DeepAR built-in algorithm, replacing the customer credit card numbers: DeepAR is a time series forecasting algorithm, not a tool for generating replacement card numbers, and fabricating new numbers does nothing to encrypt the remaining PII. It is not a valid approach to ensuring the security and privacy of customer data.