AWS EMR Hadoop Cluster: Encrypting Data in Transit | Exam BDS-C00

Encrypting Data in Transit within an AWS EMR Hadoop Cluster

Question

A company currently has a Hadoop Cluster setup using the AWS EMR service.

The cluster hosts several tables on which Python jobs are run to process the data.

Recently, the IT security department has mandated that all data be encrypted in transit within the Hadoop cluster.

Which of the following can be used to fulfil this requirement? Choose 2 answers from the options given below.

Answers

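A. Hadoop MapReduce Encrypted Shuffle

B. VPC endpoints

C. Modify the EC2 IAM roles

D. Secure Hadoop RPC set to "Privacy" using SASL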

Answer - A and D.

The AWS Documentation mentions the following.

Several encryption mechanisms are enabled with in-transit encryption.

These are open-source features, are application-specific, and may vary by Amazon EMR release.

The following application-specific encryption features can be enabled using security configurations:

Hadoop (for more information, see Hadoop in Secure Mode in Apache Hadoop documentation):

Hadoop MapReduce Encrypted Shuffle uses TLS.

Secure Hadoop RPC is set to "Privacy" and uses SASL (activated in Amazon EMR when at-rest encryption is enabled).

Data encryption on HDFS block data transfer uses AES 256 (activated in Amazon EMR when at-rest encryption is enabled in the security configuration).
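
As a rough illustration of how in-transit encryption is switched on, the sketch below builds the JSON body of an EMR security configuration in Python. The S3 location is a placeholder and the helper name `build_in_transit_config` is made up for this example. The resulting string is the kind of value passed as the `SecurityConfiguration` argument of boto3's `create_security_configuration`. Per the documentation quoted above, Secure Hadoop RPC and HDFS block transfer encryption also depend on at-rest encryption being enabled, which needs an additional at-rest section that is not shown here.

```python
import json


def build_in_transit_config(cert_zip_s3_uri: str) -> str:
    """Return security-configuration JSON that enables in-transit encryption.

    cert_zip_s3_uri is a placeholder for a zip of PEM certificates stored in S3.
    """
    config = {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            # At-rest encryption is left off to keep the sketch short.
            "EnableAtRestEncryption": False,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": cert_zip_s3_uri,
                }
            },
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Hypothetical bucket and key, for illustration only.
    print(build_in_transit_config("s3://example-bucket/certs/my-certs.zip"))
```

The security configuration is created once and then referenced by name when the cluster is launched, so existing clusters would need to be recreated to pick it up.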

Option B is incorrect because VPC endpoints are normally used when a resource in a private subnet needs to reach a public AWS service without the traffic moving over the Internet. They do not encrypt traffic between nodes inside the cluster.

Option C is incorrect because IAM roles are used to grant access to other AWS services. They control permissions and do not encrypt data.

For more information on data encryption options in EMR, see the following URL.

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-data-encryption-options.html

Two of the options fulfill the requirement of encrypting data in transit within the Hadoop cluster: Hadoop MapReduce Encrypted Shuffle, and Secure Hadoop RPC set to "Privacy" using SASL. Each option is discussed in detail below.

Option A: Hadoop MapReduce Encrypted Shuffle

Hadoop MapReduce Encrypted Shuffle encrypts the shuffle phase, in which map output is transferred from the nodes that ran the mappers to the reducers. On Amazon EMR this traffic is protected with TLS. The feature covers only the shuffle traffic between MapReduce components. It does not encrypt data at rest or data flowing into or out of the Hadoop cluster.
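
To make this concrete: in open-source Hadoop, encrypted shuffle is controlled by the `mapreduce.shuffle.ssl.enabled` property in mapred-site.xml. The sketch below assumes you have the contents of that file as a string and checks whether the property is on. The function names and the sample XML are illustrative only; on EMR the security configuration manages this setting for you.

```python
import xml.etree.ElementTree as ET

# Sample file contents, made up for this example.
SAMPLE_MAPRED_SITE = """<configuration>
  <property>
    <name>mapreduce.shuffle.ssl.enabled</name>
    <value>true</value>
  </property>
</configuration>"""


def hadoop_property(xml_text: str, name: str, default: str = "") -> str:
    """Look up one property in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", default).strip()
    return default


def shuffle_is_encrypted(mapred_site_xml: str) -> bool:
    """True if the encrypted-shuffle property is set to true."""
    value = hadoop_property(
        mapred_site_xml, "mapreduce.shuffle.ssl.enabled", "false"
    )
    return value.lower() == "true"


if __name__ == "__main__":
    print(shuffle_is_encrypted(SAMPLE_MAPRED_SITE))
```

A missing property is treated as disabled, on the assumption that encrypted shuffle is off unless explicitly enabled.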

Option D: Secure Hadoop RPC set to "Privacy" using SASL

Secure Hadoop RPC protects the communication between Hadoop components using SASL (Simple Authentication and Security Layer). Setting the RPC protection level to "Privacy" encrypts that communication, in addition to authenticating it and checking its integrity, so data stays protected in transit within the Hadoop cluster.
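
For context, the underlying Hadoop setting is `hadoop.rpc.protection` in core-site.xml, which accepts `authentication`, `integrity`, or `privacy`, or a comma-separated list of these. Only `privacy` adds encryption. The following is a minimal sketch of that mapping; the helper name `rpc_traffic_is_encrypted` is hypothetical, and treating a mixed list as not guaranteed to be encrypted is a conservative assumption of this example.

```python
# What each hadoop.rpc.protection level provides.
RPC_PROTECTION_LEVELS = {
    "authentication": {"authenticated": True, "integrity": False, "encrypted": False},
    "integrity": {"authenticated": True, "integrity": True, "encrypted": False},
    "privacy": {"authenticated": True, "integrity": True, "encrypted": True},
}


def rpc_traffic_is_encrypted(hadoop_rpc_protection: str) -> bool:
    """True only if every configured protection level is 'privacy'.

    With a mixed list, client and server may negotiate a weaker level,
    so this sketch does not count it as guaranteed encryption.
    """
    levels = [v.strip().lower() for v in hadoop_rpc_protection.split(",") if v.strip()]
    unknown = [v for v in levels if v not in RPC_PROTECTION_LEVELS]
    if unknown:
        raise ValueError(f"unknown protection level(s): {unknown}")
    return bool(levels) and all(RPC_PROTECTION_LEVELS[v]["encrypted"] for v in levels)


if __name__ == "__main__":
    for setting in ("authentication", "integrity", "privacy"):
        print(setting, rpc_traffic_is_encrypted(setting))
```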

Option B: VPC Endpoints

VPC (Virtual Private Cloud) endpoints enable private communication between an Amazon VPC and another AWS service without using an Internet Gateway, NAT device, VPN connection, or AWS Direct Connect. While VPC endpoints can help secure the communication between the Hadoop cluster and other AWS services, they do not encrypt data in transit within the Hadoop cluster.

Option C: Modify the EC2 IAM Roles

Modifying the EC2 IAM roles does not help with encrypting data in transit within the Hadoop cluster. IAM roles are used to manage access to AWS resources and services, and do not provide encryption.

Therefore, the correct options for encrypting data in transit within the Hadoop cluster are Option A (Hadoop MapReduce Encrypted Shuffle) and Option D (Secure Hadoop RPC set to "Privacy" using SASL).