A company wants to start using AWS Redshift to replace its existing on-premises data warehousing solution.
They need to transfer around 10 PB of data from their on-premises environment to AWS.
Which of the following would be the ideal solution to transfer the data onto AWS?
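A. Use the AWS Redshift COPY command
B. Use the AWS Snowball device
C. Use AWS VPN connections
D. Use AWS Kinesis

Answer - B.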
The AWS Documentation mentions the following.
AWS provides multiple ways to move data from your datacenter to AWS.
To establish a dedicated network connection between your network and AWS, you can use AWS Direct Connect.
To move petabytes to exabytes of data to AWS using physical appliances, you can use AWS Snowball and AWS Snowmobile.
To have your on-premises applications store data directly into AWS, you can use AWS Storage Gateway.
Option A is incorrect since the COPY command should only be used once the data is already in AWS.
Option C is incorrect since a VPN connection would not provide enough bandwidth to transfer 10 PB in a reasonable amount of time.
Option D is incorrect since Kinesis is designed for streaming data, not for a one-time bulk migration.
For more information on data lakes and analytics, please refer to the below URL.
https://aws.amazon.com/big-data/datalakes-and-analytics/

Given the requirement to transfer a large amount of data (around 10 PB) from an on-premises environment to AWS Redshift, the ideal solution would be to use the AWS Snowball device (option B).
Here's why:
Option A - Use the AWS Redshift COPY command: The COPY command loads data into Redshift from sources such as Amazon S3, Amazon DynamoDB, Amazon EMR, or remote hosts over SSH. It is a loading tool, not a way to move 10 PB out of the on-premises environment: the data first has to be made available to it, typically in Amazon S3. Therefore, on its own this option does not solve the problem of transferring the data from on-premises to Redshift.
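To make the role of COPY concrete, here is a minimal Python sketch that only builds the SQL text of a COPY statement for CSV files that have already landed in Amazon S3 (for example, after a Snowball import). The table name, bucket, and IAM role ARN are hypothetical placeholders, and the sketch does not connect to a cluster; running the statement would require a SQL client connected to Redshift.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return the text of a Redshift COPY statement that loads CSV files from an S3 prefix.

    This only builds a string; it does not connect to Redshift or validate
    that the table, bucket, or role actually exist.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV;"
    )


if __name__ == "__main__":
    # Hypothetical table, bucket, and role names, for illustration only.
    sql = build_copy_statement(
        table="sales",
        s3_prefix="s3://example-migration-bucket/sales/",
        iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    )
    print(sql)
```

The point of the sketch is the dependency it exposes: the FROM clause needs a location that AWS can already read, which is exactly what the on-premises data lacks until it has been transferred.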
Option C - Use AWS VPN connections: AWS VPN connections can be used to securely connect the on-premises environment to the AWS environment. However, a VPN runs over the public internet, so its throughput is limited by the available internet bandwidth and can fluctuate. At 1 Gbps, for example, 10 PB would take roughly 926 days to transfer even at full utilization. Additionally, the connection would have to be kept running and monitored for the entire duration of such a transfer, which adds operational effort and risk.
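The bandwidth argument is easy to check with a little arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 Gbps = 10^9 bits per second) and a hypothetical sustained utilization of the link; the link speeds are illustrative assumptions, not the limits of any particular AWS VPN configuration.

```python
def transfer_days(data_bytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Days needed to move data_bytes over a link of link_gbps gigabits per second.

    utilization is the assumed fraction of the nominal bandwidth actually achieved.
    Decimal units: 1 Gbps = 10**9 bits/s.
    """
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_gbps must be positive and utilization in (0, 1]")
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400  # seconds per day


PB = 10**15  # decimal petabyte, in bytes

if __name__ == "__main__":
    # Illustrative link speeds and an assumed 80% sustained utilization.
    for gbps in (1, 10):
        days = transfer_days(10 * PB, gbps, 0.8)
        print(f"10 PB over {gbps} Gbps at 80% utilization: {days:,.0f} days")
```

At full utilization of a 1 Gbps link, 10 PB is 8 × 10^16 bits, which takes 8 × 10^7 seconds, or roughly 926 days.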
Option D - Use AWS Kinesis: AWS Kinesis is a managed service for collecting, processing, and analyzing streaming data in real time. However, in this scenario there is no streaming data; the requirement is a one-time transfer of a large existing dataset from on-premises to Redshift. Therefore, this option is not a viable solution for transferring the data.
Option B - Use the AWS Snowball device: AWS Snowball is a petabyte-scale data transport solution for moving large amounts of data between on-premises environments and AWS. AWS ships the secure, tamper-resistant device to the customer's location, where the data is copied onto it. The device is then shipped back to AWS, which imports the data into Amazon S3; from there it can be loaded into Redshift with the COPY command. A volume like 10 PB requires multiple devices, but shipping data physically is far faster than pushing it through a VPN connection, and the data stays encrypted on the device throughout transit. Setup is minimal: the device is connected to the local network and the data is copied onto it.
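Because each device has a finite capacity, a 10 PB migration is planned as a fleet of devices. A minimal sketch of that estimate follows; the 80 TB usable capacity per device is an assumption for illustration only, so check the current Snowball specifications before planning a real migration.

```python
import math


def devices_needed(total_tb: float, usable_tb_per_device: float, headroom: float = 0.0) -> int:
    """Number of devices needed to hold total_tb terabytes.

    headroom reserves a fraction of each device's usable capacity (e.g. 0.1 = 10%)
    so that no device is planned to be filled completely.
    """
    if usable_tb_per_device <= 0 or not 0 <= headroom < 1:
        raise ValueError("capacity must be positive and headroom in [0, 1)")
    effective_tb = usable_tb_per_device * (1 - headroom)
    return math.ceil(total_tb / effective_tb)


# ASSUMPTION: hypothetical usable capacity per device, not an official figure.
ASSUMED_USABLE_TB = 80

if __name__ == "__main__":
    total_tb = 10_000  # 10 PB expressed in decimal terabytes
    print(devices_needed(total_tb, ASSUMED_USABLE_TB))
```

With the assumed 80 TB per device and no headroom, 10,000 TB works out to 125 devices; reserving headroom or using a different device model changes the count.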
In summary, the ideal solution for transferring 10 PB of data from an on-premises environment to AWS Redshift is to use the AWS Snowball device (option B).