AWS Data Warehouse Migration: Performance and Cost Optimization | Parson Fortunes Ltd

Question

Parson Fortunes Ltd is an Asia-based department store operator with an extensive network of 131 stores, spanning approximately 4.1 million m² of retail space across cities in India, China, Vietnam, Indonesia, and Myanmar. Parson holds a large volume of data, around 20 TB of structured data and 45 TB of unstructured data, and is planning to host its data warehouse on AWS and its unstructured data on S3.

Parson's IT team is well aware of the scalability and performance capabilities of AWS services.

Parson currently runs its DWH on-premises on Teradata and is concerned about the overall cost of running the DWH on AWS.

Initially, they want to migrate the platform to AWS to handle their existing performance-intensive workloads.

At first, they want to keep only a fraction (15%) of the structured data in data warehouse storage, then gradually expand to real-time data integration and data-driven analytics over the next 6 months as part of their roadmap. Around 100 users will initially access the application.

What is your suggestion?

Explanations

Answer: C.

Option A is incorrect. DS2 node types are optimized for large data workloads and use hard disk drive (HDD) storage.

This does not match the requirement, which is to handle performance-intensive workloads.

https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html#rs-about-clusters-and-

Option B is incorrect. DS2 node types are optimized for large data workloads and use hard disk drive (HDD) storage.

This does not match the requirement, which is to handle performance-intensive workloads.

https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html#rs-about-clusters-and-

Option C is correct. DC2 node types are optimized for performance-intensive workloads.

DC2.large fulfills the requirements since it provides massively parallel processing across multiple nodes.

Given the amount of data to be loaded initially (15% of 20 TB, or about 3 TB), this is the right option.

https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html#rs-about-clusters-and-
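The data-volume reasoning can be checked with a quick sizing sketch. It assumes the per-node SSD capacities and node-count limits from the Amazon Redshift node type table (160 GB and 1 to 32 nodes for dc2.large; 2.56 TB and 2 to 128 nodes for dc2.8xlarge) and deliberately ignores compression and free-space headroom, both of which change the real node count. The helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    storage_gb: int  # per-node SSD capacity in decimal GB (assumed from the node type table)
    min_nodes: int
    max_nodes: int


DC2_LARGE = NodeType("dc2.large", 160, 1, 32)
DC2_8XLARGE = NodeType("dc2.8xlarge", 2560, 2, 128)


def warehouse_gb(structured_gb: int, percent: int) -> int:
    """Portion of the structured data kept in the warehouse initially."""
    return structured_gb * percent // 100


def nodes_needed(node: NodeType, data_gb: int) -> int:
    """Smallest allowed node count whose raw storage holds data_gb.

    Ignores compression (which shrinks the footprint) and headroom for
    sorting and vacuuming (which grows it).
    """
    needed = -(-data_gb // node.storage_gb)  # ceiling division on integers
    count = max(node.min_nodes, needed)
    if count > node.max_nodes:
        raise ValueError(f"{data_gb} GB does not fit in a {node.name} cluster")
    return count


data_gb = warehouse_gb(20_000, 15)  # 15% of ~20 TB of structured data
for node in (DC2_LARGE, DC2_8XLARGE):
    n = nodes_needed(node, data_gb)
    print(f"{node.name}: {n} node(s), {n * node.storage_gb} GB raw")
# dc2.large: 19 node(s), 3040 GB raw
# dc2.8xlarge: 2 node(s), 5120 GB raw
```

Under these assumptions, 19 dc2.large nodes just hold the uncompressed data, while two dc2.8xlarge nodes (the minimum for that type) leave about 2 TB unused; column compression would lower the dc2.large count.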

Option D is incorrect. DC2 node types are optimized for performance-intensive workloads.

DC2.8xlarge can also provide massively parallel processing across multiple nodes, but it is larger than the initial data volume requires.

Since cost is a concern as well as performance, this is not the right option.

https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html#rs-about-clusters-and-

Parson Fortunes Ltd has a large amount of structured and unstructured data and plans to host its data warehouse on AWS, initially keeping 15% of the structured data (about 3 TB of the 20 TB) in data warehouse storage. It wants to address its performance-intensive workloads and expects around 100 users to access the application.

In this scenario, the best option is to launch a Redshift cluster with DC2.8xlarge nodes. Here's why:

  • Redshift is a fully managed, petabyte-scale data warehouse service on AWS, designed for large-scale data warehousing and analytics workloads. It delivers fast query performance through columnar storage and a massively parallel processing (MPP) architecture.
  • Redshift offers multiple node types to match workload requirements. Node types differ in vCPU count, memory, and storage capacity.
  • DC2 node types are the current generation of dense-compute Redshift nodes, with high-performance SSD-based storage and enhanced networking. They are optimized for performance-intensive analytics workloads and are designed to offer strong price-performance for them.
  • DC2.8xlarge is the larger node type in the DC2 family, providing 32 vCPUs, 244 GiB of memory, and 2.56 TB of SSD storage per node. A DC2.8xlarge cluster needs at least two nodes (about 5.12 TB in total), which can hold the roughly 3 TB Parson plans to keep in the warehouse initially (15% of 20 TB) and leaves room to expand toward real-time data integration and data-driven analytics.
  • A DC2.8xlarge cluster can scale out to as many as 128 nodes and handles complex joins and aggregation queries efficiently. Its parallel processing across nodes can also help improve the overall performance of the workload.
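The massively parallel processing idea in the list above can be illustrated with a toy model: rows are spread round-robin across "slices", each slice computes a partial GROUP BY aggregate on its own rows, and a leader step merges the partial results. This is a simplified sketch of the concept under stated assumptions, not Redshift's implementation; the (store, amount) schema and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

Row = tuple[str, float]  # (store_id, sale_amount): hypothetical schema


def distribute_round_robin(rows: list[Row], num_slices: int) -> list[list[Row]]:
    """Spread rows evenly across slices, similar in spirit to EVEN distribution."""
    slices: list[list[Row]] = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices


def local_group_sum(slice_rows: list[Row]) -> dict[str, float]:
    """Per-slice work: SUM(amount) GROUP BY store_id over local rows only."""
    partial: defaultdict[str, float] = defaultdict(float)
    for store, amount in slice_rows:
        partial[store] += amount
    return dict(partial)


def parallel_group_sum(rows: list[Row], num_slices: int) -> dict[str, float]:
    slices = distribute_round_robin(rows, num_slices)
    # Slices compute their partial aggregates concurrently.
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = list(pool.map(local_group_sum, slices))
    # Leader step: merge the partial aggregates into the final result.
    result: defaultdict[str, float] = defaultdict(float)
    for partial in partials:
        for store, amount in partial.items():
            result[store] += amount
    return dict(result)


sales = [("mumbai", 10.0), ("hanoi", 5.0), ("mumbai", 2.5), ("jakarta", 7.0), ("hanoi", 1.0)]
print(parallel_group_sum(sales, num_slices=2))
# {'mumbai': 12.5, 'hanoi': 6.0, 'jakarta': 7.0}
```

In Redshift, the table's distribution style (AUTO, EVEN, KEY, or ALL) decides which slice each row lands on; this sketch models only the even case.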

Based on the above considerations, the best option for Parson Fortunes Ltd would be to launch a Redshift cluster with DC2.8xlarge nodes. This would give them a scalable and cost-effective solution for hosting their data warehouse and handling their performance-intensive workloads.
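Launching such a cluster can be scripted with the AWS SDK for Python (boto3), whose Redshift client provides `create_cluster`. The sketch below only assembles and validates the request. The identifier, user name, database name, and the node-range table are illustrative assumptions, and boto3 is imported lazily so the parameter builder runs without the SDK or AWS credentials.

```python
# Node-count ranges as listed in the Redshift node type table (assumed current).
NODE_RANGES = {"dc2.large": (1, 32), "dc2.8xlarge": (2, 128)}


def cluster_params(identifier: str, node_type: str, num_nodes: int,
                   admin_user: str, admin_password: str) -> dict:
    """Build keyword arguments for the Redshift CreateCluster call."""
    low, high = NODE_RANGES[node_type]
    if not low <= num_nodes <= high:
        raise ValueError(f"{node_type} supports {low} to {high} nodes, got {num_nodes}")
    params = {
        "ClusterIdentifier": identifier,
        "NodeType": node_type,
        "MasterUsername": admin_user,
        "MasterUserPassword": admin_password,
        "DBName": "dwh",  # hypothetical database name
        "Encrypted": True,
    }
    if num_nodes == 1:
        params["ClusterType"] = "single-node"  # NumberOfNodes is only required for multi-node
    else:
        params["ClusterType"] = "multi-node"
        params["NumberOfNodes"] = num_nodes
    return params


def launch(params: dict) -> dict:
    """Send the request; needs boto3 installed and AWS credentials configured."""
    import boto3  # lazy import keeps cluster_params usable without the SDK

    client = boto3.client("redshift")
    return client.create_cluster(**params)
```

For example, `launch(cluster_params("parson-dwh", "dc2.8xlarge", 2, "dwhadmin", password))` would request a two-node DC2.8xlarge cluster, with `password` read from a secret store rather than hard-coded.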