EBS Storage Types for High-Performance Big Data Workloads on AWS

Choose the Right EBS Storage Type for Your Big Data Project

Prev Question Next Question

Question

Your organization is planning to build a BigData project on AWS.

They need high data transfer rates for huge workloads to stream through with better performance.They are also looking for a cost-effective solution.

Which EBS storage type would you choose in this scenario?

Answers

Explanations

Click on the arrows to vote for the correct answer

A. B. C. D.

Answer: C.

Amazon EBS provides the following volume types, which differ in performance characteristics and price, so that you can tailor your storage performance and cost to the needs of your applications.

The volumes types fall into two categories:

SSD-backed volumes optimized for transactional workloads involving frequent read/write operations with small I/O size, where the dominant performance attribute is IOPS.

HDD-backed volumes optimized for large streaming workloads where throughput (measured in MiB/s) is a better performance measure than IOPS.

https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/EBSVolumeTypes.html
Solid-State Drives (SSD)

Volume Type General Purpose SSD (gp2)*

Description General purpose SSD volume

that balances price and
performance for a wide variety
of workloads

Use Cases * Recommended for most
workloads
* System boot volumes
* Virtual desktops
* Low-latency interactive apps
© Development and test
environments

API Name gp2
Volume Size 1 GiB - 16 TiB
Max. 1OPS**/Volume 10,000

Max. 160 MiB/s****
Throughput/Volume

Max. IOPS/Instance 80,000

Max. 1,750 MiB/s
Throughput/Instancett

Dominant Performance |OPS
Attribute

Provisioned IOPS SSD (io1)

Highest-performance SSD volume for mission-
critical low-latency or high-throughput

workloads

* Critical business applications that require
sustained IOPS performance, or more than
10,000 1OPS or 160 MiB/s of throughput per

volume

* Large database workloads, such as:

© MongoDB

© Cassandra
Microsoft SQL Server
Mysql

PostgreSQL

Oracle

iol

4GiB - 16 TiB
32,000*#*
500 MiB/st

80,000
1,750 MiB/s

1OPS

Hard disk Drives (HDD)

Throughput Optimized HDD
(st1)

Low cost HDD volume
designed for frequently

accessed, throughput

intensive workloads

* Streaming workloads
requiring consistent, fast
throughput at a low price

Big data
* Data warehouses

© Log processing
© Cannot be a boot volume

stl
500 GiB - 16 TiB
500

500 MiB/s

80,000
1,750 MiB/s

MiB/s

Cold HDD (sc1)

Lowest cost HDD volume
designed for less frequently
accessed workloads

© Throughput-oriented
storage for large volumes
of data that is
infrequently accessed

* Scenarios where the
lowest storage cost is
important

* Cannot be a boot volume

scl
500 GiB - 16 TiB
250

250 MiB/s

80,000
1,750 MiB/s

MiB/s

For a BigData project on AWS that requires high data transfer rates for huge workloads with better performance, and cost-effectiveness, the recommended EBS storage type would be the "Throughput Optimized HDD" (C).

Throughput Optimized HDDs are designed to provide low-cost storage with high throughput and are optimized for frequently accessed, large streaming workloads. This storage type is ideal for BigData workloads such as log processing, data warehousing, and big data analytics that require high-performance and low-cost storage.

The "General Purpose SSD" (A) is best suited for small to medium-sized workloads with moderate I/O requirements. Provisioned IOPS SSD (B) is designed for workloads that require consistent and high IOPS performance, such as large databases, NoSQL databases, and transactional workloads. These two options may not be cost-effective for the BigData project since they are more expensive than the Throughput Optimized HDD.

"Cold HDD" (D) is designed for infrequent access and long-term storage, making it a poor choice for BigData workloads that require frequent access and high-performance storage.

Therefore, the Throughput Optimized HDD is the most suitable and cost-effective option for a BigData project with high data transfer rates, huge workloads, and better performance.