An Organization has an application using Amazon S3 Glacier to store large CSV objects.
While retrieving these large objects end users are observing some performance issue.
In most cases, users only need a small part of the data instead of the entire objects.
A solutions Architect has been asked to re-design this solution to improve the performance.
Which solution is the most cost-effective?
Click on the arrows to vote for the correct answer
A. B. C. D.Correct Answer : C.
Option A is incorrect because Athena supports standard SQL to retrieve data, but it's more used for analytical data.
Option B is incorrect because the objects are stored in Glacier.
The correct method should be Glacier Select.
Option C is CORRECT because the application needs to retrieve data from Glacier.
With Glacier Select, you can perform filtering directly against a Glacier object using standard SQL statements.
Option D is incorrect because writing custom SQL statements with S3 APIS wouldn't improve performance.
Reference:
https://aws.amazon.com/blogs/aws/s3-glacier-select/The most cost-effective solution for improving the performance of retrieving large objects from Amazon S3 Glacier, when users only need a small part of the data, is to use S3 Select.
S3 Select is an Amazon S3 feature that enables you to retrieve a subset of data from an object using simple SQL expressions. It allows you to filter and transform data within an object without having to retrieve the entire object, which can greatly improve performance.
Using AWS Athena, a serverless query service that makes it easy to analyze data in Amazon S3 using standard SQL, is another option. However, it may not be as cost-effective as S3 Select since it requires a dedicated Athena instance and incurs charges based on the amount of data scanned.
Glacier Select is also an option for retrieving only the necessary data from Glacier archives, but it incurs additional charges for data retrieval and is more suited for when you need to retrieve specific data from large archives.
Using custom SQL statements and S3 APIs to retrieve only the data that users need is another option, but it requires additional development effort and may not be as efficient as using S3 Select.
Therefore, the most cost-effective and efficient solution for improving the performance of retrieving large objects from Amazon S3 Glacier, when users only need a small part of the data, is to use S3 Select.