AWS Redshift Spectrum: Accessing Data from S3 for Analytics

How to Access Data with Redshift Spectrum

Question

Parson Fortunes Ltd is an Asia-based department store operator with an extensive network of 131 stores, spanning approximately 4.1 million square meters of retail space across cities in India, China, Vietnam, Indonesia and Myanmar. Parson built a VPC to host their entire enterprise infrastructure in the cloud.

Parson has large data assets, around 20 TB of structured data and 45 TB of unstructured data, and is planning to host their data warehouse on AWS and their unstructured data on S3.

The files sent from their on-premises data center are also stored in S3 buckets.

Parson's IT team is well aware of the scalability and performance capabilities of AWS services.

Parson hosts their web applications, databases and the data warehouse built on Redshift in the VPC. The structured, semi-structured and unstructured data is stored in S3 in various buckets.

This data can be joined and queried along with data in Redshift using Redshift Spectrum.

Parson Fortunes also uses other AWS services such as Athena and EMR.

How can this data be accessed through Redshift Spectrum? Select 3 options.

Answers

Explanations


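A. Redshift Spectrum accesses external databases in Athena and EMR using external schema and external tables.
B. Amazon Redshift external schema references an external database in an external data catalog.
C. Amazon Redshift external schema references an external database in an internal data catalog.
D. For external schemas, Amazon Redshift needs authorization to access the data catalog in Athena and the data files in Amazon S3 using IAM roles and policies.
E. For external schemas, Amazon Redshift needs authorization to access the data catalog in Athena, but not the data files in Amazon S3 using IAM roles and policies.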

Answer: A, B, D.

Option A is correct - Redshift Spectrum accesses external databases in Athena and EMR using external schemas and external tables.

https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-schemas.html

Option B is correct - An Amazon Redshift external schema references an external database in an external data catalog.

You can create the external database in Amazon Redshift, in Amazon Athena, or in an Apache Hive metastore, such as Amazon EMR.

https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-schemas.html

Option C is incorrect - An Amazon Redshift external schema references an external database in an external data catalog, not an internal one.

You can create the external database in Amazon Redshift, in Amazon Athena, or in an Apache Hive metastore, such as Amazon EMR.

https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-schemas.html

Option D is correct - Amazon Redshift needs authorization to access the data catalog in Athena and the data files in Amazon S3 on your behalf.

To provide that authorization, you first create an AWS Identity and Access Management (IAM) role.

Then you attach the role to your cluster and provide the Amazon Resource Name (ARN) for the role in the Amazon Redshift CREATE EXTERNAL SCHEMA statement.

https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-schemas.html

Option E is incorrect - Amazon Redshift needs authorization to access both the data catalog in Athena and the data files in Amazon S3 on your behalf, so access to S3 cannot be left out.

To provide that authorization, you first create an AWS Identity and Access Management (IAM) role.

Then you attach the role to your cluster and provide the Amazon Resource Name (ARN) for the role in the Amazon Redshift CREATE EXTERNAL SCHEMA statement.

https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-schemas.html
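The explanations above all come back to the CREATE EXTERNAL SCHEMA statement, so a small sketch may help make it concrete. The Python helpers below only assemble the SQL text for the two catalog types mentioned (the Athena data catalog and a Hive metastore such as EMR). The function names, the schema and database names, the IP address, and the role ARN are all hypothetical placeholders, and the inputs are assumed to be trusted identifiers, since nothing is escaped.

```python
def external_schema_from_data_catalog(schema, database, iam_role_arn, create_db=False):
    """Build a CREATE EXTERNAL SCHEMA statement backed by the Athena data catalog.

    All arguments are assumed to be trusted identifiers; this sketch does no escaping.
    """
    lines = [
        f"CREATE EXTERNAL SCHEMA {schema}",
        "FROM DATA CATALOG",
        f"DATABASE '{database}'",
        f"IAM_ROLE '{iam_role_arn}'",
    ]
    if create_db:
        # Optionally create the external database from Redshift if it is missing.
        lines.append("CREATE EXTERNAL DATABASE IF NOT EXISTS")
    return "\n".join(lines) + ";"


def external_schema_from_hive_metastore(schema, database, uri, iam_role_arn, port=9083):
    """Build a CREATE EXTERNAL SCHEMA statement backed by a Hive metastore (for example on EMR)."""
    lines = [
        f"CREATE EXTERNAL SCHEMA {schema}",
        "FROM HIVE METASTORE",
        f"DATABASE '{database}'",
        f"URI '{uri}' PORT {port}",
        f"IAM_ROLE '{iam_role_arn}'",
    ]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    # Placeholder ARN: replace with the role attached to your own cluster.
    role = "arn:aws:iam::123456789012:role/ExampleSpectrumRole"
    print(external_schema_from_data_catalog("spectrum", "spectrumdb", role, create_db=True))
    print(external_schema_from_hive_metastore("hive_schema", "hive_db", "172.10.10.10", role))
```

In both variants the role ARN is part of the statement itself, which is why Option D matters: without a role that the cluster can assume, the schema cannot reach the catalog or the files in S3.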

Parson Fortunes Ltd is an Asia-based department store operator that has built its entire enterprise infrastructure on the cloud using Amazon Web Services (AWS). The company has a large amount of data, including around 20 TB of structured data and 45 TB of unstructured data, and is planning to host its data warehouse on AWS and store its unstructured data on S3. The files sent from its on-premises data center are also stored in S3 buckets.

The structured, semi-structured, and unstructured data formats are stored in S3 in various buckets, and this data needs to be joined and queried along with data in Redshift using Redshift Spectrum. Additionally, Parson Fortunes uses other AWS services like Athena and EMR.
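To illustrate what "joined and queried along with data in Redshift" could look like, here is a minimal sketch that builds an external table definition and a join against a local table. It assumes an external schema named `spectrum` already exists and that a local table `public.stores` is loaded in the cluster; the table names, column names, bucket path, and helper names are all hypothetical, and Parquet is just one of the file formats Spectrum can read.

```python
def create_external_table_sql(schema, table, columns, s3_location, file_format="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement for files that already sit in S3.

    `columns` is a list of (name, sql_type) pairs. Inputs are assumed trusted.
    """
    column_list = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"{column_list}\n"
        f")\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{s3_location}';"
    )


def join_external_with_local_sql(external_table, local_table):
    """Build a query that joins an external (S3-backed) table with a local Redshift table."""
    return (
        "SELECT s.city, SUM(e.amount) AS total_amount\n"
        f"FROM {external_table} AS e\n"
        f"JOIN {local_table} AS s ON s.store_id = e.store_id\n"
        "GROUP BY s.city;"
    )


if __name__ == "__main__":
    print(create_external_table_sql(
        "spectrum",
        "sales_events",
        [("store_id", "INTEGER"), ("amount", "DECIMAL(12,2)")],
        "s3://example-bucket/sales/",  # placeholder bucket and prefix
    ))
    print(join_external_with_local_sql("spectrum.sales_events", "public.stores"))
```

The external table is queried with the same SELECT syntax as a local table; only the schema prefix tells Redshift that the rows come from S3.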

Redshift Spectrum is a feature of Amazon Redshift that lets users run SQL queries against structured and semi-structured data files stored in S3 and join that data with tables stored in the Redshift cluster. The five options are assessed below; A, B, and D are correct:

A. Redshift Spectrum accesses external databases in Athena and EMR using external schema and external tables. This option is correct. An external schema is defined in Amazon Redshift and references an external database in an external data catalog, such as the Athena data catalog or an Apache Hive metastore running on EMR. External tables belong to an external schema; their definitions are kept in the external catalog, and the data they describe stays in files on S3.

B. Amazon Redshift external schema references an external database in an external data catalog. An external data catalog is a centralized metadata store that provides a unified view of data assets across an organization. With an external data catalog, users can easily discover, understand, and manage their data assets. An external schema in Redshift references an external database in an external data catalog, allowing users to query data stored in the external database using Redshift Spectrum.

C. Amazon Redshift external schema references an external database in an internal data catalog. This option is incorrect. An external schema references an external database in an external data catalog, such as the Athena data catalog or an Apache Hive metastore on EMR. The external database can be created from Amazon Redshift, but it is still registered in the external catalog rather than in an internal one.

D. For external schemas, Amazon Redshift needs authorization to access the data catalog in Athena and the data files in Amazon S3 using IAM roles and policies. IAM (Identity and Access Management) is an AWS service that enables users to control access to AWS resources. When using Redshift Spectrum to access external data sources like Athena and S3, IAM roles and policies need to be set up to allow Redshift Spectrum to access the data. IAM roles and policies define who has access to the data and what they can do with it.

E. For external schemas, Amazon Redshift needs authorization to access the data catalog in Athena, but not the data files in Amazon S3 using IAM roles and policies. This option is incorrect because Redshift Spectrum needs authorization to access both the data catalog in Athena and the data files in Amazon S3 using IAM roles and policies.
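The IAM setup behind Options D and E can be sketched as two policy documents: a trust policy that lets the Redshift service assume the role, and a permissions policy that grants read access to both the data files and the catalog. This is a rough sketch under stated assumptions, not a complete or least-privilege policy: it assumes the Athena catalog metadata lives in the AWS Glue Data Catalog, the bucket name is a placeholder, the helper names are hypothetical, and the action lists are illustrative and may need extending for a real deployment.

```python
import json


def redshift_trust_policy():
    """Trust policy allowing the Amazon Redshift service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def spectrum_read_policy(bucket):
    """Read-only permissions for the S3 data files and the data catalog.

    Assumption: the catalog is the AWS Glue Data Catalog; the Glue actions
    listed here are illustrative, not an exhaustive list.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadDataFiles",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
            {
                "Sid": "ReadDataCatalog",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(redshift_trust_policy(), indent=2))
    print(json.dumps(spectrum_read_policy("example-bucket"), indent=2))
```

The permissions policy has one statement for S3 and one for the catalog. Dropping the S3 statement corresponds to Option E, and queries against the external tables would then fail when Spectrum tries to read the files.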