A company plans to use several AWS services, such as Amazon RDS and Amazon Redshift.
They need to have a unified metadata repository for all of these data sources.
Which of the following is the ideal service to use for this purpose?
A. B. C. D.
Answer - B.
The AWS Documentation mentions the following.
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores.
The AWS Glue Data Catalog provides a unified metadata repository across a variety of data sources and data formats, integrating with Amazon EMR as well as Amazon RDS, Amazon Redshift, Redshift Spectrum, Athena, and any application compatible with the Apache Hive metastore.
AWS Glue crawlers can automatically infer schema from source data in Amazon S3 and store the associated metadata in the Data Catalog.
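The crawler step described above can be sketched with boto3, the AWS SDK for Python. This is a minimal sketch, not a definitive setup: the crawler name, IAM role ARN, database name, and S3 path are all hypothetical placeholders, and it assumes credentials and a region are already configured.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue, request):
    """Register the crawler, then start a run that populates the Data Catalog."""
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


def main():
    # Imported lazily so the helpers above can be exercised without AWS access.
    import boto3

    glue = boto3.client("glue")
    request = build_crawler_request(
        name="sales-s3-crawler",                                # hypothetical
        role_arn="arn:aws:iam::123456789012:role/GlueCrawler",  # hypothetical
        database="sales_catalog",                               # hypothetical
        s3_path="s3://example-bucket/sales/",                   # hypothetical
    )
    create_and_start_crawler(glue, request)
```

Calling `main()` would create the crawler and start one run. The inferred tables then appear in the named Data Catalog database once the run finishes.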
All other options are incorrect because they do not provide a metadata catalog.
For more information on using the AWS Glue Data Catalog as the metastore for Spark on Amazon EMR, please refer to the URL below.
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html
The ideal service to use for a unified metadata repository for multiple AWS data sources is AWS Glue.
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to move data between data stores. It can also automate the discovery and cataloging of metadata across various data sources, including Amazon RDS and Amazon Redshift.
AWS Glue uses crawlers to automatically discover and catalog metadata from different data sources, such as JDBC-accessible databases, Amazon S3 buckets, and Amazon DynamoDB tables. Once the metadata is cataloged, it is stored in the Glue Data Catalog, where query and processing services can look up table definitions. This gives users a single place to find schemas across multiple data sources, making it an ideal solution for creating a unified metadata repository.
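Reading the cataloged metadata back can be sketched as follows. This is an illustrative sketch using the boto3 `get_tables` paginator. The database name passed in is hypothetical, and the helper only summarizes fields that the GetTables response normally carries (`Name`, `StorageDescriptor.Location`, `StorageDescriptor.Columns`).

```python
def summarize_tables(pages):
    """Flatten GetTables response pages into (table, location, columns) rows."""
    rows = []
    for page in pages:
        for table in page.get("TableList", []):
            storage = table.get("StorageDescriptor", {})
            columns = [
                (col["Name"], col.get("Type", ""))
                for col in storage.get("Columns", [])
            ]
            rows.append((table["Name"], storage.get("Location", ""), columns))
    return rows


def list_catalog_tables(glue, database):
    """List every table in one Data Catalog database, following pagination."""
    paginator = glue.get_paginator("get_tables")
    return summarize_tables(paginator.paginate(DatabaseName=database))
```

With a client from `boto3.client("glue")`, a call such as `list_catalog_tables(glue, "sales_catalog")` (hypothetical database name) would return one row per table, whichever source the crawler found it in.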
Amazon Athena is a serverless query service that enables users to analyze data in Amazon S3 using SQL. Athena reads table definitions from a catalog such as the AWS Glue Data Catalog, but it is a query engine and is not itself designed to be a metadata repository.
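To illustrate Athena's role as a consumer of the catalog rather than the catalog itself, here is a small sketch that submits a SQL query against a Glue Data Catalog database through boto3. The SQL text, database name, and results bucket are hypothetical. `AwsDataCatalog` is assumed to be the catalog name Athena uses for the account's Glue Data Catalog.

```python
def build_athena_query(sql, database, output_location):
    """Return keyword arguments for the Athena StartQueryExecution API call."""
    return {
        "QueryString": sql,
        # Table names in the SQL are resolved against this catalog database.
        "QueryExecutionContext": {"Catalog": "AwsDataCatalog", "Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit_query(athena, request):
    """Start the query and return its execution id for later polling."""
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]
```

With `athena = boto3.client("athena")`, a request built as `build_athena_query("SELECT COUNT(*) FROM orders", "sales_catalog", "s3://example-bucket/athena-results/")` could be passed to `submit_query`. All three argument values here are placeholders.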
Amazon EMR (Elastic MapReduce) is a managed cluster platform for processing large datasets with Apache Spark, Apache Hadoop, and other big data frameworks. While EMR can process data from various data sources, it is not designed to be a metadata repository.
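EMR can instead point at the Glue Data Catalog as its metastore, which is the subject of the EMR Release Guide page referenced earlier. The sketch below builds the cluster configuration entry for Spark. The `spark-hive-site` classification and the factory class name follow that guide, while the function names are only illustrative.

```python
import json

# Hive client factory class that routes metastore calls to the Glue Data Catalog.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_metastore_configuration():
    """Return the EMR configuration list that makes Spark SQL use the Glue catalog."""
    return [
        {
            "Classification": "spark-hive-site",
            "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
        }
    ]


def render(configuration):
    """Serialize the configuration as JSON, e.g. for a configurations file."""
    return json.dumps(configuration, indent=2)
```

The returned list has the shape expected by the `Configurations` parameter of the boto3 EMR `run_job_flow` call, and `render` produces the equivalent JSON for use when creating a cluster from the console or CLI.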
Amazon QuickSight is a cloud-based business intelligence service for visualizing and analyzing data from various data sources. It consumes data for dashboards and reports and is not designed to be a metadata repository.
Therefore, the correct answer is B. AWS Glue.