MSP Bank, Limited is a leading diversified Japanese financial institution that provides a full range of financial products and services to both institutional and individual customers.
It is headquartered in Tokyo.
MSP Bank hosts its existing infrastructure in an on-premises data center (DC) and on AWS, maintaining a hybrid environment. Multiple web applications, CRM, and ERP run on premises, while storage, compute, the data warehouse (DWH), and AI workloads are moving to AWS.
MSP Bank is also launching new applications on AWS.
MSP Bank hosts separate Development, Testing, and Production VPCs to keep its environments apart, and maintains VPN connectivity between the on-premises DC and AWS. MSP Bank is planning to build a data lake from the log files stored in S3, which are captured from applications running on premises and on AWS, and from identified data sets captured from CRM, ERP, and other business applications.
MSP Bank is looking at AWS Glue, a fully managed ETL service that makes it simple and cost-effective to categorize data, clean it, enrich it, and move it reliably between various data stores.
What kind of tasks are supported by AWS Glue? Select 4 options.
Answer: A, B, D, F.
AWS Glue simplifies many tasks when you are building a data warehouse:
Discovers and catalogs metadata about your data stores into a central catalog.
You can process semi-structured data, such as clickstream or process logs.
Populates the AWS Glue Data Catalog with table definitions from scheduled crawler programs.
Crawlers call classifier logic to infer the schema, format, and data types of your data.
This metadata is stored as tables in the AWS Glue Data Catalog and used in the authoring process of your ETL jobs.
Generates ETL scripts to transform, flatten, and enrich your data from source to target.
Detects schema changes and adapts based on your preferences.
Triggers your ETL jobs based on a schedule or event.
You can initiate jobs automatically to move your data into your data warehouse.
Triggers can be used to create a dependency flow between jobs.
Gathers runtime metrics to monitor the activities of your data warehouse.
Handles errors and retries automatically.
Scales resources, as needed, to run your jobs.
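The crawler-related tasks above (scheduled crawling, populating the Data Catalog, adapting to schema changes) can be sketched as a request for the boto3 Glue client's create_crawler call. This is a minimal sketch, not MSP Bank's actual setup: the crawler name, IAM role ARN, database name, and S3 path are all hypothetical, and the submit step assumes boto3 is installed and AWS credentials are configured.

```python
def build_crawler_request(name, role_arn, database, s3_path, cron):
    """Return keyword arguments for the boto3 Glue client's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # S3 location the crawler scans to infer schema, format, and data types
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Glue schedules are cron expressions wrapped in cron(...)
        "Schedule": f"cron({cron})",
        # What the crawler does when it detects a changed or removed schema
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def create_crawler(request):
    """Submit the request to AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder can be used without AWS access

    glue = boto3.client("glue")
    return glue.create_crawler(**request)


# All values below are hypothetical placeholders.
crawler_request = build_crawler_request(
    name="msp-app-logs-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="msp_data_lake",
    s3_path="s3://msp-bank-logs/app/",
    cron="0 2 * * ? *",  # every day at 02:00 UTC
)
```

Calling create_crawler(crawler_request) would register the crawler; building the request separately keeps the configuration easy to review before anything is sent to AWS.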
https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html#when-to-use-glue
AWS Glue is a fully managed ETL service that helps in categorizing and cleaning data, enriching it, and moving it reliably between various data stores. AWS Glue helps in discovering and cataloging metadata about data stores in the AWS Glue Data Catalog, populating the AWS Glue Data Catalog with table definitions from scheduled crawler programs that use classifier logic to infer the schema, format, and data types of the data, and generating ETL scripts based on Python or Scala to transform, flatten, and enrich data from source to target.
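To make "flatten" concrete for semi-structured data such as application logs, here is a plain-Python illustration. It is not the AWS Glue API and not the script Glue generates; a real Glue job would apply its own transforms to a DynamicFrame. The function name, the dotted column naming, and the sample record are assumptions made for this sketch.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted column names."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the column prefix along
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical log event with nested fields
event = {"user": {"id": 42, "geo": {"country": "JP"}}, "action": "login"}
flat_event = flatten(event)
# flat_event == {"user.id": 42, "user.geo.country": "JP", "action": "login"}
```

Flattened records like this map directly onto table columns, which is why the step typically sits between the crawled source and the warehouse target.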
Each option is assessed below:
A. Discovers and catalogs metadata about data stores into AWS Glue catalog: AWS Glue can discover and catalog metadata about data stores into the AWS Glue Data Catalog, including databases, tables, and columns. This metadata can be used for searching and analyzing data, and for building and executing ETL jobs.
B. Populates the AWS Glue Data Catalog with table definitions from scheduled crawler programs that use classifier logic to infer the schema, format, and data types of the data: Crawlers run on a schedule, call classifier logic against the data, and write the resulting table definitions to the Data Catalog. This automates the creation of table definitions and reduces the time and effort required to create and maintain them manually.
C. Real-time and streaming data ingestion and integration: This is not one of the four expected answers. The task list in the referenced documentation covers cataloging, crawling, script generation, and scheduled or event-driven jobs, and does not mention real-time ingestion. AWS Glue does offer streaming ETL jobs that read from sources such as Amazon Kinesis Data Streams and Apache Kafka, but that capability is outside the list this question is based on.
D. Generates ETL scripts based on Python or Scala to transform, flatten, and enrich data from source to target: AWS Glue can generate ETL scripts in Python or Scala to transform, flatten, and enrich data from source to target. These scripts can be customized and modified as needed to meet specific data transformation requirements.
E. Generates ETL scripts based on node.js, Shell Scripting to transform, flatten, and enrich data from source to target: Incorrect. AWS Glue generates ETL scripts in Python or Scala only; it does not generate Node.js or shell scripts.
F. Triggers ETL jobs based on a schedule or event and scales resources, as needed, to run jobs: AWS Glue can trigger ETL jobs based on a schedule or event and can scale resources as needed to run jobs efficiently. This helps in automating the ETL process and reducing the time and effort required to manage and monitor ETL jobs manually.
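The scheduled and dependent triggers described in option F can be sketched as requests for the boto3 Glue client's create_trigger call. This is a hedged example under stated assumptions: the trigger and job names are hypothetical, the jobs are assumed to exist already, and the submit step needs boto3 and AWS credentials.

```python
def scheduled_trigger(name, job_name, cron):
    """Request for a trigger that starts a job on a cron schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def dependent_trigger(name, upstream_job, downstream_job):
    """Request for a trigger that starts a job only after another job succeeds."""
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,
    }


def create_triggers(trigger_requests):
    """Submit the requests to AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders can be used without AWS access

    glue = boto3.client("glue")
    for request in trigger_requests:
        glue.create_trigger(**request)


# Hypothetical job names: clean the logs nightly, then load the warehouse.
trigger_requests = [
    scheduled_trigger("nightly-clean", "clean-app-logs", "0 3 * * ? *"),
    dependent_trigger("load-after-clean", "clean-app-logs", "load-dwh"),
]
```

The conditional trigger is what creates the dependency flow between jobs: the load job runs only when the cleaning job reports SUCCEEDED.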
In summary, AWS Glue supports discovering and cataloging metadata about data stores, populating the AWS Glue Data Catalog with table definitions from scheduled crawlers, generating ETL scripts in Python or Scala, and triggering ETL jobs on a schedule or event while scaling resources as needed. Options C and E are not among the expected answers: real-time ingestion is not in the referenced task list, and AWS Glue does not generate Node.js or shell scripts.