EMR Hadoop Ecosystem for Non-Relational Database on HDFS | BDS-C00 Exam Answer

Question

Allianz Financial Services (AFS) is a banking group offering end-to-end banking and financial solutions in South East Asia through its consumer banking, business banking, Islamic banking, investment finance and stock broking businesses as well as unit trust and asset administration, having served the financial community over the past five decades. AFS launched EMR cluster to support their big data analytics requirements.

AFS is looking for a non-relational database that runs on top of Hadoop Distributed File System (HDFS) to provide non-relational database capabilities for the Hadoop ecosystem. It uses region servers to process the data. Which EMR Hadoop ecosystem component fulfills the requirements? Select one option.

Answers

A. Apache Hive

B. Apache HBase

C. Apache HCatalog

D. Apache Phoenix

Explanations

Answer: B.

Option A is incorrect. Hive is an open-source data warehouse and analytics package that runs on top of a Hadoop cluster.

Hive scripts use an SQL-like language called Hive QL (query language) that abstracts programming models and supports typical data warehouse interactions.

Hive enables you to avoid the complexities of writing Tez jobs based on directed acyclic graphs (DAGs) or MapReduce programs in a lower-level language such as Java.

Hive extends the SQL paradigm by including serialization formats.

You can also customize query processing by creating table schema that matches your data, without touching the data itself.

In contrast to SQL (which only supports primitive value types such as dates, numbers, and strings), values in Hive tables are structured elements, such as JSON objects, any user-defined data type, or any function written in Java.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive.html
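
The point about defining a table schema "without touching the data itself" is often called schema-on-read. The following is a minimal Python sketch of that idea only; it is not Hive or HiveQL, and the sample data, the SCHEMA list, and the read_with_schema helper are all made up for illustration.

```python
import csv
import io

# Made-up raw data; in Hive this would be files already sitting in storage.
RAW = "1,alice,1200.50\n2,zaid,80.00\n"

# Hypothetical schema: column names paired with Python converters.
SCHEMA = [("id", int), ("name", str), ("balance", float)]


def read_with_schema(raw_text, schema):
    """Apply the schema while reading; the raw text is never modified."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


rows = list(read_with_schema(RAW, SCHEMA))
total = sum(row["balance"] for row in rows)
```

Changing SCHEMA changes how the same bytes are interpreted, which is roughly what redefining a Hive table over existing data does. The storage itself is still plain files, which is why Hive does not answer the question's request for a database.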

Option B is correct. HBase is an open-source, non-relational, distributed database developed as part of the Apache Software Foundation's Hadoop project.

HBase runs on top of Hadoop Distributed File System (HDFS) to provide non-relational database capabilities for the Hadoop ecosystem.

HBase works seamlessly with Hadoop, sharing its file system and serving as a direct input and output to the MapReduce framework and execution engine.

HBase also integrates with Apache Hive, enabling SQL-like queries over HBase tables, joins with Hive-based tables, and support for Java Database Connectivity (JDBC).

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase.html
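
To make "non-relational" and "region servers" more concrete, here is a toy Python model of the general idea: rows are sparse maps of "family:qualifier" columns, and each row key falls into a key range served by one server. This is a sketch under stated assumptions, not the HBase client API; the ToyTable class, the split keys, the server names, and the sample rows are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical split points: row keys below "g" go to region 0,
# "g" up to (not including) "p" to region 1, everything else to region 2.
SPLIT_KEYS = ["g", "p"]
REGION_SERVERS = ["rs-0", "rs-1", "rs-2"]  # made-up server names


def region_server_for(row_key: str) -> str:
    """Pick the server whose key range contains row_key."""
    return REGION_SERVERS[bisect_right(SPLIT_KEYS, row_key)]


class ToyTable:
    """Sparse table: row key -> "family:qualifier" -> value."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self.rows[row_key][column] = value

    def get(self, row_key):
        # Return a copy so callers cannot mutate stored rows.
        return dict(self.rows.get(row_key, {}))

    def scan(self, start, stop):
        """Return rows with start <= key < stop, in row-key order."""
        return [(k, self.rows[k]) for k in sorted(self.rows) if start <= k < stop]


table = ToyTable()
table.put("alice", "profile:city", "Kuala Lumpur")
table.put("alice", "account:balance", "1200")
table.put("zaid", "profile:city", "Jakarta")  # no account column: rows are sparse
```

In this model, region_server_for("alice") lands in the first key range and region_server_for("zaid") in the last, so different servers would handle the two rows. There is no fixed set of columns per row and no joins, which is the sense in which the store is non-relational.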

Option C is incorrect. HCatalog is a tool that allows you to access Hive metastore tables within Pig, Spark SQL, and/or custom MapReduce applications.

HCatalog has a REST interface and command line client that allows you to create tables or do other operations.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hcatalog.html

Option D is incorrect. Apache Phoenix is used for OLTP and operational analytics, allowing you to use standard SQL queries and JDBC APIs to work with an Apache HBase backing store.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-phoenix.html

Based on the requirements mentioned in the question, AFS is looking for a non-relational database that runs on top of Hadoop Distributed File System (HDFS) to provide non-relational database capabilities for the Hadoop ecosystem. This indicates that they need a NoSQL database that can store and process large amounts of unstructured or semi-structured data on HDFS.

Among the options provided, Apache HBase is the suitable choice: it is a NoSQL database that runs on top of HDFS and provides non-relational database capabilities for the Hadoop ecosystem. It also matches the region server clue in the question, because HBase splits each table into regions by row-key range and region servers handle reads and writes for those regions. It is designed for real-time read/write access to large datasets and is highly scalable.

Apache Hive is a data warehousing tool that enables SQL-like queries on Hadoop data. It is not a NoSQL database and does not provide non-relational database capabilities.

Apache HCatalog is a table and storage management layer for Hadoop that enables users to share data between different data processing tools such as Pig, Hive, and MapReduce. It is not a NoSQL database and does not provide non-relational database capabilities.

Apache Phoenix is a SQL skin for Apache HBase that allows you to run SQL queries on HBase tables. It is not itself a NoSQL database and does not provide non-relational database capabilities for the Hadoop ecosystem; it depends on HBase as its backing store and adds a SQL interface on top of HBase data.

Therefore, the correct answer is B. Apache HBase.