You need to develop a pipeline for processing data. The pipeline must meet the following requirements:
-> Scale resources up and down to reduce cost
-> Use an in-memory data processing engine to speed up ETL and machine learning operations
-> Use streaming capabilities
-> Provide the ability to code in SQL, Python, Scala, and R
-> Integrate workspace collaboration with Git
What should you use?
Correct answer: A
Apache Spark is an open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analytics applications.
HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch processing, you can use Spark, Hive, Hive LLAP, or MapReduce.
Languages: R, Python, Java, Scala, SQL
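As a sketch of how Spark's in-memory engine covers both batch ETL and streaming through one API in Python (this assumes a running Spark environment such as an HDInsight Spark cluster; the paths, column names, and socket source below are illustrative placeholders, not details from the question):

```python
# Minimal PySpark sketch: batch ETL and a streaming read on the same engine.
# Paths and column names are placeholders; adjust for your storage account.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Batch ETL: read raw data, transform it in memory, write curated output.
df = (spark.read.option("header", True).csv("wasbs:///data/events.csv")
      .withColumn("amount", F.col("amount").cast("double")))
daily = df.groupBy("event_date").agg(F.sum("amount").alias("total"))
daily.write.mode("overwrite").parquet("wasbs:///curated/daily_totals")

# Streaming: the same DataFrame API over a streaming source.
stream = (spark.readStream.format("socket")
          .option("host", "localhost").option("port", 9999).load())
query = stream.writeStream.format("console").start()
```

The same logic could be expressed in Scala, SQL, or R, which is why the language requirement in the question maps onto a Spark-based service.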
You can create an HDInsight Spark cluster using an Azure Resource Manager template. The template can be found on GitHub.
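As an illustration, an ARM template can be deployed with the Azure CLI (the resource group, location, and parameter values here are placeholders, and the template URL stands in for the actual `azuredeploy.json` link on GitHub):

```shell
# Placeholder names: substitute your own resource group, region, and template URL.
az group create --name rg-spark-demo --location eastus

az deployment group create \
  --resource-group rg-spark-demo \
  --template-uri "https://<url-of-azuredeploy.json-on-GitHub>" \
  --parameters clusterName=spark-demo
```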
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing