Question 76 of 170 from exam DP-200: Implementing an Azure Data Solution

Question

You need to develop a pipeline for processing data. The pipeline must meet the following requirements:

-> Scale up and down resources for cost reduction.

-> Use an in-memory data processing engine to speed up ETL and machine learning operations.

-> Use streaming capabilities.

-> Provide the ability to code in SQL, Python, Scala, and R.

-> Integrate workspace collaboration with Git.

What should you use?

Answers

Explanations

A

Apache Spark is an open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analytics applications.

HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch processing, you can use Spark, Hive, Hive LLAP, and MapReduce.

Languages: R, Python, Java, Scala, SQL

You can create an HDInsight Spark cluster using an Azure Resource Manager template. The template can be found on GitHub.
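As a rough illustration of what such a deployment looks like, here is a trimmed ARM template sketch for an HDInsight Spark cluster. The resource type `Microsoft.HDInsight/clusters` is real, but the name, API version, VM sizes, and instance counts below are placeholder assumptions, not the exact template from GitHub; a full template also needs storage and login credentials:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.HDInsight/clusters",
      "apiVersion": "2021-06-01",
      "name": "example-spark-cluster",
      "location": "[resourceGroup().location]",
      "properties": {
        "clusterVersion": "4.0",
        "osType": "Linux",
        "clusterDefinition": {
          "kind": "Spark"
        },
        "computeProfile": {
          "roles": [
            {
              "name": "headnode",
              "targetInstanceCount": 2,
              "hardwareProfile": { "vmSize": "Standard_D12_v2" }
            },
            {
              "name": "workernode",
              "targetInstanceCount": 4,
              "hardwareProfile": { "vmSize": "Standard_D13_v2" }
            }
          ]
        }
      }
    }
  ]
}
```

Scaling the cluster up or down (the first requirement above) then amounts to changing `targetInstanceCount` on the worker role.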

https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing