Question

You have a requirement to use a data processing pipeline to process documents uploaded to Azure Blob Storage.

Based on the solution requirements, you should be able to read the document layout, extract the data in .csv format, and run reports on that data using Power BI.

Assume that Visual Studio Code with the required extensions, Python, and Azure Storage Explorer are installed.

Which Azure services would you use in this solution, other than Azure Blob Storage, Azure Table Storage, and Power BI? (Choose two options.)

Answers

A. Azure Databricks
B. Azure Synapse
C. Azure Functions
D. Azure Form Recognizer
E. Azure Data Factory

Explanations

Correct Answers: C and D.

Option A is incorrect because Azure Databricks is used for big data processing.

Given the scenario, the solution cannot use this service to meet the requirements.

Option B is incorrect because Azure Synapse is a data warehouse solution.

It is not capable of recognizing the layout and extracting data from the documents.

You would need the Form Recognizer service to meet the solution requirement.

Option C is correct because an Azure Function would be triggered when a file is uploaded to Blob Storage.

That way, the documents can be passed to the Form Recognizer API.
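For illustration only, here is a minimal sketch of such a blob-triggered function, using the Azure Functions Python v2 programming model and the azure-ai-formrecognizer SDK. The "documents" container name, the FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_KEY app settings, and the connection setting are assumptions for the example, not part of the question.

```python
import logging
import os

import azure.functions as func
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

app = func.FunctionApp()

# Assumed names: a "documents" blob container and two app settings holding the
# Form Recognizer endpoint and key.
@app.blob_trigger(arg_name="blob",
                  path="documents/{name}",
                  connection="AzureWebJobsStorage")
def process_document(blob: func.InputStream):
    client = DocumentAnalysisClient(
        endpoint=os.environ["FORM_RECOGNIZER_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["FORM_RECOGNIZER_KEY"]),
    )

    # Send the uploaded document to the prebuilt layout model.
    poller = client.begin_analyze_document("prebuilt-layout", document=blob.read())
    result = poller.result()

    # result.tables and result.pages now hold the extracted layout data.
    for table in result.tables:
        logging.info("Found a table with %d rows and %d columns",
                     table.row_count, table.column_count)
```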

Option D is correct because Azure Form Recognizer can read the layout of the uploaded documents and extract their data.

Here are the steps involved in the process (a Python sketch of steps 3 and 4 follows the list):

1. Create an Azure Storage account.
2. Create an Azure Functions project.
3. Extract layout data from the uploaded forms.
4. Upload the layout data to Azure Storage.
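A rough Python sketch of steps 3 and 4 is shown below. It assumes the same Form Recognizer settings as above, plus a hypothetical STORAGE_CONNECTION_STRING setting and a "FormLayout" table; none of these names come from the question.

```python
import os
import uuid

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
from azure.data.tables import TableServiceClient


def extract_and_store_layout(document_bytes: bytes, document_name: str) -> None:
    """Extract layout data from one uploaded form and store it in Table Storage."""
    fr_client = DocumentAnalysisClient(
        endpoint=os.environ["FORM_RECOGNIZER_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["FORM_RECOGNIZER_KEY"]),
    )
    result = fr_client.begin_analyze_document(
        "prebuilt-layout", document=document_bytes
    ).result()

    # Assumed table name "FormLayout"; one entity is written per extracted cell.
    tables = TableServiceClient.from_connection_string(
        os.environ["STORAGE_CONNECTION_STRING"]
    )
    layout_table = tables.create_table_if_not_exists("FormLayout")

    for table_index, doc_table in enumerate(result.tables):
        for cell in doc_table.cells:
            layout_table.create_entity({
                "PartitionKey": document_name,
                "RowKey": str(uuid.uuid4()),
                "TableIndex": table_index,
                "RowIndex": cell.row_index,
                "ColumnIndex": cell.column_index,
                "Content": cell.content,
            })
```

Partitioning by document name keeps all cells of one form together, which makes them easy to query later.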

Option E is incorrect because Azure Data Factory is a fully managed data integration service for ingestion and transformation.

However, it cannot read the document layout or extract the data from documents in Blob Storage.

Reference:

To learn more about using Azure Functions to process stored documents, use the link given below:

Based on the given requirements, we need to process documents uploaded to Azure Blob Storage, extract data in .csv format, and execute reports on that data using Power BI. We need to choose two Azure services other than Azure Blob Storage, Azure Table Storage, and Power BI that can help us accomplish these requirements.

Option A: Azure Databricks
Azure Databricks is an Apache Spark-based analytics platform that provides a collaborative, cloud-based environment for data scientists, engineers, and analysts. It offers a fully managed, scalable, and secure platform for processing large amounts of data, including unstructured data such as text, images, and videos. Azure Databricks could process files already sitting in Azure Blob Storage, but it has no built-in capability to read a document's layout, so it does not meet the extraction requirement on its own.

Option B: Azure Synapse
Azure Synapse is an analytics service that brings together big data and data warehousing in a single service. It provides a unified experience for big data processing, data warehousing, and data integration, and it integrates with Power BI for reporting. However, Azure Synapse cannot recognize the layout of the uploaded documents or extract their data, so it does not meet the core requirement of this scenario.

Option C: Azure Functions
Azure Functions is a serverless compute service that runs event-driven code in response to triggers such as a new blob being written to Azure Blob Storage. We can use an Azure Function to react to each uploaded document, pass it to the layout-extraction service, and save the results in .csv format; Power BI can then report on that output.

Option D: Azure Form Recognizer
Azure Form Recognizer is a cognitive service that uses machine learning to extract text, layout, and data from various document types, including invoices, receipts, and forms. We can use Azure Form Recognizer to read the layout of the documents uploaded to Azure Blob Storage and extract their data, which can then be written out in .csv format for reporting in Power BI.
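To tie this to the .csv requirement, the sketch below (assuming a result object already returned by the layout model and a hypothetical output file name) flattens the first extracted table into a CSV file with Python's csv module:

```python
import csv

from azure.ai.formrecognizer import AnalyzeResult


def table_to_csv(result: AnalyzeResult, csv_path: str = "layout.csv") -> None:
    """Write the first table found by the prebuilt layout model to a CSV file."""
    if not result.tables:
        return
    table = result.tables[0]

    # Build an empty grid and fill it cell by cell using the reported indices.
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.content

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(grid)
```

Power BI can then load the resulting .csv file (or the data in Table Storage) as a data source for reporting.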

Option E: Azure Data Factory
Azure Data Factory is a cloud-based data integration service that allows us to create, schedule, and orchestrate data pipelines. It can move and transform data that has already been extracted, but it cannot read the layout of the documents in Blob Storage or extract their data by itself, so it does not satisfy the requirements of this scenario.

Based on the requirements, the two Azure services that complete the solution are Azure Functions and Azure Form Recognizer. An Azure Function is triggered whenever a document is uploaded to Blob Storage and passes it to Azure Form Recognizer, which reads the layout and extracts the data. The extracted data can then be saved in .csv format (with the layout results stored in Azure Table Storage) so that Power BI can run reports on it.