You have an Azure virtual machine named VM1 that runs Windows Server 2019 and contains 500 GB of data files.
You are designing a solution that will use Azure Data Factory to transform the data files, and then load the files to Azure Data Lake Storage.
What should you deploy on VM1 to support the design?
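A. the Azure Pipelines agent
B. the Azure File Sync agent
C. the On-premises data gateway
D. the self-hosted integration runtime
Correct answer: D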
The integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide data-integration capabilities across different network environments. For details about IR, see Integration runtime overview.
A self-hosted integration runtime can run copy activities between a cloud data store and a data store in a private network. It can also dispatch transform activities against compute resources in an on-premises network or an Azure virtual network. A self-hosted integration runtime must be installed on an on-premises machine or on a virtual machine inside a private network.
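As a rough illustration of what registering such a runtime involves, the sketch below builds the JSON definition that Data Factory uses for a self-hosted integration runtime resource. The name `VM1-SelfHostedIR` and the helper function are hypothetical, chosen only for this example. The sketch does not call Azure; after the resource exists in the data factory, the runtime software still has to be installed on the machine and registered with an authentication key.

```python
import json


def self_hosted_ir_definition(name: str, description: str = "") -> dict:
    """Build the JSON body for a self-hosted integration runtime resource.

    The name is a hypothetical placeholder; "SelfHosted" is the runtime
    type value used in Data Factory resource definitions.
    """
    return {
        "name": name,
        "properties": {
            "type": "SelfHosted",
            "description": description,
        },
    }


# Hypothetical runtime name for the VM in the question.
ir = self_hosted_ir_definition(
    "VM1-SelfHostedIR",
    "Self-hosted integration runtime installed on VM1",
)
print(json.dumps(ir, indent=2))
```

The resulting document can be supplied to whichever deployment tool you use (portal, template, or SDK); the exact deployment step is outside the scope of this sketch.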
https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime
The correct answer is option D: the self-hosted integration runtime.
Azure Data Factory is a cloud-based data integration service that allows you to create, schedule, and orchestrate data pipelines to move and transform data. Data Factory supports various data stores, including Azure Blob Storage, Azure Data Lake Storage, and on-premises data stores. You can use Data Factory to copy data from one data store to another, transform the data using Data Flow, and process the data using Azure Databricks.
To move data from a data store that is not publicly reachable, such as an on-premises data store or a machine inside a private network, to a cloud-based data store, you need to deploy a self-hosted integration runtime on a machine that has access to that data store. This can be a physical machine, an on-premises virtual machine, or an Azure virtual machine. The self-hosted integration runtime is a lightweight component that provides a secure communication channel between the private data store and the Data Factory service in the cloud.
In this scenario, the 500 GB of data files are stored on VM1, an Azure virtual machine that runs Windows Server 2019. The Azure-hosted integration runtime cannot reach files on VM1's local disks directly, so you need to deploy a self-hosted integration runtime on VM1. The self-hosted integration runtime gives Data Factory access to the files, so that a pipeline can transform them and load them into Azure Data Lake Storage.
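To make the link between the files on VM1 and the runtime concrete, here is a minimal sketch of a file system linked service definition that routes through a self-hosted integration runtime by using the `connectVia` property. The linked service name, the runtime name `VM1-SelfHostedIR`, the folder path, and the credential placeholders are all assumptions made for this example, not values from the question.

```python
import json


def file_linked_service(name: str, host: str, user_id: str, ir_name: str) -> dict:
    """Build a file system linked service that connects via a named runtime.

    All argument values used below are hypothetical. The password is left
    as a placeholder; in practice it should come from a secret store.
    """
    return {
        "name": name,
        "properties": {
            "type": "FileServer",
            "typeProperties": {
                "host": host,
                "userId": user_id,
                "password": {"type": "SecureString", "value": "<password>"},
            },
            # connectVia tells Data Factory which integration runtime
            # should execute activities that use this linked service.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


linked_service = file_linked_service(
    name="VM1Files",             # hypothetical linked service name
    host="D:\\data",             # hypothetical local folder on VM1
    user_id="<domain>\\<user>",  # placeholder account
    ir_name="VM1-SelfHostedIR",  # hypothetical runtime name
)
print(json.dumps(linked_service, indent=2))
```

A copy activity whose source dataset uses this linked service would then run on VM1, where the files are readable, and write to the Azure Data Lake Storage sink.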
Option A, the Azure Pipelines agent, is used to build, test, and deploy code using Azure Pipelines. This option is not relevant to the scenario.
Option B, the Azure File Sync agent, is used to synchronize files between on-premises file servers and Azure Files. This option is not relevant to the scenario.
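Option C, the On-premises data gateway, is used to connect on-premises data sources to services such as Power BI, Power Apps, and Azure Logic Apps. Data Factory does not use this gateway; it relies on the self-hosted integration runtime instead, so this option does not fit the scenario.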