Azure Data Factory Configuration for Ingesting Cosmos DB Data

Question

Your company is storing hundreds of GBs of data in a distributed Cosmos DB.

This data holds a great deal of valuable information about sales transactions, and the company plans to make use of it by running machine learning models against it.

Your task is to design how to feed Azure ML processes with Cosmos DB data.

You decide to use Azure Data Factory for ingesting the data.

How do you configure ADF?

Answers

Explanations


Answer: B.

Option A is incorrect because the data must move from Cosmos DB to Blob Storage, so the source is Cosmos DB and the target/sink is Blob Storage, not the other way around. In addition, a Copy activity is sufficient for this task; a Custom activity is not needed.

Option B is CORRECT because the source of the data is Cosmos DB, the target/sink is Blob Storage, the linked services are set accordingly, and, since the source is Cosmos DB, the source type in the Copy activity has to be CosmosDbSqlApiSource.
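As a rough illustration of Option B, the sketch below builds a Copy activity definition as a Python dictionary and prints it as JSON. The activity name, the dataset names, and the query are hypothetical placeholders, and the JSON sink is an assumption (the question does not say which file format is written to Blob Storage). Only the `Copy` activity type and the `CosmosDbSqlApiSource` source type come from the answer itself.

```python
import json


def build_copy_activity(source_dataset: str, sink_dataset: str,
                        query: str = "SELECT * FROM c") -> dict:
    """Return a Copy activity definition that reads from Cosmos DB (SQL API)
    and writes JSON files to Blob Storage."""
    return {
        "name": "CopyCosmosDbToBlob",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # The source type named in Option B.
            "source": {"type": "CosmosDbSqlApiSource", "query": query},
            # Assumed sink: JSON files written to Blob Storage.
            "sink": {
                "type": "JsonSink",
                "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
            },
        },
    }


# Dataset names are placeholders for datasets defined elsewhere in the factory.
activity = build_copy_activity("CosmosDbSalesDataset", "BlobSalesJsonDataset")
print(json.dumps(activity, indent=2))
```

The printed JSON is only the activity; in a real factory it would sit inside a pipeline's list of activities.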

Option C is incorrect because when you need to move data between Azure cloud data sources (i.e. Cosmos DB and Blob Storage), the Azure integration runtime must be used, i.e. setting 'Self-hosted' is incorrect.

Option D is incorrect because when you need to move data between Azure cloud data sources (i.e. Cosmos DB and Blob Storage), the Azure integration runtime must be used, i.e. setting 'Azure-SSIS' is not applicable here. In addition, the Copy activity source type is missing.


Option A:
Source: Blob Storage Container
Sink: Cosmos DB table
Integration runtime: Azure
Activity: Custom
Linked service type1: CosmosDb
Linked service type2: AzureBlobStorage
Copy activity source type: CosmosDbSqlApiSource

In this option, the source data is taken from a Blob Storage Container, and the data is ingested into Cosmos DB table using Azure Data Factory (ADF). A custom activity is used for this purpose. This option uses two linked services, one for Cosmos DB and one for Azure Blob Storage, and the copy activity source type is CosmosDbSqlApiSource.

This configuration would only fit a scenario in which the data starts in a Blob Storage Container and has to be loaded into Cosmos DB, which is the reverse of what the question asks. A Custom activity does offer more flexibility for complex transformations, but a plain copy does not need it. The source type CosmosDbSqlApiSource is also inconsistent with a Blob Storage source.

Option B:
Source: Cosmos DB table
Sink: Blob Storage Container
Integration runtime: Azure
Activity: Copy
Linked service type1: CosmosDb
Linked service type2: AzureBlobStorage
Copy activity source type: CosmosDbSqlApiSource

In this option, the source data is taken from the Cosmos DB table, and the data is copied into Blob Storage Container using ADF. The integration runtime used is Azure, and the activity type is copy. Two linked services are used, one for Cosmos DB and one for Azure Blob Storage, and the copy activity source type is CosmosDbSqlApiSource.
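The two linked services mentioned above could be sketched as follows. The linked service names and the connection-string placeholders are hypothetical; `CosmosDb` and `AzureBlobStorage` are the linked service types listed in the option, and `AutoResolveIntegrationRuntime` is assumed here as the factory's default Azure integration runtime.

```python
import json

# Reference to the Azure integration runtime (assumed: the default one).
AZURE_IR = {
    "referenceName": "AutoResolveIntegrationRuntime",
    "type": "IntegrationRuntimeReference",
}


def linked_service(name: str, service_type: str, connection_string: str) -> dict:
    """Return a linked service definition bound to the Azure integration runtime."""
    return {
        "name": name,
        "properties": {
            "type": service_type,
            "typeProperties": {"connectionString": connection_string},
            "connectVia": AZURE_IR,
        },
    }


# Placeholder values; real connection strings belong in a secret store.
cosmos_ls = linked_service(
    "CosmosDbLinkedService", "CosmosDb", "<cosmos-db-connection-string>")
blob_ls = linked_service(
    "BlobStorageLinkedService", "AzureBlobStorage", "<blob-storage-connection-string>")

for definition in (cosmos_ls, blob_ls):
    print(json.dumps(definition, indent=2))
```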

This option matches the scenario: the data is read from Cosmos DB and staged in a Blob Storage Container, from which downstream machine learning processes can pick it up. The Copy activity is simple to use and handles basic conversions such as column mapping and file format changes.

Option C:
Source: Cosmos DB table
Sink: Blob Storage Container
Integration runtime: Self-hosted
Activity: Copy
Linked service type1: AzureBlobStorage
Linked service type2: CosmosDb
Copy activity source type: CosmosDbSqlApiSource

In this option, the source data is taken from the Cosmos DB table, and the data is copied into Blob Storage Container using ADF. The integration runtime used is self-hosted, and the activity type is copy. Two linked services are used, one for Azure Blob Storage and one for Cosmos DB, and the copy activity source type is CosmosDbSqlApiSource.

A self-hosted integration runtime is intended for data stores that sit on-premises or inside a private network. Both Cosmos DB and Blob Storage are Azure cloud services here, so the self-hosted runtime is not the right choice for this scenario.

Option D:
Source: Cosmos DB table
Sink: Blob Storage Container
Integration runtime: Azure-SSIS
Activity: Copy
Linked service type1: AzureBlobStorage
Linked service type2: CosmosDb

In this option, the source data is taken from the Cosmos DB table, and the data is copied into Blob Storage Container using ADF. The integration runtime used is Azure-SSIS, and the activity type is copy. Two linked services are used, one for Azure Blob Storage and one for Cosmos DB.

The Azure-SSIS integration runtime exists to run SSIS packages; it is not the runtime used for a Copy activity between two Azure data stores. This option also leaves out the Copy activity source type, so the configuration is incomplete.

In conclusion, option B is the correct configuration for ingesting Cosmos DB data with Azure Data Factory: Cosmos DB is the source, Blob Storage is the sink, the Azure integration runtime is used, and a Copy activity with source type CosmosDbSqlApiSource moves the data. Option A uses the Azure integration runtime as well, but it reverses source and sink and uses a Custom activity where a Copy activity is enough. Options C and D select integration runtimes that do not fit a copy between two Azure cloud data stores, and option D also omits the Copy activity source type.
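The criteria used in the per-option explanations (Cosmos DB as source, Blob Storage as sink, Azure integration runtime, Copy activity, and a Copy activity source type that is present) can be written as a small check. This is only an illustrative sketch: the `Option` class and the `problems` function are hypothetical helpers and not part of Azure Data Factory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    source: str
    sink: str
    integration_runtime: str
    activity: str
    source_type: Optional[str]  # None when the option omits it


# The four answer options as listed in the question.
OPTIONS = {
    "A": Option("Blob Storage", "Cosmos DB", "Azure", "Custom", "CosmosDbSqlApiSource"),
    "B": Option("Cosmos DB", "Blob Storage", "Azure", "Copy", "CosmosDbSqlApiSource"),
    "C": Option("Cosmos DB", "Blob Storage", "Self-hosted", "Copy", "CosmosDbSqlApiSource"),
    "D": Option("Cosmos DB", "Blob Storage", "Azure-SSIS", "Copy", None),
}


def problems(opt: Option) -> List[str]:
    """Return the reasons an option fails the criteria (empty list if none)."""
    issues = []
    if opt.source != "Cosmos DB" or opt.sink != "Blob Storage":
        issues.append("source must be Cosmos DB and sink must be Blob Storage")
    if opt.integration_runtime != "Azure":
        issues.append("integration runtime should be Azure")
    if opt.activity != "Copy":
        issues.append("a Copy activity is sufficient")
    if opt.source_type != "CosmosDbSqlApiSource":
        issues.append("Copy activity source type should be CosmosDbSqlApiSource")
    return issues


valid = [name for name, opt in OPTIONS.items() if not problems(opt)]
for name, opt in OPTIONS.items():
    print(name, problems(opt) or "OK")
```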