Load Data Into Cosmos DB with ADF


In this lab, you will populate an Azure Cosmos DB container from an existing set of data using tools built into Azure. After importing, you will use the Azure portal to view your imported data.

If you have not already completed setup for the lab content, see the instructions for Account Setup before starting this lab. In this lab, you will create an Azure Cosmos DB database and container that you will use throughout the lab. You will also use an Azure Data Factory (ADF) resource to import existing data into your container.

Create Azure Cosmos DB Database and Container

You will now create a database and container within your Azure Cosmos DB account.

  1. Navigate to the Azure Portal

  2. On the left side of the portal, select the Resource groups link.

    Resource groups is highlighted

  3. In the Resource groups blade, locate and select the cosmoslabs resource group.

    The cosmoslabs resource group is highlighted

  4. In the cosmoslabs blade, select the Azure Cosmos DB account you recently created.

    The Cosmos DB resource is highlighted

  5. In the Azure Cosmos DB blade, locate and select the Overview link on the left side of the blade. At the top, select the Add Container button.

    Add container link is highlighted

  6. In the Add Container popup, perform the following actions:

    1. In the Database id field, select the Create new option and enter the value ImportDatabase.

    2. Do not check the Provision database throughput option.

      Provisioning throughput for a database allows you to share the throughput among all the containers that belong to that database. Within an Azure Cosmos DB database, you can have a set of containers that share the database throughput alongside containers that have their own dedicated throughput.

    3. In the Container Id field, enter the value FoodCollection.

    4. In the Partition key field, enter the value /foodGroup.

    5. In the Throughput field, enter the value 11000. Note: you will reduce this to 400 RU/s after the data has been imported.

    6. Select the OK button.

  7. Wait for the creation of the new database and container to finish before moving on with this lab.
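The same database and container can also be created from code rather than the portal. The following is a minimal sketch, assuming the `azure-cosmos` Python package is installed; the `endpoint` and `key` arguments are placeholders for your own account's URI and key, and the function name is hypothetical. It mirrors the portal choices above: no throughput on the database, dedicated throughput on the container.

```python
"""Sketch: create ImportDatabase / FoodCollection with the azure-cosmos package."""

# Values taken from the portal steps above.
DATABASE_ID = "ImportDatabase"
CONTAINER_ID = "FoodCollection"
PARTITION_KEY_PATH = "/foodGroup"
IMPORT_THROUGHPUT = 11000  # RU/s during the import; reduced to 400 afterwards


def create_lab_container(endpoint: str, key: str):
    """Create the database and container if they do not exist yet.

    `endpoint` and `key` are placeholders for your own account's values.
    """
    # Imported lazily so the constants above are usable without the SDK.
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient(endpoint, credential=key)
    # No throughput is set on the database, so it is not shared;
    # the container receives dedicated throughput instead.
    database = client.create_database_if_not_exists(id=DATABASE_ID)
    return database.create_container_if_not_exists(
        id=CONTAINER_ID,
        partition_key=PartitionKey(path=PARTITION_KEY_PATH),
        offer_throughput=IMPORT_THROUGHPUT,
    )
```

Calling `create_lab_container(...)` against an account where the container already exists should simply return a handle to it, which makes the sketch safe to re-run.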

Import Lab Data Into Container

You will use Azure Data Factory (ADF) to import the JSON array stored in the nutrition.json file from Azure Blob Storage.
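The source file is reached through a read-only SAS URI, which is given in step 8 of this section. As a rough illustration of what that URI encodes, the sketch below splits it into its parts using only the Python standard library. The function name and the keys of the returned dictionary are hypothetical; the query parameter names (`si`, `spr`, `sv`, `sr`, `sig`) come from the URI itself.

```python
from urllib.parse import parse_qs, urlsplit

# The SAS URI from step 8 of this section.
SAS_URI = (
    "https://cosmoslabsstorageaccount.blob.core.windows.net/nutrition-data"
    "?si=container-list-read-policy&spr=https&sv=2021-06-08&sr=c"
    "&sig=jGrmrokYikbgbuW9we2am%2BwAq%2BC%2BxfZcPYswOeSQpAU%3D"
)


def describe_sas(uri: str) -> dict:
    """Break a Blob Storage SAS URI into readable pieces."""
    parts = urlsplit(uri)
    # parse_qs returns a list per key; each key appears once here.
    query = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return {
        "account": parts.netloc.split(".")[0],    # storage account name
        "container": parts.path.lstrip("/"),      # blob container name
        "stored_access_policy": query.get("si"),  # policy that grants the rights
        "allowed_protocol": query.get("spr"),     # https only
        "service_version": query.get("sv"),
        "resource": query.get("sr"),              # "c" means a container
        "has_signature": "sig" in query,
    }


if __name__ == "__main__":
    for name, value in describe_sas(SAS_URI).items():
        print(f"{name}: {value}")
```

The `sr=c` value indicates that the signature covers the whole container, which is why you can browse the folder in ADF and not just one file.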

If you are completing the lab through Microsoft Hands-on Labs or ran the setup script, you can use the pre-created Data Factory within your resource group (named importNutritionData with a random number suffix). In that case, you do not need to do Steps 1-3 in this section and can proceed to Step 4 by opening your Data Factory.

  1. On the left side of the portal, select the Resource groups link.

    To learn more about copying data to Cosmos DB with ADF, please read ADF’s documentation.

    Resource groups link is highlighted

  2. In the Resource groups blade, locate and select the cosmoslabs resource group.

  3. If you see a Data Factory resource, you can skip to step 4. Otherwise, select Add to add a new resource.

    A data factory resource is highlighted

    Select Add in the nav bar

    • Search for Data Factory and select it.
    • Create a new Data Factory. Name this data factory importnutritiondata with a unique number appended and select the relevant Azure subscription. Ensure that your existing cosmoslabs resource group is selected and that the Version is V2.
    • Select East US as the region. Do not select Enable GIT (this may be checked by default).
    • Select Create.

      The new data factory dialog is displayed

  4. After creation, open your newly created Data Factory. Select Author & Monitor to launch ADF.

    The overview blade is displayed for ADF

  5. Select Copy Data.

    • We will be using ADF for a one-time copy of data from a source JSON file on Azure Blob Storage to a database in Cosmos DB’s SQL API. ADF can also be used for more frequent data transfers from Cosmos DB to other data stores.

    The main workspace page is displayed for ADF

  6. Edit the basic properties for this data copy. Name the task ImportNutrition and select Run once now, then select Next.

    The copy data activity properties dialog is displayed

  7. Create a new connection and select Azure Blob Storage. You will import data from a JSON file on Azure Blob Storage. In addition to Blob Storage, you can use ADF to migrate from a wide variety of sources. Migration from these other sources is not covered in this tutorial.

    Create new connection link is highlighted

    Azure Blob Storage is highlighted

  8. Name the source NutritionJson and select SAS URI as the Authentication method. Please use the following SAS URI for read-only access to this Blob Storage container:

    https://cosmoslabsstorageaccount.blob.core.windows.net/nutrition-data?si=container-list-read-policy&spr=https&sv=2021-06-08&sr=c&sig=jGrmrokYikbgbuW9we2am%2BwAq%2BC%2BxfZcPYswOeSQpAU%3D

    The New linked service dialog is displayed

  9. Select Create.
  10. Select Next.
  11. In the File or Folder textbox, enter the folder name nutrition-data and then select Browse to open the nutrition-data folder. Finally, select the NutritionData.json file.

    The nutrition-data folder is displayed

  12. Uncheck Copy file recursively and Binary Copy if they are checked. Also ensure that the other fields are empty. Select Next.

    The input file or folder dialog is displayed

  13. Select the file format as JSON format. Then select Next.

    "The file format settings dialog is displayed"

  14. You have now successfully connected the Blob Storage container with the nutrition.json file as the source.

  15. For the Destination data store, add the Cosmos DB target data store by selecting Create new connection and then selecting Azure Cosmos DB (SQL API).

    "The New Linked Service dialog is displayed"

  16. Name the linked service targetcosmosdb and select your Azure subscription and Cosmos DB account. Also select ImportDatabase, the Cosmos DB database that you created earlier.

    "The linked service configuration dialog is displayed"

  17. Select your newly created targetcosmosdb connection as the Destination data store.

    "The destination data source dialog is displayed"

  18. Select your FoodCollection container from the drop-down menu. This maps your Blob Storage file to the correct Cosmos DB container. Select Next to continue.

    "The table mapping dialog is displayed"

  19. There is no need to change any Settings. Select Next.

    "The settings dialog is displayed"

  20. Select Next to begin deployment. After deployment is complete, select Monitor.

    "The pipeline runs are displayed"

  21. After a few minutes, refresh the page and the status for the ImportNutrition pipeline should be listed as Succeeded.

    "The pipeline runs are displayed"

  22. Once the import process has completed, close ADF. You will now proceed to validate your imported data.
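Behind the wizard, the Copy Data tool produces a pipeline containing a single Copy activity. The sketch below builds an approximate version of that definition as a Python dictionary, to show how the choices made in the steps above (JSON source, non-recursive read from Blob Storage, Cosmos DB SQL API sink) fit together. The activity name and the dataset names passed in are hypothetical, and the tool's generated definition may contain additional properties; treat this as an illustration of the shape, not the exact output.

```python
import json


def build_copy_pipeline(source_dataset: str, sink_dataset: str) -> dict:
    """Return an approximate ADF pipeline definition for the one-time copy.

    `source_dataset` and `sink_dataset` are hypothetical dataset names;
    the Copy Data tool generates its own.
    """
    return {
        "name": "ImportNutrition",  # task name chosen in step 6
        "properties": {
            "activities": [
                {
                    "name": "CopyNutritionData",  # hypothetical activity name
                    "type": "Copy",
                    "inputs": [
                        {"referenceName": source_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    "typeProperties": {
                        # JSON file read from Blob Storage, not recursive (step 12).
                        "source": {
                            "type": "JsonSource",
                            "storeSettings": {
                                "type": "AzureBlobStorageReadSettings",
                                "recursive": False,
                            },
                        },
                        # Documents written to the Cosmos DB SQL API container.
                        "sink": {
                            "type": "CosmosDbSqlApiSink",
                            "writeBehavior": "insert",
                        },
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    pipeline = build_copy_pipeline("NutritionJsonFile", "FoodCollectionSink")
    print(json.dumps(pipeline, indent=2))
```

Keeping the definition as data makes it easy to compare against the JSON shown in the ADF authoring view for the pipeline you just ran.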

Validate Imported Data

The Azure Cosmos DB Data Explorer allows you to view documents and run queries directly within the Azure Portal. In this exercise, you will use the Data Explorer to view the data stored in your container.

You will validate that the data was successfully imported into your container using the Items view in the Data Explorer.
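If you prefer to check the import from code as well, a count query is a quick sanity test. The following is a minimal sketch, assuming you already have a container object from the `azure-cosmos` Python package (for example via `get_container_client`); the function name is hypothetical. It only relies on the container's `query_items` method.

```python
def count_items(container) -> int:
    """Count the documents in a container with a cross-partition query.

    `container` is assumed to be an azure-cosmos ContainerProxy, or any
    object that exposes a compatible `query_items` method.
    """
    results = container.query_items(
        query="SELECT VALUE COUNT(1) FROM c",
        enable_cross_partition_query=True,
    )
    # Summing covers both a single aggregated value and the case where
    # the count arrives as several partial values.
    return sum(results)
```

A count of zero after the pipeline reports Succeeded would suggest the sink was pointed at a different database or container than the one you are inspecting.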

  1. Return to the Azure Portal (http://portal.azure.com).

  2. On the left side of the portal, select the Resource groups link.

    Resource groups link is highlighted

  3. In the Resource groups blade, locate and select the cosmoslabs resource group.

    The Lab resource group is highlighted

  4. In the cosmoslabs blade, select the Azure Cosmos DB account you recently created.

    The Cosmos DB resource is highlighted

  5. In the Azure Cosmos DB blade, locate and select the Data Explorer link on the left side of the blade.

    The Data Explorer link is selected and its blade is displayed

  6. In the Data Explorer section, expand the ImportDatabase database node and then expand the FoodCollection container node.

    The Container node is displayed

  7. Within the FoodCollection node, select the Scale and Settings link to view the throughput for the container. Reduce the throughput to 400 RU/s.

    Scale and Settings

  8. Within the FoodCollection node, select the Items link to view a subset of the various documents in the container. Select a few of the documents and observe the properties and structure of the documents.

    Items is highlighted

    An Example document is displayed
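The last two steps can also be sketched in code: lowering the container throughput to 400 RU/s and reading back a few documents. This assumes a container object from the `azure-cosmos` Python package; the function name and the `sample_size` parameter are hypothetical, and the query selects only `id` and `foodGroup` (the partition key property used in this lab).

```python
def scale_down_and_sample(container, target_ru: int = 400, sample_size: int = 3) -> list:
    """Reduce container throughput, then return a few sample documents.

    `container` is assumed to be an azure-cosmos ContainerProxy, or any
    object exposing compatible `replace_throughput` and `query_items` methods.
    """
    # The import is finished, so the high import throughput is no longer needed.
    container.replace_throughput(target_ru)

    # int() guards against anything other than a number reaching the query text.
    query = f"SELECT TOP {int(sample_size)} c.id, c.foodGroup FROM c"
    items = container.query_items(
        query=query,
        enable_cross_partition_query=True,
    )
    return list(items)
```

Reducing throughput first keeps the cost of the lab account low while you explore the data.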

If this is your final lab, follow the steps in Removing Lab Assets to remove all lab resources.