Azure - ADLS

Connect to data in Azure Data Lake Storage (ADLS) from Databricks.

Step-by-step guide

How to Create a Storage Credential in Azure

Step 1: Create New Access Connector

  1. Navigate to the Azure portal
  2. In the search bar, type Access Connector for Azure Databricks
  3. Select "Access Connector for Azure Databricks" from the search results
Azure Access Connector Searchbar

  4. Click the "Create" button to start the Access Connector creation process
Create Access Connector button

  5. Complete the configuration form with the following required information:

    • Subscription: Select the Azure subscription where the access connector will be deployed
    • Resource group: Choose the resource group for the access connector
    • Name: Provide a descriptive name for the access connector
    • Region: Select the same region as your Databricks workspace for optimal performance
  6. Click "Review + create" to validate your configuration

Create Access Connector Button

  7. After validation completes successfully, click "Create" to deploy the access connector
Create Access Connector Button

  8. Once deployment is complete, navigate to the newly created Access Connector resource
  9. Copy the Resource ID from the resource overview page (you'll need this for Databricks configuration)
Access Connector Resource ID
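The Resource ID you copy follows the standard ARM format. As a sanity check, a minimal sketch (the helper and the subscription, resource group, and connector names are placeholders, not values from this guide) that assembles the expected shape:

```python
# Sketch: build the ARM Resource ID of an Access Connector for Azure Databricks.
# All identifiers below are placeholders.

def access_connector_resource_id(subscription_id: str, resource_group: str, name: str) -> str:
    """Return the Resource ID in the shape shown on the resource overview page."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Databricks/accessConnectors/{name}"
    )

rid = access_connector_resource_id(
    "00000000-0000-0000-0000-000000000000", "my-rg", "my-access-connector"
)
print(rid)
```

Comparing the ID you copied against this shape is a quick way to confirm you grabbed the connector's Resource ID rather than, say, its principal ID.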

Step 2: Configure storage credential in Databricks

  1. Open your Databricks workspace and navigate to the "Catalog" section from the left sidebar
Databricks Catalog Menu

  2. Click on "External Data" to access external data configuration options
External Data Menu

  3. Navigate to the "Credentials" tab and click "Create credential"
Create Credential Button

  4. Complete the credential configuration form:

    • Credential Name: Provide a descriptive name for the storage credential
    • Authentication Type: Select "Azure Managed Identity"
    • Access Connector ID: Paste the Resource ID copied in Step 1
    • Description: Optional description for documentation purposes
  5. Click "Create" to establish the storage credential
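The form above maps onto a small request body if you ever script this step against the Unity Catalog API. The field names below ("azure_managed_identity", "access_connector_id") are an assumption modeled on the form fields, not a verbatim copy of the API contract, so check the Databricks REST API reference before relying on them:

```python
import json

# Sketch of a storage-credential creation payload. Field names are assumptions
# modeled on the UI form; verify against the Databricks API docs before use.
def storage_credential_payload(name: str, access_connector_id: str, comment: str = "") -> dict:
    return {
        "name": name,
        "azure_managed_identity": {"access_connector_id": access_connector_id},
        "comment": comment,
    }

payload = storage_credential_payload(
    "adls-managed-identity-credential",
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg"
    "/providers/Microsoft.Databricks/accessConnectors/my-access-connector",
    comment="Credential backed by the Access Connector's managed identity",
)
print(json.dumps(payload, indent=2))
```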

External Locations

An external location defines a secure path to your data stored in Azure cloud object storage. It consists of three components: storage account, container, and folder path.

Key Concepts

  • Multiple locations: You can configure multiple external locations within your metastore
  • Granular permissions: Each external location can have different access permissions
  • Data isolation: External locations enable you to organize data by environment, business unit, or region

Storage Account Requirements

Your Azure storage account must meet these requirements:

  • Hierarchical namespace: Must be enabled
  • Azure Data Lake Storage Gen2: Required for Unity Catalog integration

Planning Your Storage Strategy

You have two options for storage accounts:

  1. Use existing storage account: If you already have a compliant storage account
  2. Create new storage account: Recommended for new implementations or specific isolation requirements

How to Create a New Storage Account

  1. Navigate to the Azure portal
  2. In the search bar, type Storage Account
  3. Select "Storage Account" from the search results
  4. Click "Create" to begin the storage account creation process
Create Storage Account Button

  5. Complete the Basics configuration with the following settings:

    • Subscription: Select the same subscription as your Databricks workspace
    • Resource group: Choose an appropriate resource group (preferably the same as your workspace)
    • Storage account name: Provide a globally unique name (lowercase letters and numbers only)
    • Region: Important - Select the same region as your Databricks workspace for optimal performance
    • Performance: Standard or Premium (Standard is sufficient for most use cases)
    • Redundancy: Choose based on your data durability requirements (LRS, ZRS, GRS, or GZRS)
  6. Click "Next" to proceed to advanced settings

Storage Account Basics Settings
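The naming rule mentioned in the Basics settings can be checked before you submit the form: Azure storage account names must be 3 to 24 characters of lowercase letters and numbers only. A minimal validator (the function name is illustrative):

```python
import re

# Azure storage account names: globally unique, 3-24 characters,
# lowercase letters and numbers only.
_NAME_RE = re.compile(r"^[a-z0-9]{3,24}$")

def is_valid_storage_account_name(name: str) -> bool:
    return _NAME_RE.fullmatch(name) is not None

print(is_valid_storage_account_name("adlsproddata01"))  # lowercase alphanumeric: valid
print(is_valid_storage_account_name("ADLS-Prod-Data"))  # uppercase and hyphens: invalid
```

Note this only checks the character rules; global uniqueness is still verified by Azure when you create the account.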

  7. In the Advanced settings tab, configure the following critical settings:
    • Hierarchical namespace: Enable this option (required for Unity Catalog)
    • Access tier: Hot (recommended for frequently accessed data)
Storage Account Advanced Settings

  8. Review your configuration and click "Create" to deploy the storage account

  9. Once deployment completes, navigate to your new storage account resource

  10. Create a container for your data organization following the steps below:

Select Container Option

  11. Navigate into your newly created container
  12. Click "+ Add Directory" to create organizational folders, and specify a directory name that reflects your data organization strategy (e.g., "bronze", "silver", "gold" for a medallion architecture)
Add Directory Option

Assign Permissions to Access Connector

Important Security Step

This step grants your Databricks Access Connector the necessary permissions to read and write data in your storage account.

Access Storage Account IAM Settings

  1. In the Azure portal, navigate to the storage account you created in the previous section
  2. In the left sidebar, click on "Access control (IAM)"
  3. Click "+ Add" and then select "Add role assignment"
Storage Account IAM Settings

Select Storage Blob Data Contributor Role

  1. In the role assignment wizard:
    • Search for "Storage Blob Data Contributor" in the role search bar
    • Select this role and click "Next"
Storage Blob Data Contributor Role

Assign Role to Access Connector

  1. In the Members section:
    • Select "Managed identity" as the assignment type
    • Click "+ Select members"
    • Search for and select your Access Connector
    • Click "Select" and then "Review + assign"
Add Member to Role Assignment
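For reference, the role assignment created in the wizard targets a scope string in the standard ARM format for a storage account. A sketch with placeholder subscription and resource names (the helper itself is hypothetical):

```python
# Sketch: the ARM scope a "Storage Blob Data Contributor" assignment on a
# storage account targets. All identifiers are placeholders.

def storage_account_scope(subscription_id: str, resource_group: str, account: str) -> str:
    """Return the ARM scope for a role assignment on a storage account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )

ROLE = "Storage Blob Data Contributor"  # role granted to the Access Connector
scope = storage_account_scope(
    "00000000-0000-0000-0000-000000000000", "my-rg", "adlsproddata01"
)
print(f"{ROLE} @ {scope}")
```

Assigning the role at the storage-account scope (rather than per container) is what lets one Access Connector serve every container and external location in that account.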

How to Create External Locations in Databricks

Access External Data Configuration

  1. Open your Databricks workspace
  2. Navigate to "Catalog" in the left sidebar
  3. Click on "External Data" to access external location management
External Data Menu

Create New External Location

  1. Navigate to the "External Locations" tab
  2. Click "Create external location" to begin the configuration process

Create External Location Button

Configure External Location Settings

  1. Complete the external location configuration form with the following information:

    • External Location Name: Provide a descriptive name (e.g., "raw-data-location")
    • Storage Type: Select "Azure Data Lake Storage Gen2"
    • URL: Use the format abfss://<container>@<storage_account>.dfs.core.windows.net/<folder_path>
      • Replace <container> with your container name
      • Replace <storage_account> with your storage account name
      • Replace <folder_path> with your directory path (optional)
    • Storage credential: Select the storage credential created in the previous section
    • Comments: Optional description for documentation purposes
  2. Click "Create" to establish the external location


External Location Creation Form
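The URL format described above can be assembled programmatically, which helps avoid typos when you register several external locations. A minimal sketch (container, account, and folder names are placeholders):

```python
def abfss_url(container: str, storage_account: str, folder_path: str = "") -> str:
    """Build an ADLS Gen2 URL in the format Unity Catalog expects:
    abfss://<container>@<storage_account>.dfs.core.windows.net/<folder_path>
    The folder path is optional."""
    url = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    if folder_path:
        url += "/" + folder_path.strip("/")
    return url

print(abfss_url("raw", "adlsproddata01", "bronze"))
# abfss://raw@adlsproddata01.dfs.core.windows.net/bronze
```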

Verify External Location Configuration

  1. After creation, click "Test Connection" to verify that the external location is configured correctly and accessible
Test Connection Button

Private Endpoint and Network Connectivity Setup for ADLS Gen2