AWS - S3

info
  • Connect to data in Amazon S3 from Databricks.
  • The official guide can be found in the last section.

Pre-requisites

  • Permissions to run the AWS CloudFormation template.
  • Permissions to create roles and policies in AWS.

Step-by-step guide

How to Create an S3 Bucket in the AWS Console

  1. Navigate to the AWS console
  2. In the search bar, type S3
  3. Select S3 from the search results
  4. Click Create bucket
  5. Complete the configuration form with the following required information:
    • Bucket name: Provide a descriptive name for the bucket
    • Object ownership: ACLs disabled
    • Block public access: Block all public access
    • Bucket versioning: Disable
    • Tags: Optional but recommended
    • Default encryption: Server-side encryption with Amazon S3 managed keys (SSE-S3)
    • Bucket key: Disable
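
For readers who prefer scripting over the console, the form above can be approximated with boto3. This is a minimal sketch, not part of the original guide: the helper names `build_bucket_settings` and `create_bucket` are hypothetical, and the bucket name and region are placeholders you must supply. Versioning is left untouched because new buckets start unversioned, and the bucket key is switched off explicitly.

```python
from typing import Any


def build_bucket_settings(bucket_name: str, region: str) -> dict[str, dict[str, Any]]:
    """Return keyword arguments for the boto3 S3 calls that mirror the console form.

    Hypothetical helper for illustration; it only builds dictionaries and
    makes no AWS calls.
    """
    create_kwargs: dict[str, Any] = {
        "Bucket": bucket_name,
        # Object ownership: ACLs disabled
        "ObjectOwnership": "BucketOwnerEnforced",
    }
    # us-east-1 is the default region and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        create_kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}

    return {
        "create_bucket": create_kwargs,
        # Block public access: block all public access
        "put_public_access_block": {
            "Bucket": bucket_name,
            "PublicAccessBlockConfiguration": {
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        },
        # Default encryption: SSE-S3 (AES256), bucket key disabled
        "put_bucket_encryption": {
            "Bucket": bucket_name,
            "ServerSideEncryptionConfiguration": {
                "Rules": [
                    {
                        "ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"},
                        "BucketKeyEnabled": False,
                    }
                ]
            },
        },
    }


def create_bucket(s3_client: Any, bucket_name: str, region: str) -> None:
    """Create the bucket and apply the settings using a boto3 S3 client."""
    settings = build_bucket_settings(bucket_name, region)
    s3_client.create_bucket(**settings["create_bucket"])
    s3_client.put_public_access_block(**settings["put_public_access_block"])
    s3_client.put_bucket_encryption(**settings["put_bucket_encryption"])
```

To run it against AWS, pass a real client, for example `create_bucket(boto3.client("s3", region_name="eu-west-1"), "<bucket-name>", "eu-west-1")`, using credentials that are allowed to create buckets.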

How to create an external location in Databricks

  1. Navigate to the Databricks workspace
  2. In the left sidebar, click on "Catalog" and then click on "External Data"
  3. Click on "Create external location"
  4. Select "AWS Quickstart" and click on "Next"

  5. Enter the bucket name as an S3 URL (s3://<existing-bucket-name>).
  6. Click on "Generate new token".
  7. Copy the token.

  8. Paste the token into the "Databricks Personal Access Token" field, select the "I acknowledge that AWS CloudFormation might create IAM resources with custom names" checkbox, and click on "Create stack".

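
Once the stack has finished, it can be useful to confirm that an external location now points at the bucket. The sketch below is an illustration only: `to_s3_url` and `matching_locations` are hypothetical helpers, and the bucket-name check is a simplified version of the S3 naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit).

```python
import re
from typing import Iterable

# Simplified S3 bucket-name check; the full AWS rules have further restrictions.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def to_s3_url(bucket_name: str) -> str:
    """Normalize a bucket name (or s3:// URL) to the s3://<bucket> form used in the Quickstart form."""
    name = bucket_name.strip()
    if name.startswith("s3://"):
        name = name[len("s3://"):]
    name = name.rstrip("/")
    if not _BUCKET_NAME.match(name):
        raise ValueError(f"not a valid-looking bucket name: {bucket_name!r}")
    return f"s3://{name}"


def matching_locations(locations: Iterable[tuple[str, str]], bucket_url: str) -> list[str]:
    """Return the names of external locations whose URL is the bucket or a path inside it.

    `locations` is an iterable of (name, url) pairs.
    """
    names = []
    for name, url in locations:
        clean = url.rstrip("/")
        if clean == bucket_url or clean.startswith(bucket_url + "/"):
            names.append(name)
    return names
```

In a Databricks notebook, the pairs could come from `spark.sql("SHOW EXTERNAL LOCATIONS").collect()`, for example `[(r["name"], r["url"]) for r in rows]`. The column names `name` and `url` are an assumption here; check the actual output in your workspace before relying on them.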

Official Databricks guide