AWS - S3
info
- Connect to data in Amazon S3 from Databricks.
- The official guide can be found in the last section.
Pre-requisites
- Permissions to run the AWS CloudFormation template.
- Permissions to create roles and policies in AWS.
YouTube Walkthrough
Step-by-step guide
How to Create an S3 Bucket in the AWS Console
- Navigate to the AWS console
- In the search bar, type S3
- Select S3 from the search results
- Click Create bucket

- Complete the configuration form with the following settings:
  - Bucket name: provide a descriptive name for the bucket
  - Object Ownership: ACLs disabled
  - Block Public Access: block all public access
  - Bucket Versioning: disabled
  - Tags: optional but recommended
  - Default encryption: server-side encryption with Amazon S3 managed keys (SSE-S3)
  - Bucket Key: disabled
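The console settings above can also be applied programmatically. The following is a minimal sketch, assuming the `boto3` library (not part of this guide) and credentials that are allowed to create buckets. The function name `create_private_bucket` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Optional


def create_private_bucket(
    s3_client: Any,
    bucket_name: str,
    region: str,
    tags: Optional[Dict[str, str]] = None,
) -> None:
    """Create a bucket with the settings from the console form (sketch only).

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    create_args: Dict[str, Any] = {
        "Bucket": bucket_name,
        # "ACLs disabled" corresponds to bucket-owner-enforced object ownership.
        "ObjectOwnership": "BucketOwnerEnforced",
    }
    # us-east-1 is the default region and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        create_args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3_client.create_bucket(**create_args)

    # Block all public access.
    s3_client.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

    # Default encryption: SSE-S3 (AES256), with the bucket key disabled.
    s3_client.put_bucket_encryption(
        Bucket=bucket_name,
        ServerSideEncryptionConfiguration={
            "Rules": [
                {
                    "ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"},
                    "BucketKeyEnabled": False,
                }
            ]
        },
    )

    # Tags are optional. Versioning needs no call: new buckets are unversioned.
    if tags:
        s3_client.put_bucket_tagging(
            Bucket=bucket_name,
            Tagging={"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]},
        )
```

A call might look like `create_private_bucket(boto3.client("s3", region_name="eu-west-1"), "my-demo-bucket", "eu-west-1", {"team": "data"})`, where the bucket name, region, and tag are placeholders.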


How to create an external location in Databricks
- Navigate to the Databricks workspace
- In the left sidebar, click "Catalog", then click "External Data"

- Click on "Create external location"

- Select "AWS Quickstart" and click "Next"

- Enter the bucket name (s3://<existing-bucket-name>)
- Click "Generate new token"
- Copy the token.

- In the AWS CloudFormation form, paste the token into the "Databricks Personal Access Token" field
- Check "I acknowledge that AWS CloudFormation might create IAM resources with custom names"
- Click "Create stack"
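Once the stack has finished, one way to confirm that the external location exists is to query it from a Databricks notebook. The following is a rough sketch, assuming a notebook `spark` session and assuming that `SHOW EXTERNAL LOCATIONS` returns `name` and `url` columns. The helper name `find_external_locations` is hypothetical.

```python
from typing import Any, List


def find_external_locations(spark: Any, bucket_name: str) -> List[str]:
    """Return the names of external locations whose URL points at the bucket."""
    target = f"s3://{bucket_name}"
    # Assumption: the result set exposes "name" and "url" columns.
    rows = spark.sql("SHOW EXTERNAL LOCATIONS").collect()
    matches = []
    for row in rows:
        url = row["url"].rstrip("/")
        # Match the bucket root or any path inside the bucket.
        if url == target or url.startswith(target + "/"):
            matches.append(row["name"])
    return matches
```

In a notebook, `find_external_locations(spark, "<existing-bucket-name>")` should return a non-empty list if the Quickstart created the location for that bucket.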

