Amazon S3 Setup Guide

This document describes the basic setup of the Amazon S3 data source.

Note: We recommend creating a dedicated IAM user for Panoply and attaching an IAM policy that grants it only the access it needs.
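As a starting point for that IAM policy, the sketch below generates a read-only policy document scoped to a single bucket. It assumes Panoply only needs to list the bucket (s3:ListBucket) and read its objects (s3:GetObject); the function name and the bucket name my-panoply-bucket are hypothetical placeholders, so confirm the required permissions before relying on it. You would paste the printed JSON into the policy editor in the AWS Management Console and attach the policy to the dedicated IAM user.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build a minimal read-only IAM policy document for one S3 bucket.

    Assumption: listing the bucket and reading its objects is all the
    access Panoply needs. Verify this before attaching the policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allows listing the object keys in the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Allows downloading the objects stored in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-panoply-bucket" is a placeholder; substitute your bucket name.
    print(json.dumps(read_only_bucket_policy("my-panoply-bucket"), indent=2))
```

Note that s3:ListBucket applies to the bucket ARN while s3:GetObject applies to the object ARNs (the /* suffix), which is why the sketch uses two statements.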

To integrate Amazon S3 data into Panoply:

Click Data Sources in the navigation menu.
Click the Add Data Source button.
In the Data Sources list, search for and select Amazon S3.
Enter your S3 Bucket Name: An Amazon S3 bucket is a container for objects (files) stored in Amazon Web Services' (AWS) Simple Storage Service (S3). Buckets work much like top-level folders; Panoply collects data from the files stored in the bucket you specify.
Enter your Access Key ID and Secret Access Key: Access keys are long-term credentials for an IAM user or the AWS account root user. Each key consists of an access key ID (e.g. AKIAIOSFODNN7EXAMPLE) and a secret access key (e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY). As with a user name and password, you must use the access key ID and secret access key together to authenticate. You can manage your access keys under Security credentials in the AWS Management Console; AWS displays the secret access key only when the key is created. For details, see the AWS documentation on finding and updating access keys.
Enter the Path Pattern: The Path Pattern is a regular expression that Panoply uses to decide whether to sync a given file. It is applied to every path under the prefix. For instance, suppose the prefix logs contains three folders: 2017, 2016, and errors. The pattern \d\d\d\d/.* excludes every file in the errors folder, because \d\d\d\d matches only four-digit folder names such as 2016 and 2017, and .* matches any file beneath them. If you're not sure which regular expression to use, leave this field blank and Panoply syncs everything under the prefix.
Collect All Files: When this checkbox is selected, Panoply collects all files at your S3 bucket address, including files that are added or modified in the future. When it is cleared, you must select the files you want to collect.
Select the files to collect (if you are not collecting all files). For details regarding these file types, see the Data Dictionary. Panoply supports the following file types:
- Plain text (.txt)
- Archive (.tar, .zip, and .gzip). The collection fails if an archive contains unsupported file types or other archive files.
- Excel (.xlsx)
- JSON
- Delimited files (.csv, .tsv and others)
- XML
- Parquet
- Avro
Enter a Destination Table. This determines where Panoply will store the data. To collect spreadsheet tabs into separate tables, use the format destination_{__tablename} where destination is your chosen destination table name and {__tablename} is a dynamic field that instructs Panoply to use the name of the sheet (tab). For example, if you use myfile_{__tablename} as your destination table and your spreadsheet has two sheets named jan21 and feb21, the resulting tables will be named myfile_jan21 and myfile_feb21.
Click Save Changes and then Collect.
The data source appears grayed out while the collection runs.
You may add additional data sources while this collection runs.
You can monitor this collection from the Jobs page or the Data Sources page.
After a successful collection, navigate to the Tables page to review the data results.
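To check a Path Pattern before saving it, you can test it locally against a few sample paths. The sketch below is an illustration, not Panoply's implementation: it assumes paths are relative to the prefix, that a non-empty pattern must match the whole path (Python's re.fullmatch), and that a blank pattern syncs everything, as described in the Path Pattern step above. The function name should_sync is hypothetical.

```python
import re


def should_sync(relative_path: str, pattern: str) -> bool:
    """Return True if a file under the prefix would be synced.

    Assumptions: an empty pattern syncs everything, and a non-empty
    pattern must match the entire path relative to the prefix.
    """
    if not pattern:
        return True
    return re.fullmatch(pattern, relative_path) is not None


if __name__ == "__main__":
    pattern = r"\d\d\d\d/.*"
    for path in ["2017/jan.csv", "2016/dec.csv", "errors/failed.log"]:
        print(path, should_sync(path, pattern))
    # 2017/jan.csv True
    # 2016/dec.csv True
    # errors/failed.log False
```

If Panoply instead searches for the pattern anywhere in the path, a looser pattern could match more files than this check suggests, so anchoring your expression to the start of the path is a safe habit.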
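Because a collection fails when an archive contains unsupported file types or nested archives, it can help to inspect an archive before uploading it to the bucket. The sketch below handles .zip files and .tar files (optionally compressed). The extension allow-list is a hypothetical approximation of the supported file types listed above (delimited files may use other extensions), and the function names are illustrative only.

```python
import io
import tarfile
import zipfile
from pathlib import PurePosixPath

# Hypothetical allow-list approximating the supported file types above.
SUPPORTED = {".txt", ".xlsx", ".json", ".csv", ".tsv", ".xml", ".parquet", ".avro"}
# Extensions treated as nested archives, which cause the collection to fail.
ARCHIVES = {".tar", ".zip", ".gz", ".gzip", ".tgz"}


def archive_members(data: bytes, name: str) -> list[str]:
    """List the file names inside a .zip or (compressed) .tar archive."""
    if name.lower().endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # Skip directory entries, which end with "/".
            return [n for n in zf.namelist() if not n.endswith("/")]
    # Mode "r" lets tarfile detect gzip/bz2/xz compression automatically.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tf:
        return [m.name for m in tf.getmembers() if m.isfile()]


def find_problems(data: bytes, name: str) -> list[str]:
    """Return members that would likely make the collection fail."""
    problems = []
    for member in archive_members(data, name):
        suffix = PurePosixPath(member).suffix.lower()
        if suffix in ARCHIVES:
            problems.append(f"{member}: nested archive")
        elif suffix not in SUPPORTED:
            problems.append(f"{member}: unsupported type")
    return problems


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("sales.csv", "a,b\n1,2\n")
        zf.writestr("inner.zip", b"")
        zf.writestr("notes.docx", b"")
    print(find_problems(buf.getvalue(), "upload.zip"))
    # ['inner.zip: nested archive', 'notes.docx: unsupported type']
```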
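The Destination Table naming rule can be illustrated with a small substitution function. This is only a model of the documented behavior, not Panoply code: resolve_table_names is a hypothetical name, and returning a single table when {__tablename} is absent is an assumption based on the Destination Table step above.

```python
TABLENAME_FIELD = "{__tablename}"


def resolve_table_names(destination: str, sheet_names: list[str]) -> list[str]:
    """Model how a destination name expands into one table per sheet.

    Assumption: without the {__tablename} field, all sheets are
    collected into the single destination table.
    """
    if TABLENAME_FIELD not in destination:
        return [destination]
    # Substitute each sheet (tab) name for the dynamic field.
    return [destination.replace(TABLENAME_FIELD, sheet) for sheet in sheet_names]


if __name__ == "__main__":
    print(resolve_table_names("myfile_{__tablename}", ["jan21", "feb21"]))
    # ['myfile_jan21', 'myfile_feb21']
```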

This is all that is necessary to start collecting your data from Amazon S3. However, there are a number of Advanced Settings you can use to customize your Amazon S3 data source. We do not recommend changing Advanced Settings unless you are an experienced Panoply user.