Amazon S3

This document provides instructions for integrating Amazon S3 data into Panoply. The following items are covered:

  • Amazon S3 Data Integration
  • Advanced Options
  • FAQs
  • Data Schema

Amazon S3 Data Integration

To integrate Amazon S3 data into Panoply using the default selections, complete the following steps. To use the advanced options, complete the same steps and refer to the subsequent sections for details.

  1. Click Data Sources in the navigation menu.
  2. Click the Add Data Source button.
  3. In the Data Sources – Choose Source Type window, select Amazon S3. Amazon S3 is listed under Most Popular and Files & Services.
  4. In the Data Sources – Amazon S3 screen, enter the address of your S3 bucket.
  5. Enter your AWS Access Key and AWS Access Secret, and then click Next.
  6. Select the files to import from your bucket.
  7. Enter a destination table name. This data source does not have a default destination.
  8. (Optional) To customize the ingestion from your data source, review the advanced options.
  9. Click Collect.

The Data Sources – Amazon S3 window will appear grayed out while the data integration is pending. A small green progress bar appears below Amazon S3.

You will be prompted to set up the integration of another data source. You can set up multiple data integrations without impacting the ingestion of the already scheduled or pending data integrations.

From the Data Sources main menu, you can monitor the data ingestion status of the scheduled and pending data integrations. After the data ingestion is complete, you can clean or transform your data in the Tables menu.

Advanced Options

Click Show next to Advanced to expand the Data Sources – Amazon S3 window. The expanded window includes the Primary Key, Incremental Key, Delimiter, Exclude, Parse string, and Truncate table options.

  • Primary Key - Default is id. The primary key determines which field is used as the deduplication key when ingesting data. If no id field is present in the data source, Panoply generates a unique id for each record. You can also set the primary key using one or more fields from your data, each surrounded by curly brackets. For example, {field1} or {field1}_{field2}. Read more about primary keys.

  • Delimiter - If you have a character-delimited file, indicate the delimiting character. Comma and tab delimiters are detected automatically.
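The two options above can be illustrated with a small local sketch. It is not Panoply's implementation: the sample data, the helper names compose_key and dedupe, and the "last record wins" policy are all assumptions made for illustration. It reads a pipe-delimited file with an explicitly stated delimiter, then builds a composite key from the {field1}_{field2} pattern and removes duplicates by that key.

```python
import csv
import io
import re


def compose_key(template, record):
    """Fill a {field} template such as "{field1}_{field2}" from one record."""
    return re.sub(r"\{(\w+)\}", lambda m: str(record[m.group(1)]), template)


def dedupe(records, template):
    """Keep the last record seen for each composed key.

    "Last record wins" is an assumed policy for this sketch only.
    """
    latest = {}
    for record in records:
        latest[compose_key(template, record)] = record
    return list(latest.values())


# Hypothetical file contents. "|" is neither a comma nor a tab, so the
# delimiter is stated explicitly, as the Delimiter option would require.
raw = "field1|field2|amount\na|1|10\na|2|20\na|1|30\n"
rows = list(csv.DictReader(io.StringIO(raw), delimiter="|"))

unique_rows = dedupe(rows, "{field1}_{field2}")
keys = [compose_key("{field1}_{field2}", r) for r in unique_rows]
print(keys)
```

Running a check like this on a sample of your file before ingestion can show whether the chosen key fields actually identify each record uniquely.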

FAQs

Q: How do I automatically ingest new and modified files?

To automatically ingest files that are added to or modified in an S3 bucket, select All in the file selector when setting up your data source.

If your bucket includes files that you do not want to ingest, you can use one of these approaches:

  • Use a different S3 bucket
  • Use a different folder in the same bucket
  • Use the file selector search to identify the relevant files based on naming convention, and select All
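Before relying on a naming convention, it can help to check locally which files it would capture. The sketch below is only an approximation under stated assumptions: the object keys are hypothetical, the helper select_keys is made up for this example, and shell-style wildcard matching is used because this document does not describe how the file selector search matches names.

```python
from fnmatch import fnmatchcase


def select_keys(keys, pattern):
    """Return the keys that match a shell-style pattern (case-sensitive)."""
    return [key for key in keys if fnmatchcase(key, pattern)]


# Hypothetical object keys in one bucket.
bucket_keys = [
    "exports/orders_2023.csv",
    "exports/orders_2024.csv",
    "exports/readme.txt",
    "logs/app.log",
]

selected = select_keys(bucket_keys, "exports/orders_*.csv")
print(selected)
```

If the pattern also matches files you do not want, moving the relevant files to their own folder or bucket is the more reliable of the approaches listed above.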

Data Schema

The data schema in Panoply will reflect the data schema from the S3 source. Additionally, Panoply includes these metadata columns in the destination table: __senttime, __updatetime, __s3bucket, and __s3key. The last two metadata columns are unique to the Amazon S3 data source.

  • __s3bucket - The source S3 bucket for the record.
  • __s3key - The source file name in S3 for the record.
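As one illustration of how these two columns might be used, the sketch below counts records per source file so that each row can be traced back to the bucket and file it came from. The rows are hypothetical stand-ins for records read from the destination table, and the helper rows_per_source_file is not part of Panoply.

```python
from collections import Counter


def rows_per_source_file(rows):
    """Count records for each (bucket, file) pair found in the metadata columns."""
    return Counter((row["__s3bucket"], row["__s3key"]) for row in rows)


# Hypothetical records as they might look after ingestion.
table_rows = [
    {"id": 1, "__s3bucket": "example-bucket", "__s3key": "exports/orders_2023.csv"},
    {"id": 2, "__s3bucket": "example-bucket", "__s3key": "exports/orders_2024.csv"},
    {"id": 3, "__s3bucket": "example-bucket", "__s3key": "exports/orders_2024.csv"},
]

counts = rows_per_source_file(table_rows)
for (bucket, key), count in sorted(counts.items()):
    print(bucket, key, count)
```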