Connecting AWS DynamoDB to Panoply
Panoply has a native connector for DynamoDB, which makes the connection process very streamlined. Below is a step-by-step guide. If you want to follow along, have your AWS access key ID and secret access key on hand before getting started. If you already have a key pair you're ready to use for this, skip to Pulling data from DynamoDB into Panoply below. If you'd like a quick walkthrough on setting up a secure connection between DynamoDB and Panoply with a dedicated IAM user, continue to the section directly below.
Creating an IAM user for Panoply
If you want to generate a Panoply-specific AWS access key ID and secret access key (recommended), head to the AWS IAM console and choose Users from the navigation pane. Next, choose Add user in the upper left. (For the purposes of this guide, we'll create a new user on the account, but you can always add these privileges to an existing user if you'd like.)
Choose a name for the new user and check the box for programmatic access so that you can generate an access key ID and secret access key.
After you've entered a name for the user, click Next: Permissions. You can now add this user to an existing group, set up a new group for the Panoply user, or attach IAM policies directly to the user you just created. We'll choose Attach existing policies directly from the options up top. Filter the existing policies by entering DynamoDB into the search box, then select the read-only policy for DynamoDB (AmazonDynamoDBReadOnlyAccess) as pictured below:
Click Next. You can optionally set tags on the user here, but we'll skip that step, which brings us to the Review step. If everything looks right, click Create user in the bottom right.
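If you'd rather script the console steps above than click through them, the sketch below does roughly the same thing with boto3, the AWS SDK for Python. It assumes you run it with credentials that are allowed to manage IAM users; the user name panoply-dynamodb-reader and the helper create_panoply_user are hypothetical placeholders, not anything Panoply requires.

```python
# Hypothetical sketch: script the console steps for a read-only Panoply IAM user.

# AWS-managed policy that grants read-only access to DynamoDB.
READ_ONLY_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonDynamoDBReadOnlyAccess"


def create_panoply_user(iam, user_name="panoply-dynamodb-reader"):
    """Create an IAM user, attach read-only DynamoDB access, and issue a key pair.

    `iam` is a boto3 IAM client (boto3.client("iam")); the default user name
    is a hypothetical placeholder. Returns (access_key_id, secret_access_key).
    """
    iam.create_user(UserName=user_name)
    iam.attach_user_policy(UserName=user_name, PolicyArn=READ_ONLY_POLICY_ARN)
    # The secret is only returned in this response; AWS will not show it again.
    key = iam.create_access_key(UserName=user_name)["AccessKey"]
    return key["AccessKeyId"], key["SecretAccessKey"]


if __name__ == "__main__":
    import boto3  # needs credentials that are allowed to manage IAM users

    key_id, secret = create_panoply_user(boto3.client("iam"))
    print("Access key ID:", key_id)  # store the secret securely; don't print it
```

Passing the client in as a parameter keeps the function easy to test without touching a real AWS account.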
The next page will show your access key ID and secret access key. Make sure to store them securely (e.g. by downloading them as a .csv), as you won't be able to retrieve the secret access key from AWS again. You can always generate new keys, but you won't be able to recover this particular secret if you lose track of it. Your keys should look something like this:
- Access key ID: AKIAIOSFODNN7EXAMPLE
- Secret access key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Keep these keys safe, and don’t share your secret key with anyone outside your organization!
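One simple way to keep the keys out of scripts and shared files is to read them from the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables and mask them whenever they're displayed. The sketch below assumes that setup; load_aws_keys and mask are hypothetical helpers, and the "AKIA" prefix check is only a loose typo guard (long-term IAM user key IDs start with it), not real validation.

```python
import os


def load_aws_keys(env=None):
    """Read the key pair from environment variables instead of hard-coding it."""
    env = os.environ if env is None else env
    key_id = env.get("AWS_ACCESS_KEY_ID", "").strip()
    secret = env.get("AWS_SECRET_ACCESS_KEY", "").strip()
    # Loose typo guard: long-term IAM user key IDs begin with "AKIA".
    if not key_id.startswith("AKIA"):
        raise ValueError("AWS_ACCESS_KEY_ID is missing or is not an IAM user key ID")
    if not secret:
        raise ValueError("AWS_SECRET_ACCESS_KEY is missing")
    return key_id, secret


def mask(value, visible=4):
    """Hide all but the last few characters, e.g. for logs or screenshots."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


if __name__ == "__main__":
    key_id, _secret = load_aws_keys()
    print("Using access key", mask(key_id))
```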
Pulling data from DynamoDB into Panoply
The first step is to connect your DynamoDB instance to Panoply. From your Panoply dashboard, click on Data Sources in the left pane, then hit Add Data Source in the upper right:
This takes you to the data source selection screen. DynamoDB is listed under Databases (in the leftmost column):
Selecting DynamoDB will open a new pane, which will require you to enter your credentials:
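Before pasting the keys into this pane, you may want to confirm they can actually read your tables. The sketch below does that with boto3's DynamoDB client, assuming the keys are exported as the standard AWS environment variables; the region us-east-1 is a placeholder for wherever your tables live, and the two helper functions are hypothetical.

```python
def list_readable_tables(dynamodb):
    """Return every table name the key pair can see, following pagination."""
    names = []
    for page in dynamodb.get_paginator("list_tables").paginate():
        names.extend(page.get("TableNames", []))
    return names


def check_table(dynamodb, table_name):
    """Describe a table and read at most one item to confirm read access."""
    status = dynamodb.describe_table(TableName=table_name)["Table"]["TableStatus"]
    sample = dynamodb.scan(TableName=table_name, Limit=1)
    return status, sample.get("Count", 0)


if __name__ == "__main__":
    import boto3

    # boto3 picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the
    # environment; the region here is a placeholder, use your tables' region.
    client = boto3.client("dynamodb", region_name="us-east-1")
    for name in list_readable_tables(client):
        print(name, *check_table(client, name))
```

If either call fails with an access-denied error, the policy attached to the user is the first place to look.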
At this point, you might also need to set up IP whitelisting, depending on how your AWS access settings are configured. Restricting the IP ranges that are allowed to connect to your AWS resources is a straightforward way to manage your databases' security, and if your account enforces such restrictions, you'll need to whitelist Panoply's IP ranges. Doing so adds Panoply to the list of approved connections for your database.
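DynamoDB has no network-level whitelist of its own; IP restrictions are usually written as an aws:SourceIp condition in an IAM policy. A minimal sketch of such a policy is below, assuming you attach it to the Panoply user created earlier. The CIDRs are placeholder documentation ranges (RFC 5737), not Panoply's real addresses, and source_ip_policy is a hypothetical helper.

```python
import ipaddress
import json


def source_ip_policy(allowed_cidrs):
    """Build an IAM policy that denies DynamoDB calls from outside `allowed_cidrs`."""
    # Validate and normalize each range; raises ValueError on a malformed CIDR.
    cidrs = [str(ipaddress.ip_network(c)) for c in allowed_cidrs]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "dynamodb:*",
                "Resource": "*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": cidrs}},
            }
        ],
    }


# Placeholder documentation ranges (RFC 5737), NOT Panoply's real addresses.
PLACEHOLDER_CIDRS = ["192.0.2.0/24", "198.51.100.0/24"]

if __name__ == "__main__":
    print(json.dumps(source_ip_policy(PLACEHOLDER_CIDRS), indent=2))
```

Because an explicit Deny overrides the Allow in the read-only managed policy, the key pair only works from the listed ranges. One way to attach the result is boto3's iam.put_user_policy(UserName=..., PolicyName=..., PolicyDocument=json.dumps(policy)).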
See the link above for the IP ranges Panoply's data connectors use, and add those to your DynamoDB IP whitelist. (DynamoDB itself has no whitelist setting, so this is typically an aws:SourceIp condition in an IAM policy, such as one attached to the Panoply user.)
When you're all set, hit Next, choose the tables you want to import, and set any other advanced options. Note that you can take advantage of Panoply's job parallelization features here to set up multiple, separate imports: rather than importing every table in one go, you might find it more efficient to set up multiple connectors for different segments of your data. Since we're using Amazon's movie sample data for this guide, we'll select that table in the next pane:
If you expand the Advanced options, you'll see that you can fine-tune the data import by setting a destination, primary key, incremental key, and exclusions.
- Destination: lets you name the table that will be created in Panoply after importing, or set a table-specific prefix if you're planning on importing multiple tables. Learn more here.
- Primary Key: sets a primary key that acts as a unique identifier for each record in your table(s) and lets you link related data across tables. This is the most important Advanced setting to pay attention to. If you set a primary key when setting up your DynamoDB table, you can re-use it here. If your tables already have a field called “id” or something similar, Panoply will use it as the primary key when no other parameter is set. If you don't set a primary key at all, Panoply won't deduplicate your imports, so every record from each collection run will be added to your tables, which you might not want. See the documentation on primary keys to learn more.
- Incremental Key: make sure to set this up if you’re planning on making multiple, recurring pulls from this data source. Doing so will make future imports more efficient, as Panoply will only collect data that has been updated since the last time you pulled from that source. Note that incremental keys will only work properly if you are pulling from a single table with this connector. See our incremental key documentation to learn more.
- Exclude: you can use this field to exclude specific data elements from your import. If there are specific types of data (e.g. irrelevant or sensitive data) that you would like to exclude from your Panoply data warehouse, you can use this setting to manage that.
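To make the interaction between these settings concrete, here is a toy model of one collection run. It is not Panoply's implementation; the function collect and its parameter names are hypothetical, and it simply assumes records are dicts, the primary key upserts, the incremental key skips anything not newer than the previous run's maximum, and excluded fields are dropped before loading.

```python
def collect(existing, incoming, primary_key=None, incremental_key=None,
            last_value=None, exclude=()):
    """Toy model of how the Advanced options shape one collection run.

    existing/incoming are lists of dict records; all parameter names are
    hypothetical and this is not Panoply's implementation.
    """
    rows = list(existing)
    # Position of each primary-key value already in the table (for upserts).
    index = {r[primary_key]: i for i, r in enumerate(rows)} if primary_key else {}
    for rec in incoming:
        # Incremental key: skip anything not newer than the last run's maximum.
        if incremental_key and last_value is not None and rec[incremental_key] <= last_value:
            continue
        # Exclude: drop unwanted fields before loading.
        rec = {k: v for k, v in rec.items() if k not in exclude}
        if primary_key and rec[primary_key] in index:
            rows[index[rec[primary_key]]] = rec  # dedupe: replace the older copy
        else:
            if primary_key:
                index[rec[primary_key]] = len(rows)
            rows.append(rec)  # without a primary key, every record is appended
    return rows
```

For example, with one existing row {"id": 1, "updated": 1} and three incoming rows, setting primary_key="id", incremental_key="updated", and last_value=1 replaces row 1, adds only the newer records, and skips the stale one; calling collect with no options at all simply appends all three, which is the duplication the Primary Key note warns about. Note that excluding the primary-key field itself would break the upsert in this model.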
When you're all set, hit Collect. The Data sources - DynamoDB pane will go gray, and a green bar will appear above it to indicate progress. You can check the Jobs pane to monitor progress if you want, but you'll get an alert from Panoply once the collection has finished or failed. If you need to pull data from another source while you're waiting, you can start that collection as well. When your data collection is finished, head to Tables in the navigation pane to check on your data.