PostgreSQL

PostgreSQL

This document provides instructions for integrating PostgreSQL data into Panoply. The following items will be covered:

Prerequisites

Before beginning data integration, ensure you have the following information ready:

  • Host address or URL of the database
  • Username assigned to the database
  • Password for the database

All information entered into Panoply is encrypted to ensure the security of your data. The information is only used to establish a link between Panoply and your external database to obtain the data you wish to ingest into Panoply. See Data Protection for more information on how Panoply actively provides data security.

Getting Started: IP Whitelisting and AWS Security Groups

To ensure data security and privacy, production databases are typically not publicly available. Non-public database connections require additional measures to facilitate data integration depending on your firewall/network security and configuration and the location of your database (such as in a cloud via Amazon Web Services (AWS) or on-premise database).

To allow Panoply to access your data through your network or firewall, see IP Whitelisting.

To allow Panoply to connect to your data through your AWS Relational Database Services (RDS) PostgreSQL database instance, see AWS Security Groups.

IP Whitelisting

To allow Panoply access to the data you specify during data integration, Panoply recommends adding or IP whitelisting the following IP addresses to your firewall/network configurations:

  • 52.86.255.240
  • 52.86.255.248
  • 52.86.255.241
  • 52.86.255.249
  • 52.86.255.242
  • 52.86.255.250
  • 52.86.255.243
  • 52.86.255.251
  • 52.86.255.244
  • 52.86.255.252
  • 52.86.255.245
  • 52.86.255.253
  • 52.86.255.246
  • 52.86.255.254
  • 52.86.255.247
  • 52.86.255.255
  • Or in CIDR notation, use 52.86.255.240/28

AWS Security Groups

To allow Panoply to access your PostgreSQL instance in AWS, create a Security Group with an Inbound rule specifying the CIDR for Panoply as a Custom range.

Basic PostgreSQL Data Integration

To integrate PostgreSQL data into Panoply using default selections, complete the following steps. For more advanced options, complete the following and refer to the subsequent Advanced PostgreSQL Data Integration section for detailed information.

  1. From the Home page in the Panoply dashboard, click Data Sources. Then, click the Add Data Source button.
  2. In the Data Sources – Choose Source Type dialog, select Postgres, which is listed under both Most Popular and Databases.
  3. In the Data Sources – Postgres dialog, enter the Host Address, Username, and Password of your PostgreSQL database server.

    Note that the Host Address contains three parts:

    1. The URL of the database instance or IP address of the host computer/server.

    2. The port number, which by default is 5432.

    3. The name of the database.

  4. After you have populated the Host Address, Username, and Password fields, click Next to connect the database to Panoply.

    If there is a problem with the connection, you will receive an error message. See Resolving Error Messages section for more information.

    If the connection is successful, a Tables field will appear. Select the tables you wish to upload into Panoply, or simply select the All checkbox.
  5. Click Show next to the Advanced options section. Then complete the Advanced options relevant to your needs for the data source, and click Collect.

See the Advanced PostgreSQL Data Integration section for more information.

After clicking Collect, the screen will gray out while the data is being ingested into Panoply. While you are waiting for the data to ingest, you can add a new data source or review previously ingested data.

When the data is ready, you will see the following message pop up in the bottom left corner of the screen.

Click View to view the data now.

Advanced PostgreSQL Data Integration

After completing the above steps, click Show next to the Advanced heading. The window will expand with additional fields you can populate to further customize the data ingestion into Panoply.

See the sections below for additional information about these fields.

After you have populated the desired fields, click Collect.

  Description Example
Destination Name of the table target where you wish to save the data. my_table_name
Primary Key Define a primary key for the destination table for de-duplication with a pattern. {hello}_{world}
Exclude List of attributes to exclude from the collection process. See below for more information. Credit card numbers or personally identifiable information
Parse string List of JSON text attributes to parse as objects. See below for more information.  
Truncate table Delete the data from the table before collecting the source.  

Exclude

When ingesting data into Panoply, you may want to exclude certain attributes from the collection process, such as names, addresses, or other personally identifiable information. Excluding attributes that are not necessary to the later data analysis can also speed up other processes in the data analysis pipeline.

To exclude data, type the attribute to be excluded in the Exclude field. When you begin typing, another line to type additional attributes appears automatically.

To exclude a nested attribute, use dot notation.

Parse string

If the data that will be ingested into Panoply contains JSON, include the JSON text attributes to be parsed upon ingestion in the “Parse string” field. As you type the JSON text attributes in the “Parse string” field, another line to type additional text attributes appears automatically.

Resolving Error Messages

An error message may occur when attempting to connect your database or data warehouse to Panoply. The type of error message you receive will indicate the reason for the connection error.

Need More Than 1 Value to Unpack

If you receive this error message, verify that the Host Address contains all three parts: the IP address or URL of the database server, the port number, and the database name.

For example: 192.168.015:5432/dbname where 192.168.015 is the IP address, 5432 is the port number, and dbname is the name of the database name. The IP address should be the IP address of the database host computer or DNS server through which you are accessing your database.

Or, postgres.cs7gozdxx7uw.us-west-2.rds.amazonaws.com:5432/dbname, where the URL is the Endpoint associated with your database instance, which includes the port number, and dbname is the name of your database.

After you have completed the Hostname field correctly, enter the Username and Password for your database, click Next, and continue the integration process.

Login Failed for user: <username>

If you receive this error message, verify that you have typed the correct Username and Password for the database server to which you are trying to connect.

After you have completed the fields correctly, click Next, and continue the integration process.

504 Gateway Time-out

If you receive this error message, it is likely that you have not enabled Panoply to access your database.

If your database is on-premise or through a host computer, ensure you have whitelisted Panoply IP addresses. See IP Whitelisting for more information, including a list of Panoply’s static IP addresses.

If you are using an AWS database instance, ensure your Security Groups are modified to include an Inbound Rule for Panoply’s static IP addresses. See the following example.

After you have correctly set your security settings to allow a connection to Panoply, continue the integration process.