PostgreSQL

PostgreSQL

This document describes the PostgreSQL data source. Continue reading to learn more about:

  • Collecting - what should you know about adding the data source.
  • Data Dictionary - what data is available and how it is structured.

Collecting

Before you start

  • Note the name, the host, and the port of the Postgres database.
  • Note the username and password for the user connecting to the Postgres database.

To configure this data source:

  1. If necessary, whitelist Panoply.
    • Postgres databases with production data are typically not publicly available. To allow Panoply to access your data, see Whitelisting.
  2. Click Data Sources in the navigation menu.
  3. Click the Add Data Source button.
  4. Search for Postgres and select it.
  5. Enter the Host Address to connect to Postgres. If you’re not sure what your connection details are, contact your administrator or open the postgresql.conf file, which is normally kept in the data directory. For more on this file and the relevant connection settings, see Connections and Authentication in the Postgres documentation.

    • The host address has three parts:
      • The URL of the Postgres database or the IP address of the host server.
        • URL example: your.server.com
        • IP example: 123.45.67.89
      • Port - The port number of the Postgres server. This is 5432 for most connections.
      • Database - The name of the Postgres database.
    • Examples:
      • An host address based on an IP address might look like this: 123.45.67.89:5432/dbname where 123.45.67.89 is the IP address, 5432 is the port number, and dbname is the name of the database name. The IP address should be the IP address of the database host computer or DNS server through which you are accessing your database.
      • An host address based on a URL might look like this: postgres.cs7gozdxx7uw.us-west-2.rds.amazonaws.com:5432/dbname, where the URL is the endpoint associated with your database instance, which includes the port number, and dbname is the name of your database.
  6. Enter your Postgres username and password. This user must have permission to access the data. If the permissions are not in place, some of the data will not be available.
    • In Postgres, you can create a Panoply-specific user with read-level and replication permissions to binary logs. Then enter the username and password for this Panoply-specific user. This user must be reserved for Panoply use and unique to your connector. All information entered into Panoply is encrypted to ensure the security of your data. See Data Protection for more information on how Panoply actively provides data security.
  7. Click Next.
  8. Select the Postgres tables from which to collect data. Panoply stores each Postgres table as a unique table.
  9. (Optional) Set the Advanced Options.
    • We recommend not changing advanced options unless you are an experienced Panoply user.
  10. Click Collect.
    • The data source appears grayed out while the collection runs.
    • You may add additional data sources while this collection runs.
    • You can monitor this collection from the Jobs page or the Data Sources page.
    • After a successful collection, navigate to the Tables page to review the data results.

Data Dictionary

Because Postgres data comes from a database system, Panoply cannot provide a data dictionary. But Panoply does automate the data schema for the collected data. This section includes useful information about the Panoply automations. You can adjust these settings in your data source under Advanced Options.

  • Destination - Panoply selects a default destination. These are the tables where data is stored. The default name of each destination table in Panoply is the name of the table in your Postgres database. You can add a prefix or postfix to destination table names. For example, to prefix all table names with salesdata_ you would use this syntax: salesdata_{__table_name}. In this case, {__table_name} is a variable that represents the source table’s name.

  • Primary Key - Field(s) to use as the deduplication key when collecting data. Panoply sets the primary key depending on the scenario identified in the following table. To learn more about primary keys in general, see Primary Keys.

Postgres id column Enter a primary key Outcome
yes no Panoply will automatically select the id column and use it as the primary key.
yes yes Not recommended. Panoply will use the id column but will overwrite the original source values.
If you want Panoply to use your database table’s id column, do not enter a value into the Primary Key field.
no no Panoply creates an id column formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2.
no yes Panoply creates a hashed id column using the primary key values entered, while retaining the source columns. WARNING: Any user-entered primary key will be used across all the Postgres tables selected.
  • Incremental Key - By default, Panoply fetches all of your Postgres data on each run. If you only want to collect some of your data, enter a column name to use as your incremental key. The column must be logically incremental. Panoply will keep track of the maximum value reached during the previous run and will start there on the next run.
    • WARNING: If you set an incremental key, you should only select one table. Otherwise, the collection will use a single set of incremental key & value for all the tables.
  • Exclude - When collecting data, you may want to exclude certain data, such as names, addresses, or other personally identifiable information. The specified data will be excluded across all the Postgres tables selected.

  • Parse String - If the data that will be ingested into Panoply contains JSON, include the JSON text attributes to be parsed upon ingestion in the “Parse string” field. For more information, see Advanced Options.

  • A column in a table uses the same data type for all values in that column. Panoply automatically chooses the data type for each column based on the available values. This is important to note for this data source. If even one value in a column has text, then the entire column is considered data type Text.

  • The following metadata columns are added to the destination table(s):
    • id - If you do not select a primary key, and no id column exists in the source table, Panoply will insert an id. Formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2.
    • __schemaname - The schema that the source table is located in.
    • __senttime - Formatted as a datetime, such as 2020-04-26T01:26:14.695Z.
    • __state - Reserved for internal Panoply use.
    • __tablename - The name of the source table in Postgres.
    • __updatetime - Formatted as a datetime, such as 2020-04-26T01:26:14.695Z.
Getting started is easy! Get all your data in one place in minutes.
Try Free