Primary Keys

Primary Keys

Primary keys are used to ensure uniqueness of the rows in the table. Each of your tables has one or more columns that are used together to define what represents a single unique row in the data warehouse. The process guarantees that no two rows have the same value in their primary key.

When new data is inserted into a table, it is first checked against the existing data to determine if a row with the same value in the primary key already exists. If it does, the new row is used to update the existing row, rather than inserting a new one. However, unlike many data warehouses, Panoply doesn’t do that by rejecting attempts to insert the new conflicting row with a unique error. Instead, it uses the new row to update some, or all, of the attributes (columns) in the existing row.

Default Primary Keys

Default primary keys are different based on whether your data warehouse is built on Amazon Redshift or Google BigQuery.

Amazon Redshift

All of the API data sources have a default primary key for each and every table created from them. All the data sources use a field named id as the default primary key with some exceptions (MongoDB, for example, uses the _id attribute). If no primary key is given at set up, and there is no default one in the data source (id field, for example), then a random value is generated instead. To successfully set the primary key, place each attribute name in curly brackets, for example: {first_name}-{last_name} will use the first_name and last_name attributes as the primary key of each row.

Redshift Primary Key Scenarios

Source id column Enter a primary key Outcome
yes no Panoply will automatically select the id column and use it as the primary key.
yes yes Not recommended. Panoply will use the id column but will overwrite the original source values.
If you want Panoply to use your source’s id column, do not enter a value into the Primary Key field.
no no Panoply creates an id column formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2.
no yes Panoply creates a hashed id column using the primary key values entered, while retaining the source columns.

Google BigQuery

For Panoply on BigQuery, the default primary key will be __panoply_id. Users may specify a different primary key in the Advanced Settings when configuring the data source, and Panoply will use that to create the __panoply_id field. If there are any sub-tables, this becomes the foreign key in that sub-table in the format of __{parent_table}_panoply_id.

Primary Key Scenarios

Source id column Enter a primary key Outcome
yes no Panoply will automatically select the id field and use it to populate the __panoply_id column.
yes yes Panoply will automatically select the id field and use it to populate the __panoply_id.
no no Panoply creates the __panoply_id column formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2.
no yes Panoply creates the __panoply_idcolumn using the primary key values entered, while retaining the source columns.

WARNING: Any user-entered primary key will be used across all the destination tables.

Next Steps

Getting started is easy! Get all your data in one place in minutes.
Try Free