Primary Keys
  • 07 Jun 2021
  • 2 Minutes to read
  • Dark
    Light
  • PDF

Primary Keys

  • Dark
    Light
  • PDF

Article Summary

Primary keys are used to ensure the uniqueness of the rows in a table. The Primary Key is comprised of one or more columns that can be used to identify the row in the data warehouse. The process guarantees that no two rows have the same value in their primary key.

When new data is inserted into a table, it is first checked against the existing data to determine if a row with the same value in the primary key already exists. If it does, the new row is used to update the existing row, rather than inserting a new one. However, unlike many data warehouses, Panoply doesn't do that by rejecting attempts to insert the new conflicting row with a unique error. Instead, it uses the new row to update some, or all, of the attributes (columns) in the existing row.

Default Primary Keys

Default primary keys are different based on whether your data warehouse is built on Amazon Redshift or Google BigQuery.

Amazon Redshift

All of the API data sources have a default primary key for each and every table created from them. All the data sources use a field named id as the default primary key with some exceptions (MongoDB, for example, uses the _id attribute). If no primary key is given at set up, and there is no default one in the data source (id field, for example), then a random value is generated instead. To successfully set the primary key, use dynamic fields by placing each attribute name in curly brackets, for example: {first_name}-{last_name} will use the first_name and last_name attributes as the primary key of each row.

Redshift Primary Key Scenarios

Source id columnEnter a primary keyOutcome
yesnoPanoply will automatically select the id column and use it as the primary key.
yesyesNot recommended. Panoply will use the id column but will overwrite the original source values.
If you want Panoply to use your source's id column, do not enter a value into the Primary Key field.
nonoPanoply creates an id column formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2.
noyesPanoply creates a hashed idcolumn using the primary key values entered, while retaining the source columns.

Google BigQuery

For Panoply on BigQuery, the default primary key will be __panoply_id. Users may specify a different primary key in the Advanced Settings when configuring the data source, and Panoply will use that to create the __panoply_id field. If there are any sub-tables, this becomes the foreign key in that sub-table in the format of __{parent_table}_panoply_id.

Primary Key Scenarios

Source id columnEnter a primary keyOutcome
yesnoPanoply will automatically select the id field and use it to populate the __panoply_id column.
yesyesPanoply will automatically select the id field and use it to populate the __panoply_id.
nonoPanoply creates the __panoply_id column formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2.
noyesPanoply creates the __panoply_idcolumn using the primary key values entered, while retaining the source columns.

Any user-entered primary key will be used across all the destination tables.


Was this article helpful?

What's Next