By default, Panoply stores all data ingested from the data sources you defined, regardless of how frequently it’s accessed. This is simple and safe: you always have instant access to your data, even tables or rows from several years ago that you rarely need.

However, storing all of your data has two potential drawbacks:

  • Cost. You’re storing all of the data, even if you don’t normally access large parts of it.
  • Performance. Each query needs to run through massive tables only to filter out the rows you’re not interested in.

To overcome this, you can configure your tables to be automatically archived at regular intervals. First, define an archive attribute for the table: the column used to determine whether a row is kept in the data warehouse or archived. The archive attribute is usually a date column that indicates when the row was first created. Then, define the retention value: the number of days of past data you want to keep unarchived.

For example, if you have an events table with records like this:

{type: 'click', created_on: '…'}

You can set it to archive on the created_on field and retain only the past three months. Rows older than that will be archived, with archiving running daily.
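To make the retention rule concrete, here is a minimal Python sketch of the decision it describes. It is an illustration only, not Panoply's implementation or API: the function name `split_by_retention`, its parameters, the sample dates, and the choice of 90 days to stand in for "three months" are all assumptions made for this example.

```python
from datetime import datetime, timedelta

def split_by_retention(rows, archive_attribute, retention_days, now):
    """Partition rows into (kept, archived) by the age of the archive attribute.

    Hypothetical helper for illustration; not part of any Panoply API.
    """
    # Anything created before the cutoff falls outside the retention window.
    cutoff = now - timedelta(days=retention_days)
    kept, archived = [], []
    for row in rows:
        if row[archive_attribute] < cutoff:
            archived.append(row)
        else:
            kept.append(row)
    return kept, archived

# Sample data (assumed): one recent event and one older event.
now = datetime(2024, 6, 1)
events = [
    {"type": "click", "created_on": datetime(2024, 5, 20)},
    {"type": "click", "created_on": datetime(2024, 1, 15)},
]

# Retain roughly three months (assumed here to be 90 days).
kept, archived = split_by_retention(events, "created_on", 90, now)
```

With these sample values, the May event stays in the warehouse and the January event is selected for archiving. Running the same check once a day corresponds to the daily archiving described above.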

You can also define aggregation transformations that generate aggregated, unarchived data from the raw data before it’s archived. This lets you retain access to the information held in the data without paying the cost and performance penalties of keeping the raw rows in your data warehouse.
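As a rough illustration of what such an aggregation might compute, the Python sketch below rolls raw event rows up into one row per day and event type. The function name `daily_counts`, the output field names, and the sample rows are hypothetical; how aggregation transformations are actually defined in Panoply is not shown here.

```python
from collections import Counter
from datetime import datetime

def daily_counts(rows, archive_attribute="created_on"):
    """Summarize raw events as one row per (day, type) with a count.

    Hypothetical helper for illustration; not part of any Panoply API.
    """
    counts = Counter()
    for row in rows:
        # Truncate the timestamp to a calendar day before grouping.
        counts[(row[archive_attribute].date(), row["type"])] += 1
    return [
        {"day": day, "type": event_type, "count": n}
        for (day, event_type), n in sorted(counts.items())
    ]

# Sample raw rows (assumed) that are about to be archived.
raw_events = [
    {"type": "click", "created_on": datetime(2024, 1, 15, 9, 0)},
    {"type": "click", "created_on": datetime(2024, 1, 15, 17, 30)},
    {"type": "view", "created_on": datetime(2024, 1, 16, 8, 0)},
]

summary = daily_counts(raw_events)
```

The three raw rows collapse into two summary rows. A summary like this stays small enough to keep unarchived, so trend queries can still run after the raw rows have been archived.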
