Data Ingestion Engine

As your data travels from a data source into your Panoply database, it passes through Panoply’s Data Ingestion Engine. This article explains the Data Ingestion Engine’s constraints, the standards it adheres to, and the conversions it performs.

For example, you may have three data sources that each format dates differently. As data passes from those sources into your Panoply database, the Data Ingestion Engine standardizes the disparate formats into one consistent date format.

Data Ingestion Engine Specifications

The following sections explain how the Data Ingestion Engine handles destinations, dates, timestamps, and numbers.

Destinations

A destination is a string of characters used to define the table(s) in your Panoply database where your data will be stored. When you set up a data source, you can supply a destination or leave this field blank and use the default destination. A destination can include a combination of literals and symbols, as defined below. The Data Ingestion Engine converts all alphabetic characters to lowercase.

Literal

A literal is a raw string of characters.

Allowed characters:

  • A-Z (will be converted to lowercase)
  • a-z
  • 0-9
  • _ (underscore)
  • - (dash)
  • " " (space)

Symbol

A symbol is the name of an object key, enclosed in curly braces ({ }). The Data Ingestion Engine replaces each symbol with the value of that key in the incoming object.

Allowed characters: all

Example

A destination of {category}-{sub} includes one literal (-) between two symbols ({category} and {sub}).

For an object {"category": "toys", "sub": "puzzles"}, a destination of {category}-{sub} resolves to toys-puzzles.
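The substitution and lowercasing rules above can be sketched in a few lines of Python. This is a minimal illustration, not Panoply's implementation: the function name resolve_destination is hypothetical, and it assumes symbol names contain no curly braces themselves and that a missing key is simply an error.

```python
import re

# Hypothetical pattern: a symbol is {key}; everything outside braces is a literal.
SYMBOL = re.compile(r"\{([^{}]+)\}")
# Literal characters allowed by the list above: letters, digits, underscore, dash, space.
LITERAL_OK = re.compile(r"[A-Za-z0-9_\- ]*")


def resolve_destination(destination: str, record: dict) -> str:
    """Hypothetical sketch: validate literals, substitute symbols, lowercase."""
    # split() with one capturing group alternates literal, key, literal, ...
    literals = SYMBOL.split(destination)[::2]
    for literal in literals:
        if not LITERAL_OK.fullmatch(literal):
            raise ValueError(f"invalid literal in destination: {literal!r}")
    # Replace each {key} with the record's value (KeyError if the key is absent).
    resolved = SYMBOL.sub(lambda m: str(record[m.group(1)]), destination)
    # The engine converts all alphabetic characters to lowercase.
    return resolved.lower()


print(resolve_destination("{category}-{sub}", {"category": "toys", "sub": "puzzles"}))
```

Running it prints toys-puzzles, matching the example above; a destination such as Sales_{category} with a category of Toys would resolve to sales_toys.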

Dates

Dates are converted to strings and saved in the format YYYY-MM-DDThh:mm:ss.sssZ, which is compliant with ISO 8601.

Panoply supports these date formats:

Date format                    Example
ANSI C                         Mon Jan _2 15:04:05 2006
Unix Date                      Mon Jan _2 15:04:05 MST 2006
Ruby Date                      Mon Jan 02 15:04:05 -0700 2006
RFC 1123                       Mon, 02 Jan 2006 15:04:05 -0700
RFC 3339 (ISO 8601 profile)    2013-03-31T10:05:04.9385623+03:00
year/month/day                 2013-03-28 10:05:00 +0000 UTC
2-digit year                   08/21/71
Date without day               2014-04
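To make the normalization concrete, here is a minimal Python sketch that parses four of the formats above with datetime.strptime and emits the YYYY-MM-DDThh:mm:ss.sssZ form. It is not Panoply's parser. The FORMATS list and normalize_date name are hypothetical, dates without a time zone are assumed to be UTC, and two-digit years follow Python's own rule (69 to 99 map to 1969 to 1999, 00 to 68 map to 2000 to 2068), which may differ from Panoply's.

```python
from datetime import datetime, timezone

# Hypothetical subset of the table above, expressed as strptime patterns.
FORMATS = [
    "%a, %d %b %Y %H:%M:%S %z",  # RFC 1123 with numeric zone
    "%a %b %d %H:%M:%S %z %Y",   # Ruby Date
    "%m/%d/%y",                  # 2-digit year
    "%Y-%m",                     # Date without day
]


def normalize_date(text: str) -> str:
    """Return the date as YYYY-MM-DDThh:mm:ss.sssZ in UTC."""
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next layout
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive dates are UTC
        dt = dt.astimezone(timezone.utc)
        millis = dt.microsecond // 1000
        return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"
    raise ValueError(f"unsupported date format: {text!r}")


print(normalize_date("Mon, 02 Jan 2006 15:04:05 -0700"))  # 2006-01-02T22:04:05.000Z
print(normalize_date("2014-04"))                          # 2014-04-01T00:00:00.000Z
```

Three sources that send the same instant as RFC 1123, Ruby Date, or another supported layout all end up with one string, which is the point of the standardization described at the top of this article.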

Timestamps

Panoply supports both string and integer timestamps. A timestamp must be between 8 and 14 digits long; shorter or longer values are not recognized as timestamps.

Timestamp resolution is in seconds. The Data Ingestion Engine resolves 1432399705 and 1432399705000 to the same UTC date of 2015-05-23T16:48:25Z.
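The length check and second-level resolution can be sketched as follows. The function name parse_timestamp is hypothetical, and the rule for telling seconds from milliseconds is an assumption inferred from the example above: values longer than 10 digits are treated as milliseconds and truncated to whole seconds.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Hypothetical sketch: return an ISO 8601 UTC string, or None if not a timestamp."""
    digits = str(value).strip()
    # Only 8 to 14 digit values are accepted as timestamps.
    if not digits.isdigit() or not 8 <= len(digits) <= 14:
        return None
    seconds = int(digits)
    if len(digits) > 10:  # assumption: longer values are milliseconds
        seconds //= 1000  # resolution is in seconds, so drop the sub-second part
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(parse_timestamp(1432399705))       # 2015-05-23T16:48:25Z
print(parse_timestamp("1432399705000"))  # 2015-05-23T16:48:25Z
```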

Numbers

Panoply uses double-precision floating-point format for numbers. This means the largest integer Panoply can represent exactly is 9,007,199,254,740,991 (2^53 - 1); larger integers may be rounded and lose precision.
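Python's float is an IEEE 754 double, so it can show where this limit comes from. The helper survives_double is hypothetical; it checks whether an integer comes back unchanged after a round trip through a double.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # 9,007,199,254,740,991


def survives_double(n: int) -> bool:
    """Return True if n round-trips through a 64-bit float unchanged."""
    return int(float(n)) == n


print(survives_double(MAX_SAFE_INTEGER))      # True
print(survives_double(MAX_SAFE_INTEGER + 2))  # False: 2**53 + 1 rounds to 2**53
```

Above 2^53 the gap between adjacent doubles exceeds 1, so some integers can no longer be stored exactly; large IDs are therefore safer ingested as strings.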

Transformations

The Data Ingestion Engine does not support transformations during ingestion. Instead, Panoply lets you create materialized views that transform your data right after it has been ingested.

A materialized view stores the results of a query as a physical table, unlike a standard view, which is a virtual table that reruns its query every time it is accessed. Materialized views are especially useful for frequently accessed data: they improve query performance because queries read the precomputed results instead of the underlying detail tables.
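The idea of caching a query result as a table can be illustrated with Python's built-in sqlite3 module. SQLite has no CREATE MATERIALIZED VIEW statement, so this sketch emulates one with CREATE TABLE ... AS SELECT; it is not Panoply syntax, and the orders and sales_by_category tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (category TEXT, amount REAL);
    INSERT INTO orders VALUES ('toys', 10.0), ('toys', 5.5), ('books', 12.0);
""")


def refresh_materialized_view(conn):
    # Emulated materialized view: store the aggregate once as a real table.
    conn.executescript("""
        DROP TABLE IF EXISTS sales_by_category;
        CREATE TABLE sales_by_category AS
            SELECT category, SUM(amount) AS total FROM orders GROUP BY category;
    """)


refresh_materialized_view(conn)  # rerun after each ingestion to pick up new rows
rows = conn.execute(
    "SELECT category, total FROM sales_by_category ORDER BY category"
).fetchall()
print(rows)  # [('books', 12.0), ('toys', 15.5)]
```

Reads hit the small precomputed table rather than re-aggregating orders; the trade-off is that the results are only as fresh as the last refresh.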