FAQs

Can a data architect help me with archiving data or performance enhancement?

Yes. Panoply self-optimizes based on your needs by learning your business logic. Integrating new datasets augments your existing logic or introduces new logic, and Panoply incorporates these additions into its optimized routines. Contacting your data architect can help expedite these optimizations inside the Panoply platform.

Can I schedule data sources?

Yes, scheduling is available for any data source except file uploads and SDK-based data. Data transfers can happen at frequencies ranging from hourly to daily. To schedule a data source for upload:

  1. Log into Panoply.
  2. Click Data Sources.
  3. Click Add Data Source in the upper right-hand corner.
  4. Select your data source.
  5. Click the Calendar icon.
    The Schedule data source modal will appear. From here you can set your collection interval.
  6. After setting your collection interval click Accept.

Can Panoply ingest social media streams?

Panoply is an analytical data warehouse, so we store everything in a structured way. We can store social media data, such as blobs of text from Facebook statuses or tweets, but this free text is not optimized for analytical queries.

Data Processing and Queries

Once the data has been stored in table format, you can query the sample table. After running a query, you can save the query as a transformation or export the resulting table into your own system. Panoply offers two types of transformation views: simple views and materialized views. A transformation typically starts as a simple view. When needed, based on performance, a simple view can be converted into a materialized view, which is always accessible and up to date.

  1. Select Data Sources from the navigation pane, then select the table from the list of open tables.
  2. Run a query against the selected table.
  3. To view the history of all previously run queries, select Queries from the navigation pane.
  4. Select Jobs from the navigation pane to track your job progress. You can filter the jobs list according to status (All, Running, Pending, Error, Success) and job types (such as “collect”, “collect-direct”, “table-archiving”, “account-query-archiving” and “align-table”).

Do you support CREATE TABLE commands?

Yes, we do support CREATE TABLE commands. The following script, with the proper data substituted, will create one table:

CREATE TABLE IF NOT EXISTS tablename(column1 type1, column2 type2, column3 type3);
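
For instance, a minimal sketch using hypothetical table and column names:

CREATE TABLE IF NOT EXISTS users (id integer, email varchar(256), created_at timestamp);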

Does Panoply have a recommended file size for uploading?

Panoply supports files up to 100 MB in size. If your file is larger than this, you can import the data using a service listed under Data Sources > Files & Services.

In addition to the HTTPS-based File Upload, we currently support the following services:

  • Amazon S3
  • Amazon SQS
  • Google Cloud Storage
  • WebHDFS
  • RabbitMQ
  • SFTP

You can also submit a request for a new data source.

Does Panoply support real-time data ingestion and queries?

Yes. You can use Panoply’s SDKs to send data in real time. Every query that is executed against your data warehouse is calculated based on the most recent data available.

For adding data sources via SFTP, can I only upload one file at a time?

Correct. SFTP only works with a single file at a time. Connecting to an existing database using JDBC or ODBC will allow Panoply to grab all the data in the database without multiple file uploads.

How can I run queries on top of Panoply?

Any visualization tool that supports ODBC, JDBC, Postgres, or AWS Redshift can connect to Panoply. The credentials needed to connect are available by clicking Connect in the navigation pane.

How can our company benefit from Panoply?

Panoply is an automated data warehouse. It self-optimizes based on usage. Your company can benefit from Panoply in a few ways:

  • Your data team can focus on the data itself instead of data architecture, maintenance, optimization, etc. This frees up a lot of their time to turn this data into usable information.
  • Because of the way Panoply stores data, it is optimized based on your specific business logic. The platform then uses the most optimized architecture possible based on the queries you run on top of the data.
  • The platform gives you the ability to easily add new data from multiple and diverse data sources without going through your R&D team, which speeds up the process of making changes to your data warehouse.

How do I connect Looker to Panoply?

Any visualization tool that supports ODBC, JDBC, Postgres, or AWS Redshift can connect to Panoply, including Looker. The credentials needed to connect are available by clicking Connect in the navigation pane.

How do I connect my visualization tool to Panoply?

Any visualization tool that supports ODBC, JDBC, Postgres, or AWS Redshift can connect to Panoply. The credentials needed to connect are available by clicking Connect in the navigation pane.

How do I create transformations?

The Workbench tool, available from the Analyze menu, lets you write an SQL query and, once you see the data you expect, save it. The saved transformation will be available for future use in any visualization tool you connect.

To save your transformation:

  1. Click the three dots in the upper-right corner of the Workbench.
  2. Click Save As…
  3. Enter a filename, and click Accept.

How do I delete a database?

Contact Panoply Support at support@panoply.io or through the chat box in the lower right.

How do I flatten my data?

The configuration of fields to flatten occurs on the tables themselves and not in the data source. After the initial collection, you can go to the Tables screen and click on the relevant table. This will list its columns and allow you to make any desired changes, including flattening.

Example

Assume you have a users collection with a field named name that holds your nested object: {first: 'xxx', last:'yyy'}. By default, the platform creates two tables (users and users_name), one for the main (top level) data and another for the nested data under the field name. To flatten this nested data into the main (parent) table, you would take the following steps:

  1. Go to the main table on the Tables screen.
  2. Click the name field and choose Flatten.
  3. Once you collect the data again, the nested data is flattened into the main table.

Notes

Be aware of the following:

  • The changes won’t affect existing data. Only future collections of data will use the new configuration.
  • If the field you are looking for does not exist in the main table, you must create the column yourself. The table and column names will depend on your specific data model. Our example above would use the following statement:
ALTER TABLE "users" add column "name" varchar(256);

How do I upload a CSV file?

Panoply imports data using the Structurize pre-processor. Structurize accepts the following file formats:

  • csv
  • tar
  • gzip
  • json
  • xlsx
  • querystring

To upload a file to Panoply:

  1. Log into Panoply.
  2. Click Data Sources.
  3. Click Add Data Source in the upper right-hand corner.
  4. Click File Upload.
  5. Click Choose File and select the file you want to upload for processing.
  6. Enter a Destination.
  7. Click Upload.

Your file will then upload to Panoply and begin processing.

How is table size calculated?

Disk space is allocated per column because Redshift is a columnar data store. Within a table, each column is at least 1 MB. Each table also includes 3 system columns. Column sizes are multiplied by a replication factor, which is usually between 3 and 8.

After initial creation, table size grows much more gradually. You can read about Redshift cluster storage space in Amazon’s AWS knowledge center.
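
As a rough illustration of these figures, a table with 5 data columns plus the 3 system columns, at the 1 MB minimum per column and a replication factor of 3, occupies at least (5 + 3) × 1 MB × 3 = 24 MB.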

How many seats are included per pricing tier?

Pricing is a tier-based subscription that starts at $250 a month, and includes unlimited seats. Support is also provided at no extra cost.

How much does Panoply cost?

Panoply’s pricing is a tier-based subscription that scales to meet the size and needs of your business. Cost centers on the amount of data stored, and the amount of processing volume your company uses. Our individual tiers provide scalable pricing for businesses from small to enterprise levels. Please contact us at pricing_request@panoply.io to discuss your company’s needs.

How much would it cost for Panoply to build me an integration to a data source?

Panoply’s integration dashboard is an open-source framework, so you can build data connectors yourself at no cost. If you need a specific data source built, Panoply will build one at no extra cost with a signed order form. If you need a custom data source in order to accurately evaluate Panoply, one can be built for a one-time fixed fee per data source. For pricing of custom data sources, please contact Panoply via the chat box in the lower right or set up a demo.

How can I delete (drop) tables?

Tables can be dropped in several ways. The most basic method, shown below, can be used from the Analyze page (or any connected SQL client) with the DROP TABLE command. For example:

DROP TABLE foo;

More nuanced table deletion can be performed by addressing the schema in which your data is located. For example, the following command will drop all tables in the public schema:

DROP TABLE 'public.%';

Be aware that this exact command is a global drop for all tables in the public schema, as the % acts as a wildcard. Individual tables can be dropped by replacing the % with the table name, or with a partial table name followed by % (e.g., foo%). If your data lives in schemas beyond public, replace public in the DROP TABLE example above with the name of the schema in which your table resides. If there are dependencies on the table you wish to drop, you will need to drop those dependencies first.
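
For example, to drop only the tables in the public schema whose names begin with foo:

DROP TABLE 'public.foo%';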

How can I export a table?

Exporting is currently accomplished through two methods:

  1. Connect an external workbench and use that to export your data. This is available through the Connect menu.
  2. Use UNLOAD queries to export the query result to the desired S3 bucket. For example:
UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
authorization

More information is available at: http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
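
For instance, a minimal sketch that unloads a whole table to S3, assuming IAM-role authorization (the bucket and role ARN here are hypothetical):

UNLOAD ('SELECT * FROM foo')
TO 's3://my-bucket/foo_export_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';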

I am already using an ETL service. Can I still connect to Panoply?

Yes. You can connect your ETL service to Panoply using the information provided by Connect in the navigation menu. This allows you to use Panoply’s Analyze page as the endpoint for your ETL process.

I have some questions about specific features, who should I contact?

Each account has a dedicated data architect. If you do not know who your data architect is, it might be because you are still in trial or have not set up a demo. If you do not know the contact information for your architect, contact Panoply through the chat box in the lower right, send an email to support@panoply.io, or request a demo.

If we delete a data source, is there an easy way of deleting all the tables that were generated by it?

The best way to delete tables is with the DROP TABLE syntax in the Workbench. You can access the list of tables in your database using the following command:

SELECT * FROM pg_tables;

From here you can decide which tables to drop from your database, then use the DROP TABLE syntax to delete them.

DROP TABLE foo;
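
As a sketch, if the data source prefixed its table names with a common string (the facebook prefix here is hypothetical), you can list just those tables before dropping them:

SELECT tablename FROM pg_tables WHERE schemaname = 'public' AND tablename LIKE 'facebook%';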

I'm interested in enabling history tables for some of my data. I see it mentioned in your docs, but there's no documentation on how to configure it.

The history feature is something that we enable on the backend. Please contact your data architect, or contact Panoply via the chat box to the lower-right.

I'm using Stitch and it created multiple schemas. How can I access this data?

When using a data warehouse with multiple schemas, specify the name of the schema before the table name in your queries. For example, if you have a schema named “panoply” with a table named “io”, you can query the table through the Analyze Workbench as follows:

SELECT * FROM panoply.io;
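
Alternatively, a minimal sketch assuming you want unqualified table names to resolve against the Stitch schema: add the schema to your session search path.

SET search_path TO panoply, public;
SELECT * FROM io;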

Is Google Analytics subject to sampling and does this impact Panoply?

Yes, all Google Analytics (GA) data is subject to sampling unless you use GA premium. Panoply pulls data as-is from your GA account. This means that Panoply reflects what you see in GA.

Is it possible to reference a RESTful URL to schedule as a data source?

This is not currently supported in the Data Sources menu. However, you can push a JSON response to Panoply using one of our SDKs.

Is there any limit to the number of rows allowed for views created by transformations?

There is no limit. However, on the Analyze page we limit the displayed results to 100 rows so the UI won’t be flooded.

My data source run failed. What should I do?

If the error that caused the data source to fail is retryable, Panoply will retry the run within an hour, up to 3 times. You can always check the Jobs screen to see the status of current and past data source runs; any run that failed will be marked as an error.

Why doesn't my Google Analytics data match what I'm seeing in Panoply?

One potential cause for the mismatch is a discrepancy in the dimensions of the analyzed data. It’s also possible that the data pulled from GA is not up to date with the data displayed in GA, creating a false reading. To remedy this, first verify that you’re analyzing the same dimensions. If the mismatch persists, try rerunning the import. Failing that, you can delete the tables and completely re-import the data.

My view disappeared after I dropped a table. Where did it go?

If you dropped the table using CASCADE, the command also dropped all objects that depended on that table, including any views built on it. That is why the view disappeared.
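
For example, the following drops the table foo along with any views that depend on it:

DROP TABLE foo CASCADE;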

Panoply leans more towards structured / semi-structured. If we plan to ingest social media streams, how do we go about that?

Panoply is an analytical data warehouse, so we store everything in a structured way. We do store social media data, such as blobs of text from Facebook statuses or tweets, but this free text is not optimized for analytical queries.

What are the advantages/disadvantages to JDBC vs. ODBC?

  • JDBC offers a live connection to the database and is the preferred method.
  • ODBC connections don’t use the full power of the database.

Panoply runs best over an AWS Redshift connection, but it also supports Postgres, JDBC, and ODBC connections.

What kind of files can I upload?

Panoply imports data using the Structurize pre-processor. Structurize accepts the following file formats:

  • tar
  • gzip
  • json
  • csv
  • tsv
  • xlsx
  • querystring
  • WebDistributionLog

To upload a file to Panoply:

  1. Log into Panoply.
  2. Click Data Sources.
  3. Click Add Data Source in the upper right-hand corner.
  4. Click File Upload.
  5. Click Choose File and select the file you want to upload for processing.
  6. Enter a Destination.
  7. Click Upload.

Your file will then upload to Panoply and begin processing.

When I first log in, what kind of performance should I be expecting?

Panoply is a self-optimizing analytics infrastructure. When data is first loaded into the platform, Panoply has no knowledge of your specific needs, so it provides the same performance as an unoptimized AWS Redshift cluster. As you begin analyzing your data, the platform learns your business logic and rebuilds your schema in an agile way, increasing your database performance over time. The initial optimizations occur within three days of your first analysis, and within two weeks of regular use your schema’s optimization should be complete. If the performance improvement isn’t as significant as expected, please contact your account’s data architect, or use the chat box in the lower right to request a demo and have a data architect appointed.

Where can I see my total storage usage?

You can use the following query to determine how much total storage your data occupies:

SELECT sum(size) FROM svv_table_info;

The returned value is in MB and represents the amount of data before vacuuming; after vacuuming, the size may decrease. Panoply runs vacuuming automatically.
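
To break usage down per table, a variant of the same query over svv_table_info:

SELECT "table", size FROM svv_table_info ORDER BY size DESC;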