This guide is for new Panoply users looking to set up their first data warehouse. In this guide we cover:
After logging in, you are ready to import your data from your data sources. Always connect to a replication of your production database. Connecting to the production database is against best practices.
We invite you to try our service using this data sample. It is in CSV format, and you can follow the steps below to upload it to your Panoply account.
Follow these steps to collect data:
- On the Panoply console, select Data Sources from the navigation pane on the left side of the screen.
- Select your data sources from the list, and enter the relevant credentials. For example, URL/IP address/API details, username, and password. You can also define the following properties:
- Destination table name, which is the name of the target table where you want to save the data.
- Primary key for the destination table for de-duplication with a pattern.
- List of attributes to exclude from the collection process.
- List of JSON attributes to parse as objects.
- Click Collect to begin the data collection process. After the credentials are authenticated, your data is uploaded and modeled.
- To view the collected data, click Tables in the navigation pane.
In addition to sorting your tables by name, size, and rows, you can view tables in list or thumbnails view. You can also create folders to categorize tables as suits your needs.
If the attempt to collect data fails, verify that you have provided the correct credentials and entered your destination table name correctly. By default, Panoply retries to connect to the data source three times at 1-hour intervals following a failed attempt.
When you first create your tables, their size includes one-time overhead such as system columns. Tables increase in size more gradually after that. For example, if you create a new table with 1,000 rows of data and then add another 1,000 rows of data, the final table size will be less than double the initial size. For additional detail you can read about Redshift cluster storage space on Amazon’s AWS knowledge center.
Given a source table S (in your SQL server) and a target table T (in Panoply) the number of rows in T may be greater than, less than, or exactly equal to the number of rows in S. Basically, most of the logic for counts is a product of primary keys.
Data Processing and Queries
Once the data has been stored in table format, you can query the sample table. After running a query, you can save the query as a transformation or export the resulting table into your own system.
Panoply offers two types of transformation views: simple views and materialised views. A transformation typically start as a simple view. When needed, based on performance, simple views can be transformed into materialised views, which are always accessible and up to date.
- Select Data Sources from the navigation pane, then select the table from the list of open tables.
- Run a query against the selected table.
- To view the history of all previously run queries, select Queries from the navigation pane.
- Select Jobs from the navigation pane to track your job progress. You can filter the jobs list according to status (All Statuses, Running, Pending, Error, and Success) and job types (such as “collect”, “collect-direct”, “table-archiving”, “account-query-archiving”, and “align-table”).
Once the data has been processed and saved according to your requirements, you can analyze the data.
- Select Analyze from the navigation pane, then select your favorite tools to analyze the data. Panoply supports market-leading BI tools such as Tableau, Looker, and re:dash.
- Select your favorite BI tool and integrate your data with that tool to view your data analytics.
Panoply can connect to many data visualization tools, from Tableau and Looker to iPython notebooks and R. As long as the visualization tool has an ODBC, JDBC, Postgres, or AWS Redshift connection, it can connect immediately to Panoply using the credentials from the Analyze page as shown below:
Alerts and Permissions
Alerts are triggered in response to any failure during data ingestion from source or custom sources. Panoply also creates alerts with Amazon S3 and Redshift services. Moreover, you can configure custom alerts based on queries created through the workbench.
To invite users to join your data warehouse, select Permissions from the navigation pane, then click Add User.
Nice work! At this point you’ve set up your data warehouse by importing your data, processing your data, querying your data, connecting your BI tool, and setting up alerts.
To learn more, check out our additional reference docs or post to the Community.