Panoply is a cloud-based automated data lake and warehouse. Using RedShift technology, Panoply can take you from data collection to querying and analysis in minutes! This guide, combined with the free trial Panoply offers, will allow you to get a feel for how Panoply works.
In this guide we cover:
- Importing Data
- Viewing Tables
- Data Processing and Queries
- Data Analysis
- Alerts and Permissions
- Plan Subscription
- Next Steps
Your Panoply database stores all your data, users, and permissions. Give it any name you want, such as your team, project or company name.
After logging in, you are ready to import your data from your data sources. Always connect to a replication of your production database. Connecting to the production database is against best practices.
Follow these steps to collect data:
On the Panoply console, select Data Sources from the navigation pane on the left side of the screen.
Select your data source from the list. Users familiar with Panoply can add data from one of the many data source categories we offer. For this Getting Started guide we recommend picking Sample Data to load our public Pokémon Stats database. We’ll used this fixed data set to show you around Panoply. If you select Sample Data, the Pokémon data connection details auto-populate for you.
- If you chose something other than Sample Data, enter the relevant credentials. For example, username, password, and URL, IP address, or API details. Or click File Upload to load data, such as this data sample in CSV format, without providing credentials. You can also define the following properties:
- Destination table name, which is the name of the target table where you want to save the data.
- Primary key for the destination table for de-duplication with a pattern.
- List of attributes to exclude from the collection process.
- List of JSON attributes to parse as objects.
- Click Collect to begin the data collection process. A progress bar shows you when the data is being analyzed, modeled, and uploaded into Panoply.
View Your Tables
To view the collected data, click Tables in the navigation pane.
The Table Rows count includes rows deleted today. Our automated process permanently removes deleted rows once per day.
In addition to sorting your tables by name, size, and rows, you can view tables in list or thumbnails view. You can also create folders to categorize tables as suits your needs.
If the attempt to collect data fails, verify that you have provided the correct credentials and entered your destination table name correctly. By default, Panoply retries to connect to the data source three times at 1-hour intervals following a failed attempt.
When you click on a table, you can see the following:
- Data – A sample of the table’s data so that you can get a feel for its structure.
- Metadata – The list of columns and their data types. Metadata is then generated and added to the data you provided by Panoply. Metadata fields start with a double-underscore. For example, __updatetime.
- Connection Details – Lists database connection details. Use these to connect your external SQL tools to your Panoply data.
Clicking a column header from the Metadata view allows you to dig deeper into the table’s inner structure. You can also rename the column, delete it, and change its type or other technical details. Generally, there’s no need to change these values because Panoply automatically assigns them for you.
When you first create your tables, their size includes one-time overhead such as system columns. Tables increase in size more gradually after that. For example, if you create a new table with 1,000 rows of data and then add another 1,000 rows of data, the final table size will be less than double the initial size. For additional detail you can read about Redshift cluster storage space on Amazon’s AWS knowledge center.
Given a source table S (in your SQL server) and a target table T (in Panoply) the number of rows in T may be greater than, less than, or exactly equal to the number of rows in S. Basically, most of the logic for counts is a product of primary keys.
Data Processing and Queries
Now that your data is stored in table format, you can query the sample table.
To start with a basic query on a specific table:
- Click Tables in the navigation pane
- Select the table from the list of open tables
- Click Query.
A simple workbench appears in which you can query your data.
You can also click Analyze in the navigation bar to open the workbench, or connect to your database using any external SQL tool.
Here’s a simple query that lists the top 10 strongest Pokémon in order of their attack. Type or copy this query into the editor:
SELECT name, attack FROM "pokemon" ORDER BY attack DESC LIMIT 10
Enter or click the run button to run the query and display the results.
Here’s how to break them up by type and to count the number of Pokémon of each type:
SELECT type_1, count(1) FROM "pokemon" GROUP BY 1 ORDER BY 2 DESC
Let’s dig deeper and compute the average and median speed for each type:
SELECT type_1, count(1), AVG(speed) as avg_speed, MEDIAN(speed) as median_speed FROM "pokemon" GROUP BY 1 ORDER BY 2 DESC
Great! Now you can play around and start changing the queries to find interesting correlations in the data. If you’re feeling adventurous, try the next query!
SELECT b.type_1, b.color AS main_color, round((b.num_in_main_color*1.0)/c.all_in_type,2) AS percent_of_main_color, c.all_in_type from ( SELECT color, type_1, num AS num_in_main_color, ROW_NUMBER() OVER (PARTITION BY type_1 ORDER BY num DESC) AS rank From ( SELECT color, type_1, count(*) AS num FROM "pokemon" GROUP BY color, type_1 ORDER BY count(*) DESC )a )b LEFT JOIN ( SELECT type_1, count(*) AS all_in_type FROM "pokemon" GROUP BY type_1 )c ON b.type_1=c.type_1 WHERE b.rank=1 AND b.type_1 IS NOT NULL ORDER BY c.all_in_type DESC
After running a query, you can save the query as a transformation or export the resulting table into your own system. Panoply offers two types of transformation views: simple views and materialised views. A transformation typically start as a simple view. When needed, based on performance, simple views can be transformed into materialised views, which are always accessible and up to date.
To view the history of all previously run queries, select Query Log from the navigation pane.
To track your job progress, select Jobs from the navigation pane. You can filter the jobs list according to status (All Statuses, Running, Booting, Pending, Error, and Success) and job type (All Types, Collect, Collect-Direct, Materialize, Alter, and Export).
After the data has been processed and saved according to your requirements, you can analyze the data.
Panoply can connect to many data visualization tools, from Tableau and Looker to iPython notebooks and R. As long as the visualization tool has an ODBC, JDBC, Postgres, or AWS Redshift connection, it can connect immediately to Panoply using the credentials from the Connect page.
Use the following connection details:
- Host – db.panoply.io
- Port – 5439
- Database – The database name you used when you created your Panoply account. If you don’t remember, look at the top-right corner of the platform.
- User and Password – The same credentials you used to sign up to panoply.io. Don’t remember them? You can reset it.
- Driver – Amazon Redshift or Postgres
Your connection details also appear on the Tables page.
Alerts and Permissions
Alerts are triggered in response to any failure during data ingestion from source or custom sources. Panoply also creates alerts with Amazon S3 and Redshift services. Moreover, you can configure custom alerts based on queries created through the workbench.
To invite users to join your data warehouse, select Permissions from the navigation pane, then click Add User.
Nice work! At this point you’ve set up your data warehouse by importing your data, processing your data, querying your data, connecting your BI tool, and setting up alerts.
To continue your plan after the free trial, you’ll need to update your plan subscription.
Here’s how you subscribe to a plan:
Within the Panoply platform, click your profile image in the top right and then click Billing & Plan.
Click Select Plan for the plan that you want.
Fill in the credit card details and click Update My Plan.
Now that your data is set up, here’s how to continue your Panoply journey and get the most out of our cool features:
- Add more data sources to consolidate and combine all your data into one place. Can’t find the data source you need? It’s open-source! You can add your own. Or shoot us an email and we’ll build it for you!
- Schedule your data sources to update periodically.
- Use our SDK data source to stream data in real time.
- Transform your data to best fit your analytical needs.
- Invite your colleagues to collaborate with you – have them add their own data sources and transformations.
- Check out our additional reference docs or post to the Community.