SQL and BI Tools

Connect your Tools to Panoply

Once you have data collected into one or more tables, you can work with it. Panoply provides a workbench that you can use to build SQL queries, but you can also connect to your favorite BI and ETL tools.

The steps for connecting an SQL workbench or a BI tool are essentially the same. First, go to platform.panoply.io/#/connect and copy your connection details.

Connection details:

  • Host: db.panoply.io

  • Port: 5439

  • Database: The database name you chose when you created your Panoply account. If you don’t remember it, check the top-right corner of the platform.

  • User / Password: The same credentials you used to sign up to panoply.io. Don’t remember them? You can reset them.

  • Driver: BigQuery, Amazon Redshift, or Postgres

As long as the tool you want to connect to Panoply uses ODBC/JDBC or has a built-in connector for BigQuery, Postgres, or AWS Redshift, you should be able to use the same connection details for everything. If you need extra help setting up your connections, check out our in-depth guides to connecting BI tools to your Panoply instance.
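For tools that take a single JDBC URL rather than individual connection fields, the details above can be combined into one string. As a sketch, assuming the Amazon Redshift driver ({database} is a placeholder for your own database name):

jdbc:redshift://db.panoply.io:5439/{database}

The user and password are then supplied as separate driver properties, as your tool’s connection dialog requires.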

Using an SQL client

Panoply is designed to allow you to access and manage your data using SQL, so in addition to your analytics and visualization toolset, you’ll be using SQL to prepare and model your data for analytics and reporting.

Panoply provides a simple web-based SQL workbench that you can use to explore your data, but for more extensive and complex query building, you’ll probably want to set up an SQL client. Fortunately, there are some excellent options out there. If you need or want to try a new one, here are a few great free recommendations:

Recommended SQL Clients for Mac (with free versions)

  • Postico - Fast and simple Mac desktop client

  • Metabase - Fast, web-based client, easy visualization workflow

  • Mode - Cloud-based data science workflow with built-in SQL client

  • PopSQL / Seekwell - A newer option; slower than the others, but with stronger visualization capabilities

Recommended SQL Clients for Windows (with free versions)

  • Metabase - Fast, web-based client, easy visualization workflow

  • Mode - Cloud-based data science workflow with built-in SQL client

  • DBeaver - Free, open-source universal database client

Profiling with SQL

Now that we’ve covered how tables are generated and what to expect from them, let’s take a look at some ways you can use SQL to start exploring your data in Panoply. Open up your SQL client or head to the Panoply workbench for this step. After you’re set up, you can start with a simple query that will show you all the tables relevant to a specific data source, like this:

SELECT DISTINCT tablename AS "name"
FROM pg_tables
WHERE tablename LIKE '{data source}_%'

Note: Remember to replace {data source} with the data source prefix specified when collecting your data. If, for example, you used default settings to collect data from Twitter, your prefix would be twitter_.
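Continuing that Twitter example, the filled-in query would look like this:

SELECT DISTINCT tablename AS "name"
FROM pg_tables
WHERE tablename LIKE 'twitter_%'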

This query will return a single column that lists every table that was generated from your data source. Once you have this list, the fastest way to figure out how the data you need is spread across different tables is just to start digging in and exploring individual tables. You can get a quick look at what might be in an individual table by checking the column names with a query like this:

SELECT table_name, column_name
FROM information_schema.columns
WHERE table_name = '{table of interest name}'

Replace {table of interest name} with the actual name of the table you’re interested in exploring. If you spot the columns you’re looking for in a particular table, you can then run a quick check on the data itself with a query like this:

SELECT *
FROM {table of interest name}
LIMIT 100

As usual, replace {table of interest name} with the name of the table you’re inspecting. This will return the first 100 records in the selected table, so you can examine the data itself.

A simple join

We’re talking relational table structures here, so it’s helpful to touch on joins at this point. If you’re not familiar with joins, we’ll cover enough of the basics here for you to get something done. Be sure to check out more in-depth documentation when you have a chance, though.

It often happens that the data you need to answer a specific question is spread across multiple tables in a relational database. This is by design, and helps keep your database tidy and reliable. In practice, this means that if you have a customers table, a sales table and a products table and want to see total sales broken out by product and customer type, you will need to join multiple tables in your query.
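As a sketch of what that looks like in practice — assuming hypothetical customers, sales, and products tables, where sales carries customer_id and product_id foreign keys, and amount, customer_type, and name are placeholder column names:

SELECT products.name AS product,
       customers.customer_type,
       SUM(sales.amount) AS total_sales
FROM sales
JOIN products ON sales.product_id = products.id
JOIN customers ON sales.customer_id = customers.id
GROUP BY products.name, customers.customer_type

The mechanics of the JOIN ... ON clauses used here are covered step by step below.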

Simple join query

In order to join two tables, you will need to identify a key that links the two tables. Most databases are designed to provide designated, unique keys for this purpose, but you will often find that some other column works as a key for the purposes of joining two or more tables.

A basic two-table join will have the general form of:

SELECT {column1}, {column2}, ..., {columnN}
FROM {table1}
JOIN {table2}
ON {table1}.{key} = {table2}.{key}

In this example {key} will be the name of the column that you will be using as a join key. The name of the column may be different between tables–the important thing is that the underlying data matches. As a hypothetical example, suppose that you were trying to join sales and products. In the sales table, products are listed only by their product_id. In products, the corresponding column is named simply id. In that case, the ON statement from the above query would look like:

sales.product_id = products.id
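Plugged into the template above, the full hypothetical query would look like this (sales.amount and products.name are placeholder column names for illustration):

SELECT sales.id, sales.amount, products.name
FROM sales
JOIN products
ON sales.product_id = products.id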

Twitter example

In order to tie everything together, let’s look at a specific example of a join that covers all aspects of the Panoply data collection and integration process that we’ve covered so far. If you collect your Twitter data with subtable creation enabled, you’ll find that each of the main tables–tweets, likes and followers–will spawn multiple subtables. This is helpful in many cases, as specific data from table rows will be broken out into relatable tables.

So let’s suppose you wanted to break out each individual hashtag attached to a specific tweet. You would need to join twitter_tweets to twitter_tweets_entities_hashtags. Here’s how you’d do that:

SELECT twitter_tweets."text" as "tweet", twitter_tweets_entities_hashtags."text" as "tag"
FROM twitter_tweets
JOIN twitter_tweets_entities_hashtags
ON twitter_tweets.id = twitter_tweets_entities_hashtags.twitter_tweets_entities_id

You can add whatever tweaks and permutations you’d like, but that’s the basic query that you should start with.
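For example, one common tweak is to aggregate instead of listing every row. A sketch that builds on the same join to count how many tweets use each hashtag:

SELECT twitter_tweets_entities_hashtags."text" AS "tag",
       COUNT(*) AS "tweet_count"
FROM twitter_tweets
JOIN twitter_tweets_entities_hashtags
ON twitter_tweets.id = twitter_tweets_entities_hashtags.twitter_tweets_entities_id
GROUP BY twitter_tweets_entities_hashtags."text"
ORDER BY "tweet_count" DESC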
