Jupyter Notebook on Redshift

To determine whether your data warehouse is built on BigQuery or Redshift, see the BI Connect Page on the Panoply platform.

Panoply on Redshift

To connect to your Panoply data warehouse using Python and a Jupyter notebook:

  1. Install SQLAlchemy and pandas in your Python environment:

     pip install sqlalchemy pandas
    

    or

     conda install sqlalchemy pandas
    

    if conda is your package manager of choice. SQLAlchemy also needs a PostgreSQL driver to reach Redshift; if create_engine later reports a missing module, install one with pip install psycopg2-binary (or conda install psycopg2).

  2. Create a new Jupyter Notebook and import the packages you’ll need:

     import pandas as pd
     import sqlalchemy
     from sqlalchemy import create_engine
    
  3. Enter the connection parameters for your Panoply data warehouse (a safer way to build the connection string is sketched after these steps):

     POSTGRES_ADDRESS = 'db.panoply.io' ## INSERT YOUR DB ADDRESS IF IT'S NOT ON PANOPLY
     POSTGRES_PORT = '5439'
     POSTGRES_USERNAME = 'username' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES USERNAME
     POSTGRES_PASSWORD = '*****' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES PASSWORD
     POSTGRES_DBNAME = 'database' ## CHANGE THIS TO YOUR DATABASE NAME
    
     postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'
                     .format(username=POSTGRES_USERNAME,
                             password=POSTGRES_PASSWORD,
                             ipaddress=POSTGRES_ADDRESS,
                             port=POSTGRES_PORT,
                             dbname=POSTGRES_DBNAME))
    
  4. Create the SQLAlchemy engine that will connect to your Panoply data warehouse (the actual connection is opened on first use):

     cnx = create_engine(postgres_str)
    
  5. Collect your data into a pandas DataFrame:

     df = pd.read_sql_query('''SELECT * FROM tablename''', cnx)
    
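As promised in step 3, here is a sketch of a safer way to build the connection string, assuming SQLAlchemy 1.4 or later: sqlalchemy.engine.URL.create escapes special characters (such as @ or /) in the password, which the plain string template above does not. The variable names are the same ones defined in step 3.

     from sqlalchemy.engine import URL

     # URL.create quotes the password, so characters like '@' or '/'
     # won't break the connection string.
     postgres_url = URL.create(
         drivername='postgresql',
         username=POSTGRES_USERNAME,
         password=POSTGRES_PASSWORD,
         host=POSTGRES_ADDRESS,
         port=int(POSTGRES_PORT),
         database=POSTGRES_DBNAME,
     )
     cnx = create_engine(postgres_url)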

Note that the example above reads the entire table into memory, so write a SQL query that pulls only the data you actually want to analyze, as in the sketch below.
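For example, a minimal sketch of a narrower, chunked read; the sales table, its columns, and the process function are hypothetical placeholders for your own data and logic:

     # Pull only the columns and rows you need (table and column names
     # here are placeholders).
     query = '''
         SELECT order_id, order_date, amount
         FROM sales
         WHERE order_date >= '2023-01-01'
     '''
     df = pd.read_sql_query(query, cnx)

     # For very large results, stream the query in chunks instead of
     # loading everything at once:
     for chunk in pd.read_sql_query(query, cnx, chunksize=50000):
         process(chunk)  # replace with your own handling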
