Jupyter on Redshift
  • 08 Jul 2021
  • 1 Minute to read


Panoply on Redshift

To connect to your Panoply data warehouse using Python and a Jupyter notebook:

  1. Install SQLAlchemy, pandas, and a PostgreSQL driver (SQLAlchemy uses psycopg2 by default for postgresql:// URLs) in your Python environment:

     pip install sqlalchemy pandas psycopg2-binary
    

    or if conda is your package manager of choice:

     conda install sqlalchemy pandas psycopg2
    
    
  2. Create a new Jupyter Notebook and import the packages you'll need:

     import pandas as pd
     from sqlalchemy import create_engine
    
  3. Enter the connection parameters for your Panoply data warehouse:

     POSTGRES_ADDRESS = 'db.panoply.io' ## INSERT YOUR DB ADDRESS IF IT'S NOT ON PANOPLY
     POSTGRES_PORT = '5439'
     POSTGRES_USERNAME = 'username' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES USERNAME
     POSTGRES_PASSWORD = '*****' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES PASSWORD
     POSTGRES_DBNAME = 'database' ## CHANGE THIS TO YOUR DATABASE NAME
    
     postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'
                     .format(username=POSTGRES_USERNAME,
                             password=POSTGRES_PASSWORD,
                             ipaddress=POSTGRES_ADDRESS,
                             port=POSTGRES_PORT,
                             dbname=POSTGRES_DBNAME))
    
  4. Create the engine that will connect to your Panoply data warehouse (the connection itself is opened lazily, on the first query):

     cnx = create_engine(postgres_str)
    
  5. Collect your data into a pandas DataFrame:

     df = pd.read_sql_query('''SELECT * FROM tablename''', cnx)
    

Note that the above example will import the entire table into memory, so take care to create an SQL query that will pull only the data you want to analyze.
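For example, rather than a SELECT * over a whole table, filter and limit in SQL so only the data you need reaches pandas. The sketch below is illustrative: the `orders` table and its columns are invented, and an in-memory SQLite engine stands in for the `cnx` engine from step 4 so the snippet is runnable on its own:

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in: an in-memory SQLite engine with a toy table, so this sketch runs
# anywhere. Against Panoply you would reuse `cnx` from step 4 instead.
engine = create_engine('sqlite://')
pd.DataFrame({'id': [1, 2, 3], 'amount': [10.0, 25.5, 7.25]}).to_sql(
    'orders', engine, index=False)

# Filter and project in SQL so only the needed rows and columns are loaded.
query = '''
    SELECT id, amount
    FROM orders
    WHERE amount > 8
    LIMIT 100
'''
df = pd.read_sql_query(query, engine)
print(df)
```

For very large result sets, you can also pass `chunksize=` to `pd.read_sql_query` to iterate over the results in smaller DataFrames instead of loading everything at once.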

