Jupyter on Redshift
  • 08 Jul 2021
  • 1 Minute to read


Panoply on Redshift

To connect to your Panoply data warehouse using Python and a Jupyter notebook:

  1. Install SQLAlchemy, pandas, and a PostgreSQL driver (SQLAlchemy uses psycopg2 by default for postgresql:// URLs) in your Python environment:

     pip install sqlalchemy pandas psycopg2-binary
    

    or if conda is your package manager of choice:

     conda install sqlalchemy pandas psycopg2
    
    
  2. Create a new Jupyter Notebook and import the packages you'll need:

     import pandas as pd
     from sqlalchemy import create_engine
    
  3. Enter the connection parameters for your Panoply data warehouse:

     POSTGRES_ADDRESS = 'db.panoply.io' ## INSERT YOUR DB ADDRESS IF IT'S NOT ON PANOPLY
     POSTGRES_PORT = '5439'
     POSTGRES_USERNAME = 'username' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES USERNAME
     POSTGRES_PASSWORD = '*****' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES PASSWORD
     POSTGRES_DBNAME = 'database' ## CHANGE THIS TO YOUR DATABASE NAME
    
     postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'
                     .format(username=POSTGRES_USERNAME,
                             password=POSTGRES_PASSWORD,
                             ipaddress=POSTGRES_ADDRESS,
                             port=POSTGRES_PORT,
                             dbname=POSTGRES_DBNAME))
    
  4. Create the engine that will connect to your Panoply data warehouse (the connection itself is opened lazily, on the first query):

     cnx = create_engine(postgres_str)
    
  5. Collect your data into a pandas DataFrame:

     df = pd.read_sql_query('''SELECT * FROM tablename''', cnx)
    

Note that the above example will import the entire table into memory, so take care to create an SQL query that will pull only the data you want to analyze.
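For example, rather than a SELECT * over a whole table, filter and limit in SQL so only the data you need reaches pandas. The sketch below is illustrative: the `orders` table and its columns are invented, and an in-memory SQLite engine stands in for the `cnx` engine from step 4 so the snippet is runnable on its own:

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in: an in-memory SQLite engine with a toy table, so this sketch runs
# anywhere. Against Panoply you would reuse `cnx` from step 4 instead.
engine = create_engine('sqlite://')
pd.DataFrame({'id': [1, 2, 3], 'amount': [10.0, 25.5, 7.25]}).to_sql(
    'orders', engine, index=False)

# Filter and project in SQL so only the needed rows and columns are loaded.
query = '''
    SELECT id, amount
    FROM orders
    WHERE amount > 8
    LIMIT 100
'''
df = pd.read_sql_query(query, engine)
print(df)
```

For very large result sets, you can also pass `chunksize=` to `pd.read_sql_query` to iterate over the results in smaller DataFrames instead of loading everything at once.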

