Panoply on Redshift
To connect to your Panoply data warehouse using Python and a Jupyter notebook:
- Install SQLAlchemy, pandas, and psycopg2 (the PostgreSQL driver SQLAlchemy uses for the postgresql:// connection string below) in your Python environment:
pip install sqlalchemy pandas psycopg2-binary
or, if conda is your package manager of choice:
conda install sqlalchemy pandas psycopg2
- Create a new Jupyter notebook and import the packages you'll need:
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine
- Enter the connection parameters for your Panoply data warehouse:
POSTGRES_ADDRESS = 'db.panoply.io'  ## INSERT YOUR DB ADDRESS IF IT'S NOT ON PANOPLY
POSTGRES_PORT = '5439'
POSTGRES_USERNAME = 'username'  ## CHANGE THIS TO YOUR PANOPLY/POSTGRES USERNAME
POSTGRES_PASSWORD = '*****'  ## CHANGE THIS TO YOUR PANOPLY/POSTGRES PASSWORD
POSTGRES_DBNAME = 'database'  ## CHANGE THIS TO YOUR DATABASE NAME

postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'
                .format(username=POSTGRES_USERNAME,
                        password=POSTGRES_PASSWORD,
                        ipaddress=POSTGRES_ADDRESS,
                        port=POSTGRES_PORT,
                        dbname=POSTGRES_DBNAME))
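If your password contains characters such as @ or /, plain string formatting produces an invalid URL. SQLAlchemy (1.4+) can build and percent-encode the URL for you; a minimal sketch using the same placeholder values (the password here is a made-up example):

```python
from sqlalchemy.engine import URL

# Build the connection URL programmatically; special characters in the
# password are percent-encoded automatically.
postgres_url = URL.create(
    drivername="postgresql",
    username="username",     # placeholder, as above
    password="p@ss/word",    # example password with special characters
    host="db.panoply.io",
    port=5439,
    database="database",
)

# Render the fully escaped URL ('@' in the password becomes '%40').
print(postgres_url.render_as_string(hide_password=False))
```

The resulting `postgres_url` object can be passed to create_engine in place of the formatted string.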
- Initiate the connection to your Panoply data warehouse:
cnx = create_engine(postgres_str)
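Before querying, it can help to confirm the engine actually connects. The sketch below uses an in-memory SQLite engine as a stand-in so it runs anywhere; against your warehouse, the same pattern applies with cnx in place of the local engine:

```python
from sqlalchemy import create_engine, text

# Stand-in engine for demonstration; substitute cnx for a real check.
engine = create_engine("sqlite://")

# A trivial round-trip query proves the connection works.
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()

print(result)  # 1 if the connection is healthy
```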
- Collect your data into a pandas DataFrame:
df = pd.read_sql_query('''SELECT * FROM tablename''', cnx)
Note that the example above imports the entire table into memory, so take care to write a SQL query that pulls only the data you want to analyze.
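For tables too large to hold in memory even after filtering, pandas can stream results in fixed-size chunks via the chunksize parameter of read_sql_query. A self-contained sketch against an in-memory SQLite table (the table name and column names are illustrative; the same pattern applies with cnx and your own query):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # stand-in for cnx

# Seed a small demo table so the example is self-contained.
pd.DataFrame({"id": range(10), "value": range(10)}).to_sql(
    "tablename", engine, index=False
)

# With chunksize set, read_sql_query returns an iterator of DataFrames,
# so only one chunk is held in memory at a time.
total_rows = 0
for chunk in pd.read_sql_query("SELECT * FROM tablename WHERE id < 8",
                               engine, chunksize=3):
    total_rows += len(chunk)

print(total_rows)  # 8 matching rows, read 3 at a time
```

Combining a selective WHERE clause with chunked iteration keeps memory use bounded regardless of table size.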