This document provides instructions for integrating Google BigQuery data into Panoply.
BigQuery Data Integration
Panoply’s BigQuery data source can import all the projects, datasets, and tables in a BigQuery database.
To integrate Google BigQuery data into Panoply using the default selections, complete the following steps. For advanced options, complete the same steps and refer to the sections that follow for details.
- Click Data Sources in the navigation menu.
- Click the Add Data Source button.
- In the Data Sources – Choose Source Type window, select BigQuery. BigQuery is listed under Databases.
- In the Data Sources – BigQuery screen, click Login.
- In the Google prompt, select the Google account tied to the data you would like to add to Panoply.
- A Google Account Authorization window asks whether you want to allow Panoply to access your Google BigQuery data. Click Allow to continue.
- In the Data Sources – BigQuery screen, click Next.
- Select the tables to import from your projects. Tables appear in the format <project>:<dataset>.<table>.
- (Optional) To customize the ingestion from your data source, review the advanced options.
- Click Collect.
The Data Sources – BigQuery window will appear grayed out while the data integration is pending. A small green progress bar appears below BigQuery.
You will be prompted to set up the integration of another data source. You can set up multiple data integrations without impacting the ingestion of the already scheduled or pending data integrations.
From the Data Sources main menu, you can monitor the data ingestion status of the scheduled and pending data integrations. After the data ingestion is complete, you can clean or transform your data in the Tables menu.
Clicking Show next to Advanced expands the Data Sources – BigQuery window to show the following options: Destination, Primary Key, Incremental Key, Exclude, Parse string, and Truncate table.
Destination - The default is based on __tablename, an underscore-separated concatenation of the source BigQuery project, dataset, and table names.
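As a rough illustration of the naming described above, the sketch below splits a table reference in the <project>:<dataset>.<table> format and joins the parts with underscores. The function names are hypothetical and this is not Panoply's code; in particular, whether Panoply adds a prefix or normalizes characters such as hyphens is not stated here, so the sketch does neither.

```python
def parse_table_ref(ref: str) -> tuple[str, str, str]:
    """Split '<project>:<dataset>.<table>' into (project, dataset, table)."""
    # Split on the last colon so the project part is kept whole.
    project, colon, rest = ref.rpartition(":")
    dataset, dot, table = rest.partition(".")
    if not (colon and dot and project and dataset and table):
        raise ValueError(f"expected <project>:<dataset>.<table>, got {ref!r}")
    return project, dataset, table


def tablename(ref: str) -> str:
    """Underscore-separated concatenation of project, dataset, and table."""
    # Assumption: no prefix is added and no characters are rewritten.
    return "_".join(parse_table_ref(ref))


if __name__ == "__main__":
    print(tablename("my-project:sales.orders"))
```

Here `my-project:sales.orders` is a made-up reference used only to show the shape of the result.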
Primary Key - Default is id. The primary key determines which field(s) to use as the deduplication key when ingesting data.
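To make the idea of a deduplication key concrete, here is a minimal sketch of key-based deduplication over in-memory rows. It assumes that an incoming row replaces an existing row with the same key value, which this section does not spell out; the `upsert` function and the sample rows are hypothetical.

```python
def upsert(existing, incoming, key_fields=("id",)):
    """Merge incoming rows into existing rows, one row per key value."""
    # Index the existing rows by their key tuple; dicts keep insertion order.
    merged = {tuple(row[f] for f in key_fields): row for row in existing}
    for row in incoming:
        # Assumption: a newer row with the same key overwrites the older one.
        merged[tuple(row[f] for f in key_fields)] = row
    return list(merged.values())


if __name__ == "__main__":
    stored = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
    batch = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]
    for row in upsert(stored, batch):
        print(row)
```

Passing several names in `key_fields` models a composite key, which is one way to read "field(s)" in the description above.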
Incremental Key - By default, Panoply fetches all of your BigQuery data on each run. If you only want to collect some of your data, enter a column name to use as your incremental key. The column must be logically incremental. Panoply keeps track of the maximum value reached during the previous run and starts from that value on the next run. If you set an incremental key, select only one table. Otherwise, the collection uses a single incremental key and value for all of the selected tables. Select the checkbox if the incremental key is an ISO date data type.
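The incremental key behavior can be sketched as a high-water mark: remember the largest key value seen, then fetch only rows at or beyond it on the next run. The sketch below is an illustration over in-memory rows, not Panoply's implementation. The inclusive comparison is an assumption drawn from the statement that the next run starts at the previous maximum, and `collect_incremental` and the `updated_at` column are hypothetical names.

```python
def collect_incremental(rows, key, last_max=None):
    """Return the rows to collect on this run and the updated maximum."""
    # First run (no stored maximum): collect everything.
    # Later runs: collect rows at or beyond the stored maximum (assumed inclusive).
    selected = [r for r in rows if last_max is None or r[key] >= last_max]
    if selected:
        last_max = max(r[key] for r in selected)
    return selected, last_max


if __name__ == "__main__":
    # ISO date strings in one consistent format sort correctly as plain strings.
    table = [
        {"id": 1, "updated_at": "2024-01-01"},
        {"id": 2, "updated_at": "2024-01-02"},
    ]
    first, mark = collect_incremental(table, "updated_at")
    table.append({"id": 3, "updated_at": "2024-01-03"})
    second, mark = collect_incremental(table, "updated_at", mark)
    print(len(first), len(second), mark)
```

Because only one maximum is stored, applying it to several tables would compare unrelated columns against the same value, which is consistent with the advice above to select a single table. Rows fetched again at the boundary would be collapsed by the primary key deduplication.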
The data schema in Panoply will reflect the data schema from the BigQuery source. Additionally, Panoply includes these metadata columns in each table: