Google Drive

This document describes the Google Drive data source. Continue reading to learn more about:

  1. Collecting - what you should know about adding the data source.
  2. Data Dictionary - what data is available and how it is structured.

Collecting

NOTE: Google requires the logged-in user to have permission to access the data. If the required permissions are not in place, some of the data will not be available.

NOTE: If you intend to schedule collection of a file in Google Drive, be aware that deleting the file from your Drive and re-uploading it breaks the automated schedule, because the re-uploaded file receives a new ID. To update the file, upload the new version with the same name to the same path, replacing the existing file rather than deleting it first, so that the ID assigned by Google Drive stays the same.

To configure this data source and collect Google Drive data:

  1. From the Data Sources menu, click Add Data Source.
  2. Search for Google Drive, then select that data source.
  3. Click Login and follow Google’s authorization process to allow Panoply to access Google Drive data.
  4. Select the files from which to collect data. Panoply lists only the compatible files in your Drive. Panoply supports the following file types:
    • tar, parsed with tarfile
    • zip, parsed with zipfile
    • xlsx, parsed with openpyxl
    • json, parsed with simplejson
    • csv, parsed with csv
    • tsv, parsed with csv
  5. (Optional) Set the Advanced Settings.
    • We do not recommend changing advanced settings unless you are an experienced Panoply user.
    • Destination:
      • Panoply selects a default destination: the table(s) where the data is stored. The default naming convention is google_drive_<mydrive or foldername>_<filename>. For example, if you have an .xlsx file named “App Install Metrics” in your root folder in Google Drive, it is stored in Panoply as google_drive_mydrive_app_install_metrics. If the same file is inside a folder named “Metrics”, it is stored in Panoply as google_drive_metrics_app_install_metrics.
    • Primary Key - Users can define which column contains the table’s Primary Key. If this option is left blank and the sheet does not contain an id column, Panoply inserts an id, formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2.
    • Truncate - Use truncate to delete any data collected previously, and then add new data to the same destination table(s) based on a new collection. This is useful when you don’t have a primary key and do not want to append rows to an existing data set.
  6. Click Save Changes then click Collect.
    • The data source appears grayed out while the collection runs.
    • You may add additional data sources while this collection runs.
    • You can monitor this collection from the Jobs page or the Data Sources page.
    • After a successful collection, navigate to the Tables page to review the data results.
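To make step 4 more concrete, here is a minimal Python sketch of how files of the listed types might be dispatched to a parser by extension. It is not Panoply's implementation. Assumptions: the names `parse_bytes` and `SUPPORTED` are hypothetical, the standard library `json` module stands in for simplejson (whose `loads` is largely compatible), text files are assumed to be UTF-8, every file becomes a list of row dictionaries keyed by the header row, and archive members with unsupported extensions are skipped.

```python
import csv
import io
import json  # stand-in for simplejson (assumption)
import tarfile
import zipfile
from pathlib import PurePosixPath
from typing import Dict, List

Row = Dict[str, object]
SUPPORTED = {".tar", ".zip", ".xlsx", ".json", ".csv", ".tsv"}


def _parse_xlsx(data: bytes) -> List[Row]:
    # openpyxl is third-party, so it is imported only when an .xlsx file is seen.
    from openpyxl import load_workbook

    wb = load_workbook(io.BytesIO(data), read_only=True)
    rows: List[Row] = []
    for ws in wb.worksheets:  # each sheet (tab) is read separately
        values = list(ws.iter_rows(values_only=True))
        if not values:
            continue
        header = [str(h) for h in values[0]]
        for r in values[1:]:
            if any(v is not None for v in r):  # skip fully empty rows
                rows.append(dict(zip(header, r)))
    wb.close()
    return rows


def parse_bytes(name: str, data: bytes) -> List[Row]:
    """Parse one file into a list of row dicts, dispatching on its extension."""
    ext = PurePosixPath(name).suffix.lower()
    if ext in (".csv", ".tsv"):
        # Both formats use the csv module; only the delimiter differs.
        text = io.StringIO(data.decode("utf-8"), newline="")
        delimiter = "\t" if ext == ".tsv" else ","
        return list(csv.DictReader(text, delimiter=delimiter))
    if ext == ".json":
        parsed = json.loads(data)
        return parsed if isinstance(parsed, list) else [parsed]
    if ext == ".xlsx":
        return _parse_xlsx(data)
    if ext == ".zip":
        rows: List[Row] = []
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                member_ext = PurePosixPath(info.filename).suffix.lower()
                if not info.is_dir() and member_ext in SUPPORTED:
                    rows.extend(parse_bytes(info.filename, zf.read(info)))
        return rows
    if ext == ".tar":
        rows = []
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
            for member in tf.getmembers():
                member_ext = PurePosixPath(member.name).suffix.lower()
                if member.isfile() and member_ext in SUPPORTED:
                    fh = tf.extractfile(member)
                    if fh is not None:
                        rows.extend(parse_bytes(member.name, fh.read()))
        return rows
    raise ValueError(f"unsupported file type: {name}")
```

Archives are handled by recursing into their members, so a zip containing a csv yields the same rows as the csv on its own.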
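The default destination naming convention from step 5 can be approximated in a few lines. This is a hypothetical sketch: the helper names `default_table_name` and `_slug`, and the normalization rule (drop the file extension, lowercase, replace each run of non-alphanumeric characters with a single underscore), are assumptions inferred from the App Install Metrics example rather than a documented Panoply rule.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def _slug(text: str) -> str:
    # Lowercase, then collapse each run of non-alphanumeric characters to "_".
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def default_table_name(filename: str, folder: Optional[str] = None) -> str:
    """Hypothetical: google_drive_<mydrive or foldername>_<filename>."""
    location = folder if folder else "mydrive"  # root-folder files use "mydrive"
    stem = PurePosixPath(filename).stem  # drop the extension, e.g. ".xlsx"
    return f"google_drive_{_slug(location)}_{_slug(stem)}"


print(default_table_name("App Install Metrics.xlsx"))
# google_drive_mydrive_app_install_metrics
print(default_table_name("App Install Metrics.xlsx", folder="Metrics"))
# google_drive_metrics_app_install_metrics
```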
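The following sketch models how the Primary Key and Truncate settings might interact when a file is collected again, using an in-memory list as a stand-in for a destination table. The function `load` is hypothetical, and the replace-on-matching-key behavior is an assumption. The article only states that Truncate deletes previously collected data before adding the new collection, and that without a primary key, new rows would otherwise be appended.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def load(
    table: List[Row],
    new_rows: List[Row],
    primary_key: Optional[str] = None,
    truncate: bool = False,
) -> List[Row]:
    """Return the table contents after one collection (hypothetical model)."""
    # Truncate: discard everything collected previously; otherwise copy the table.
    table = [] if truncate else list(table)
    if primary_key is None:
        return table + list(new_rows)  # no key: rows are simply appended
    # Assumption: with a key, a row whose key already exists replaces the old row.
    index = {row[primary_key]: i for i, row in enumerate(table)}
    for row in new_rows:
        key = row[primary_key]
        if key in index:
            table[index[key]] = row
        else:
            index[key] = len(table)
            table.append(row)
    return table
```

Running the same collection twice without a key or Truncate duplicates every row, which is the situation Truncate is meant to avoid.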

Data Dictionary

Because Google Drive data comes from your own files, whose contents vary, Panoply cannot provide a data dictionary. However, Panoply does automate the data schema for the collected data. Here is what you should know about these automations:

  • A column in a table uses the same data type for all values in that column. Panoply automatically chooses the data type for each column based on its values. This matters for this data source: if even one value in a column contains text, the entire column is given the data type Text.
    • For example, the following combination of values in a single column will be data type Number:
      • 10000
      • 10,000
      • 10.10
    • For example, the following combination of values in a single column will be data type Text:
      • 10000
      • 10,000
      • 10.10
      • 10000x
  • Dates are stored as formatted strings.
  • Panoply opens each sheet (tab) individually and collects its values row by row.
  • A column with a header but without values will be ignored. This is a limitation built into the Data Engine.
  • Empty columns and empty rows are not collected.
  • The following metadata columns are added by Panoply to the destination table(s):
    • id - If the user does not enter a primary key and no id column exists in the source, Panoply inserts an id, formatted as a GUID, such as 2cd570d1-a11d-4593-9d29-9e2488f0ccc2.
    • __updatetime - Formatted as a datetime, such as 2018-06-26T01:26:14.695Z
    • __senttime - Formatted as a datetime, such as 2018-06-26T01:26:14.695Z
    • __file - The name of the drive and file where the data originated, including the file extension, such as MyDrive_App Install Metrics.xlsx.
    • __tablename - The name of the drive and file (without the file extension) where the data originated, formatted as <drivename>_<filename>, such as MyDrive_App Install Metrics.
    • __sheet - The name of the sheet/tab within the file.
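The column typing rule can be sketched as a single pass over a column's values: the column is Number only if every value looks numeric, and the first text value makes it Text. The function `infer_column_type` and the number pattern (optional sign, digits with optional comma thousands separators, optional decimal part) are assumptions chosen to reproduce the 10000 / 10,000 / 10.10 / 10000x examples; returning `None` for a column with no values mirrors the rule that such columns are ignored.

```python
import re
from typing import Iterable, Optional

# Assumed numeric pattern: optional sign, plain digits or comma-grouped
# thousands, optional decimal part. Panoply's real rule may differ.
_NUMBER = re.compile(r"^[+-]?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?$")


def infer_column_type(values: Iterable[str]) -> Optional[str]:
    """Return "Number", "Text", or None for a column with no values."""
    seen = False
    for raw in values:
        value = raw.strip()
        if not value:
            continue  # blank cells do not decide the type
        seen = True
        if not _NUMBER.match(value):
            return "Text"  # a single text value makes the whole column Text
    return "Number" if seen else None
```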
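The metadata columns can be illustrated with a small helper that decorates each collected row. This is a hypothetical sketch, and `add_metadata` and `iso_millis` are not Panoply functions. It generates a random (version 4) UUID for the id, writes the same UTC timestamp with millisecond precision to both `__updatetime` and `__senttime` (an assumption; in Panoply the two may differ), and builds `__file` and `__tablename` from the drive and file names following the examples in the list.

```python
import uuid
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Dict, List, Optional

Row = Dict[str, object]


def iso_millis(ts: datetime) -> str:
    """Format an aware datetime like 2018-06-26T01:26:14.695Z."""
    utc = ts.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def add_metadata(
    rows: List[Row],
    drive: str,
    filename: str,
    sheet: str,
    primary_key: Optional[str] = None,
    now: Optional[datetime] = None,
) -> List[Row]:
    stamp = iso_millis(now if now is not None else datetime.now(timezone.utc))
    out: List[Row] = []
    for row in rows:
        record = dict(row)  # leave the caller's rows untouched
        if primary_key is None and "id" not in record:
            record["id"] = str(uuid.uuid4())  # GUID only when no key and no id column
        record["__updatetime"] = stamp
        record["__senttime"] = stamp  # assumption: same value as __updatetime
        record["__file"] = f"{drive}_{filename}"
        record["__tablename"] = f"{drive}_{PurePosixPath(filename).stem}"
        record["__sheet"] = sheet
        out.append(record)
    return out
```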