Twitter

This document provides step-by-step instructions for integrating Twitter data into Panoply. It covers the basic integration steps, the advanced options, and the data schema.

Twitter Data Integration

To integrate Twitter data into Panoply using the default selections, complete the following steps. To use the advanced options, complete the same steps and refer to the subsequent sections for details.

  1. Click Data Sources in the navigation menu.
  2. Click the Add Data Source button.
  3. In the Data Sources - Choose Source Type window, select Twitter. Twitter is listed under APIs.
  4. In the Data Sources – Twitter screen, click Login.
  5. A Twitter dialog box confirms whether you want to allow Panoply to use your account. Click Authorize App.
  6. In the Data Sources – Twitter screen, select which data to import.
  7. (Optional) To customize the ingestion from your data source, review the advanced options.
  8. Click Collect.

The Data Sources – Twitter window will go gray while the data integration is pending. A small green progress bar will appear below Twitter once the integration has begun. A prompt will appear asking if you would like to set up the integration of another data source. Multiple data integrations can be set up without impacting the ingestion of the already scheduled or pending data integrations.

From the Data Sources main menu you can monitor the data ingestion status of the scheduled and pending data integrations. After the data ingestion is complete, you can clean or transform your data in the tables menu.

Advanced Options

Clicking Show next to Advanced expands the Data Sources – Twitter window to include the Destination, Primary Key, Exclude, Parse String, and Truncate table options.

Destination - Default is twitter_{__tablename}, where __tablename is the table name from the schema for this data source. See Data Schema for more detail about each table.

Primary Key - Default is id. The primary key here determines which field(s) to use as the deduplication key when ingesting data.
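The two defaults above can be illustrated with a minimal Python sketch. It is not Panoply's implementation: the helper names `resolve_destination` and `deduplicate` are hypothetical, and the rule that a later row replaces an earlier row with the same key is an assumption made for illustration.

```python
def resolve_destination(pattern: str, tablename: str) -> str:
    """Substitute the source table name into a destination pattern."""
    return pattern.replace("{__tablename}", tablename)


def deduplicate(rows: list, primary_key: str = "id") -> list:
    """Keep one row per primary-key value.

    Assumption: when two rows share a key, the later row wins.
    """
    latest = {}
    for row in rows:
        latest[row[primary_key]] = row
    return list(latest.values())


# Default destination pattern applied to the three source tables.
tables = ["tweets", "likes", "followers"]
destinations = [resolve_destination("twitter_{__tablename}", t) for t in tables]
print(destinations)

# The same tweet collected twice: only one row per id survives.
batch = [
    {"id": "1", "favorite_count": 3},
    {"id": "2", "favorite_count": 0},
    {"id": "1", "favorite_count": 5},  # collected again later
]
print(deduplicate(batch))
```

Choosing a primary key other than id changes which rows count as duplicates, so the key should uniquely identify a record in the source data.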

Data Schema

The Twitter data elements ingested by Panoply are listed below. Additional detail is available in the Twitter API documentation.

Tweets - a collection of tweets from the authorized user collected from the https://api.twitter.com/1.1/statuses/user_timeline API endpoint. For additional information, see GET statuses/user_timeline in the Twitter API documentation.

Likes - a collection of the authorizing user’s likes (formerly known as ‘favo(u)rites’ or ‘stars’) sourced from the https://api.twitter.com/1.1/favorites/list API endpoint. For additional information, see GET favorites/list in the Twitter API documentation.

Followers - a collection of information on the authorizing user’s followers collected from the https://api.twitter.com/1.1/followers/list API endpoint. For additional information, see GET followers/list in the Twitter API documentation.

Nested data - The fundamental unit in Twitter is the tweet object, which is a nested JSON object containing several main attributes (id, created_at, and text) and several child objects (user, entities, extended_entities, and place). Panoply’s default behavior is to transform nested data into a set of many-to-many or one-to-many relationship tables. The nested data is thereby turned into a (large) collection of flat tables that can be joined together as needed.
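The idea of turning one nested object into several flat tables can be sketched as follows. This is only an illustration under stated assumptions: the `flatten` function, the `parent_id` link column, and the generated child table names are hypothetical and may not match the names Panoply actually produces.

```python
def flatten(record: dict, table: str, tables: dict, parent_id=None) -> None:
    """Split one nested record into flat rows, one table per nesting level."""
    row = {}
    if parent_id is not None:
        row["parent_id"] = parent_id  # hypothetical link back to the parent row
    # Children link to this record's id, or to the parent's id if it has none.
    record_id = record.get("id", parent_id)
    for key, value in record.items():
        if isinstance(value, dict):
            # One-to-one child object becomes its own table.
            flatten(value, f"{table}_{key}", tables, record_id)
        elif isinstance(value, list):
            # One-to-many child list: one row per item.
            for item in value:
                child = item if isinstance(item, dict) else {"value": item}
                flatten(child, f"{table}_{key}", tables, record_id)
        else:
            row[key] = value
    tables.setdefault(table, []).append(row)


# A heavily trimmed tweet object with two child objects.
tweet = {
    "id": "1050118621198921728",
    "text": "Hello #world",
    "user": {"id": "6253282", "screen_name": "TwitterAPI"},
    "entities": {"hashtags": [{"text": "world"}]},
}

tables = {}
flatten(tweet, "twitter_tweets", tables)
for name, rows in sorted(tables.items()):
    print(name, rows)
```

Each resulting table holds only scalar columns, and the link column is what allows the rows to be joined back together.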

Entities - The entities object in a tweet object contains the additional metadata and contextual information for the tweet, as well as content included in the tweet other than the text itself, such as hashtags, user mentions, stock symbols, Twitter polls, and posted images/videos. Tables with the _entities suffix contain this information and can be joined to other tables using the id field.

Internal fields - In addition to the data schema details noted below, Panoply creates __updatetime and __senttime internal fields on all tables.

Tweets

As noted above, tweet data is collected from the https://api.twitter.com/1.1/statuses/user_timeline API endpoint. In Panoply, the default tweets table is twitter_tweets and includes the following fields:

Column | Data Type | Description
id | Text | The unique identifier of the tweet
truncated | Number | Boolean indicating whether tweet was truncated
text | Text | The text of the tweet itself
is_quote_status | Number | Boolean indicating whether tweet is a quoted tweet
favorite_count | Number | The number of likes this tweet received
source | Text | String describing the source from which the tweet was posted (e.g. web)
quoted_status_id | Number | The unique identifier of the quoted tweet
retweeted | Number | Boolean indicating whether tweet was retweeted by user
retweet_count | Number | The number of times this tweet has been retweeted
favorited | Number | Boolean indicating whether tweet has been liked by user
lang | Text | BCP 47 language identifier - the language the tweet was written in
created_at | Text | UTC time at which tweet was created
quoted_status_id_str | Number | String representation of ID of quoted tweet
in_reply_to_status_id | Number | Only generated if tweet is a reply. Integer representation of replied-to tweet ID
in_reply_to_screen_name | Text | Screen name of author of replied-to tweet
in_reply_to_user_id | Number | User ID of author of replied-to tweet
in_reply_to_user_id_str | Number | String representation of user ID of author of replied-to tweet
in_reply_to_status_id_str | Number | String representation of ID of replied-to tweet
possibly_sensitive | Number | Boolean indicating whether a URL in the tweet might be a link to sensitive content
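Because created_at is stored as Text, it usually needs to be parsed before it can be used in date arithmetic. The sketch below assumes the value is kept in the timestamp format returned by the Twitter v1.1 API (for example, "Wed Oct 10 20:19:24 +0000 2018"); if Panoply normalizes the value on ingestion, the format string would need to change. The function name `parse_created_at` is hypothetical.

```python
from datetime import datetime, timezone

# Assumed layout of created_at, as returned by the Twitter v1.1 API.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value: str) -> datetime:
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


created = parse_created_at("Wed Oct 10 20:19:24 +0000 2018")
print(created.isoformat())  # 2018-10-10T20:19:24+00:00
```

The parsed value carries its UTC offset, so it can be compared with other timezone-aware timestamps directly.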

Likes

As noted above, likes data is collected from the https://api.twitter.com/1.1/favorites/list API endpoint. These tables contain data related to outgoing likes/favorites/stars made by the authenticating user. In Panoply, the default likes table is twitter_likes and includes the following fields:

Column | Data Type | Description
id | Text | The unique identifier of the liked tweet
possibly_sensitive | Number | Boolean indicating whether a URL in the tweet might be a link to sensitive content
truncated | Number | Boolean indicating whether tweet was truncated
text | Text | The text of the liked tweet
is_quote_status | Number | Boolean indicating whether liked tweet is a quoted tweet
favorite_count | Number | The number of likes the liked tweet received
source | Text | String describing the source from which the tweet was posted (e.g. web)
quoted_status_id | | ID of quoted tweet
retweeted | Number | Boolean indicating whether tweet was retweeted by user
retweet_count | Number | The number of times the liked tweet has been retweeted
favorited | Number | Boolean indicating whether tweet has been liked by user
lang | Text | BCP 47 language identifier - the language the liked tweet was written in
created_at | Text | UTC time at which tweet was created
quoted_status_id_str | Number | String representation of ID of quoted tweet
in_reply_to_status_id | Number | Only generated if liked tweet is a reply. Integer representation of replied-to tweet ID
in_reply_to_screen_name | Text | Screen name of author of replied-to tweet
in_reply_to_user_id | Number | User ID of author of replied-to tweet
in_reply_to_user_id_str | Number | String representation of user ID of author of replied-to tweet
in_reply_to_status_id_str | Number | String representation of ID of replied-to tweet

Followers

As noted above, follower data is collected from the https://api.twitter.com/1.1/followers/list API endpoint in combination with https://api.twitter.com/1.1/users/show and https://api.twitter.com/1.1/users/lookup. These tables contain data related to the authenticating user’s followers and are generated from Twitter user objects. More information on user objects can be found in the Twitter API documentation. In Panoply, the default followers table is twitter_followers and contains the following fields:

Column | Data Type | Description
id | Text | Unique ID of user
location | Text | Location of user listed in profile
follow_request_sent | Number | Boolean indicating whether request to follow has been sent
has_extended_profile | Number | Boolean indicating whether user has extended profile
profile_use_background_image | Number | Boolean indicating follower’s preference on background image use
default_profile_image | Number | Boolean indicating whether user is still using the default profile image (has not uploaded their own)
profile_background_image_url_https | Text | URL of user’s background image
verified | Number | Boolean indicating whether user is a verified user
translator_type | Text | Deprecated feature related to Twitter’s translator community
profile_text_color | Text | Hexadecimal color code user has chosen for text display in their UI
profile_image_url_https | Text | URL of user’s profile image
profile_sidebar_fill_color | Text | Hexadecimal color code user has chosen for sidebar color in their UI
followers_count | Number | Count of followers user has
profile_sidebar_border_color | Text | Hexadecimal color code user has chosen for sidebar border color
profile_background_color | Text | Hexadecimal color code user has chosen for background
listed_count | Number | Number of public lists user is a member of
is_translation_enabled | Number | Boolean indicating whether user is translation enabled
statuses_count | Number | Number of tweets user has posted
description | Text | User-uploaded bio
friends_count | Number | Total number of users this user is following
profile_link_color | Text | Hexadecimal color code of user’s profile link
profile_image_url | Text | URL of user’s profile image
following | Number | Boolean indicating whether authenticating user is following this user (deprecated)
geo_enabled | Number | Boolean indicating whether user has enabled geotagging
profile_banner_url | Text | URL of user’s uploaded profile banner
profile_background_image_url | Text | URL of user’s background image
screen_name | Text | User’s screen name/handle/alias. Typical max length 15 characters
lang | Text | BCP 47 language identifier - the language the user primarily uses
profile_background_tile | Number | Boolean indicating whether background image should be tiled
favourites_count | Number | Number of tweets user has liked. Note British spelling of ‘favourites’
name | Text | The display name of user. Subject to change by user, usually limited to 20 characters
notifications | Number | Boolean indicating whether user wants to receive notifications by SMS. Deprecated.
url | Text | URL provided by user for display in profile
created_at | Text | UTC time of account’s creation
contributors_enabled | Number | Boolean indicating whether account’s tweets can be authored by multiple users
protected | Number | Boolean indicating whether user has chosen to protect their tweets
default_profile | Number | Boolean indicating whether user has left the theme and background of their profile unchanged
is_translator | Number | Boolean indicating whether user is a member of Twitter’s translator community. Deprecated.