In this article we will learn how to import Twitter Data from Gnip PowerTrack by constructing rules to refine the search results. Our goals are to select only the relevant tweets and to limit the number of tweets, which in turn minimizes the cost. We will start by reviewing the PowerTrack API rules, and then demonstrate a scenario of starting a data feed with step-by-step instructions. You may also find this video about the Gnip PowerTrack rules to be helpful.
Gnip PowerTrack Rules
Products such as DiscoverText use PowerTrack rules to deliver social data to you based on filtering rules you set up. Rules are made up of one or more ‘clauses,’ where a clause is a keyword, an exact phrase, or one of the many PowerTrack operators. Before beginning to build PowerTrack rules (see the link to the new PowerTrack 2.0 syntax at the bottom of this page), we will review the syntax and the list of available operators, and look at the restrictions on building rules. We will also examine how rules are logically evaluated.
- Multiple clauses can be combined with ‘and’, ‘or’, and 'not' logic. The rules for doing this are very specific:
- AND logic is specified with a space character between clauses. You cannot use the keyword AND (there is no such keyword).
- OR logic is specified with an uppercase OR.
- NOT logic is specified by prefixing the word with a minus sign. You cannot use the keyword NOT (there is no such keyword).
- An exact match of two or more words is accomplished by enclosing the words in quotation marks.
|Carson Nevada||Both Carson and Nevada must exist in the tweet.|
|“Ben Carson” OR primary||Either Ben Carson or primary must exist.|
- If you are searching for several keywords and the estimate shows zero or only a few results, you may want to use the OR operator. For example, the rule:
(1916rising 1916centenary ireland1916 easterising)
will only return tweets that contain all of the keywords, which is very unlikely. Using the OR operator between each keyword:
(1916rising OR 1916centenary OR ireland1916 OR easterising)
will return tweets containing any one of the keywords.
- Oftentimes a popular tweet is retweeted (forwarded) by many Twitter users. We can ensure that only original tweets are included with the negation operator (minus sign) combined with the is:retweet rule.
|-is:retweet||Exclude retweets from the returned selection.|
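The combination rules above can be sketched as a few string helpers. This is a minimal illustration in Python, not part of DiscoverText or Gnip; the function names `phrase`, `any_of`, `all_of`, and `exclude` are hypothetical and only assemble the rule text you would type into the rule field.

```python
def phrase(words: str) -> str:
    """Exact match: wrap a multi-word phrase in double quotation marks."""
    return f'"{words}"'


def any_of(*clauses: str) -> str:
    """OR logic: join clauses with an uppercase OR, grouped in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """AND logic: join clauses with a single space (there is no AND keyword)."""
    return " ".join(clauses)


def exclude(clause: str) -> str:
    """NOT logic: prefix the clause with a minus sign (there is no NOT keyword)."""
    return "-" + clause


print(all_of("Carson", "Nevada"))
print(any_of(phrase("Ben Carson"), "primary"))
print(any_of("1916rising", "1916centenary", "ireland1916", "easterising"))
print(exclude("is:retweet"))
```

Building rules from helpers like these makes it harder to type AND or NOT by mistake, since the only keyword ever emitted is the uppercase OR.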
Gnip PowerTrack treats a word written entirely in uppercase letters as an operator (such as OR). If a search term is an all-uppercase word or acronym (e.g., NATO, IBM), either enter it in lowercase letters or enclose it in quotes.
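As a rough sketch of the lowercase option, the following Python snippet lowercases every all-uppercase word in a rule while leaving the OR operator alone. The function name `lowercase_acronyms` is hypothetical, and the regular expression is an assumption about what counts as an "uppercase word" (two or more capital letters standing alone).

```python
import re


def lowercase_acronyms(rule: str) -> str:
    """Lowercase every all-uppercase word except the OR operator."""

    def fix(match: re.Match) -> str:
        word = match.group(0)
        return word if word == "OR" else word.lower()

    # \b[A-Z]{2,}\b matches whole words made of two or more capital letters,
    # so mixed-case words such as "iPhone" are left untouched.
    return re.sub(r"\b[A-Z]{2,}\b", fix, rule)


print(lowercase_acronyms("(NATO OR IBM) summit"))
```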
Twitter’s available geo data begins September 1, 2013. Requesting has:geo for dates prior to September 1, 2013 is invalid.
There are many PowerTrack operators and rules. For example, "lang:es" returns only the tweets that Gnip has classified as Spanish; a tweet that has not been language-classified is not matched. Please see the Gnip PowerTrack Rules documentation for more information.
Gnip uses a blank space between operands to signify a Boolean AND (remember, there is no “and” operator).
If you are searching for tweets from a particular user or users, then use the from: rule. Here are examples of searching for tweets from one or several users:
from:vote_leave OR from:leaveeuofficial OR from:primeminister
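For a longer list of accounts, the same clause can be generated rather than typed. The helper below is a small Python sketch; `from_users` is a hypothetical name, and stripping a leading @ is a convenience assumption rather than documented behavior.

```python
def from_users(*handles: str) -> str:
    """Join one or more Twitter handles into a chain of from: clauses."""
    cleaned = [h.lstrip("@") for h in handles]  # accept "@name" or "name"
    return " OR ".join(f"from:{h}" for h in cleaned)


print(from_users("vote_leave", "leaveeuofficial", "primeminister"))
```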
- We do not currently subscribe to the Gnip "profile" keywords (such as profile_country, profile_region, and profile_geo). Attempting to use them will return an error.
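A rule can be checked for these keywords before it is submitted. The following Python sketch assumes the profile operators are written as a name followed by a colon (for example `profile_country:us`); the function name `unsupported_operators` is hypothetical.

```python
import re


def unsupported_operators(rule: str) -> list[str]:
    """Return the profile_* operator names used in a rule, in order of appearance."""
    # Assumes operators are written as name:value, e.g. profile_country:us
    return re.findall(r"\b(profile_\w+):", rule)


found = unsupported_operators("coffee profile_country:us -is:retweet")
if found:
    print("Remove these operators before creating the rule:", ", ".join(found))
```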
Scenario: Apple iPhone Encryption
A court order is intended to force Apple to provide the FBI (Federal Bureau of Investigation) with access to encrypted data on Apple’s iPhone cellphones, hopefully producing information about the shootings in San Bernardino, California on December 2, 2015.
This is a contentious issue that balances the competing interests of personal privacy and law enforcement investigations. We would like to see what the Twitterverse is saying about this topic.
Based upon recent mass media reports, some good search words are:
encryption
encrypted
backdoor [access to an iPhone]
The first part of our PowerTrack rule is:
(encryption OR encrypted OR backdoor)
However, the first two search words are likely to be found in any tweet about encryption in general, and backdoor could be used in the context of home improvement advertising.
We can improve the likelihood of returning relevant search results from PowerTrack by specifying that one of the following words must also be present in the tweet: apple, iphone.
The second part of our PowerTrack rule is:
(apple OR iphone)
Finally, we only want original tweets, not retweets. The final part of our rule is:
-is:retweet
The following instructions demonstrate how to collect tweets that fit this rule.
1. Create a project or open an existing one from the Project list in the Navigation Sidebar.
2. In the Project Options section, click Import Data.
3. In the Twitter section, click PowerTrack.
4. Create a new archive or select an existing archive, and then click Continue.
5. In the Fetch Rule field, type the Gnip PowerTrack rule, and then click Create Rule.
For this scenario, the rule is:
(encryption OR encrypted OR backdoor) (apple OR iphone) -is:retweet
6. DiscoverText will verify that the Gnip PowerTrack feed has been set up.
7. Monitor the progress of the data collection by clicking the click here link, or click GNIP Feed Management in the left navigation under Tools.
8. The Running Feeds monitor displays how many tweets have been collected, and how many Gnip credits you have remaining. When finished collecting tweets, click the red X button located to the left of the Project name.
9. The collected tweets are found in the Data Archives in the archive you created in step 4.
Data starts flowing in immediately. The amount of data and the rate at which new tweets are received depend on Twitter and on how active Twitter users are. You may want to actively monitor the feed so you can stop it before your credits are completely used up.
Although these rules can be very useful for selecting specific data, they can also severely restrict the quantity of data you may receive. For example, the place_country rule could eliminate 99% of the tweets because geo data is extremely rare in Twitter. If you find that the number of estimated activities is very low, consider removing a rule and reevaluating the results.
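The narrowing effect of each added clause can be illustrated locally. The Python sketch below applies the three parts of the scenario rule, one after another, to a handful of made-up tweets. It is a simplification: the word matching here (lowercased alphanumeric tokens) is an assumption and does not reproduce how Gnip tokenizes or evaluates rules, and the names `words` and `matches` are hypothetical.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase a tweet and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(tweet: dict, any_groups: list[set[str]], exclude_retweets: bool) -> bool:
    """True if the tweet has at least one word from every group (AND of ORs)."""
    if exclude_retweets and tweet["is_retweet"]:
        return False
    tokens = words(tweet["text"])
    return all(tokens & group for group in any_groups)


# Made-up sample tweets, for illustration only.
tweets = [
    {"text": "Apple fights the FBI over iPhone encryption", "is_retweet": False},
    {"text": "RT: Apple fights the FBI over iPhone encryption", "is_retweet": True},
    {"text": "Why encrypted email matters", "is_retweet": False},
    {"text": "New backdoor installed on our patio today", "is_retweet": False},
    {"text": "Should Apple build a backdoor?", "is_retweet": False},
]

topic = {"encryption", "encrypted", "backdoor"}
brand = {"apple", "iphone"}

# Each step adds one more clause to the rule.
steps = [
    ("topic words only", [topic], False),
    ("topic words plus (apple OR iphone)", [topic, brand], False),
    ("topic words plus (apple OR iphone) -is:retweet", [topic, brand], True),
]

counts = [sum(matches(t, groups, no_rt) for t in tweets) for _, groups, no_rt in steps]
for (label, _, _), n in zip(steps, counts):
    print(f"{n} of {len(tweets)} sample tweets match: {label}")
```

Each added clause can only keep or reduce the number of matching tweets, which is why removing a clause is the first thing to try when the estimate comes back very low.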