Understanding the language codes in Twitter tweets


Some, but not all, tweets carry metadata identifying the language the tweet is written in. You can select tweets by language with the lang: operator. For example, the rule:

(1916rising OR 1916centenary OR ireland1916) lang:en

returns only tweets that contain at least one of the specified keywords and that Twitter has classified as English.
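If you assemble rules like this programmatically, a small helper can keep the parentheses and the lang: operator consistent. The following is only a minimal sketch: the function name build_rule is hypothetical and is not part of DiscoverText or Twitter's tooling. It simply produces the rule text shown above.

```python
def build_rule(keywords, lang):
    """Join keywords with OR, wrap them in parentheses, and append lang:<code>.

    Hypothetical helper for illustration; it only builds the rule string.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    return "(" + " OR ".join(keywords) + ") lang:" + lang


rule = build_rule(["1916rising", "1916centenary", "ireland1916"], "en")
print(rule)  # (1916rising OR 1916centenary OR ireland1916) lang:en
```

Grouping the keywords in parentheses keeps the language restriction applied to every keyword rather than only the last one.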

The documentation for Twitter’s Premium Operators describes the lang: operator and contains a list of language identifiers.


Note: The language classification is determined by Twitter, not DiscoverText. If Twitter could not classify the language of a tweet, the value provided is ‘und’ (for undetermined).
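To see how many collected tweets have no usable classification, you can tally the language codes. The sketch below assumes each tweet is available as a Python dictionary with a "lang" key, mirroring the lang field of Twitter's tweet object; how your own export is structured is an assumption here, and the sample records are made up for illustration.

```python
from collections import Counter


def count_languages(tweets):
    """Tally tweets by language code, treating a missing or empty code as 'und'."""
    return Counter(tweet.get("lang") or "und" for tweet in tweets)


# Made-up sample records, for illustration only.
sample = [
    {"text": "Remembering the Rising", "lang": "en"},
    {"text": "#ireland1916", "lang": "und"},
    {"text": "1916"},  # no lang key at all
    {"text": "Centenaire de 1916", "lang": "fr"},
    {"text": "Easter 1916 centenary", "lang": "en"},
]

counts = count_languages(sample)
for code, total in counts.most_common():
    print(code, total)
```

A large share of ‘und’ results suggests that a lang: rule would exclude many otherwise relevant tweets, such as those consisting only of hashtags or links.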


See these Helpful Tips for Selecting Specific Twitter Tweets to Import for information regarding geo data (place and country codes).



