What is the deduplication algorithm used for Twitter retweets?


I would like to know more about the information in a Twitter retweet. Are only “official retweets” with no additional information added by the retweeter counted as duplicates, or are retweets that also contain a comment by the retweeter counted as well? Additionally, if someone does not do an “official” retweet but copies and pastes the text of a Tweet and posts it, will this be treated as a retweet?


The retweet metadata field reflects the official Twitter record. See screenshot below.

Duplicates in our system are based on identical text in the Tweet content. Alternatively, near-duplicates might be what you are looking for.

(Screenshot: retweet_metadata.png)
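To illustrate what "based on identical text" could mean in practice, here is a minimal Python sketch. It is not our system's actual implementation: the function name group_exact_duplicates, the "id" and "text" keys, and the sample Tweets are all hypothetical, and the sketch assumes a strict character-for-character comparison with no normalization.

```python
from collections import defaultdict


def group_exact_duplicates(tweets):
    """Group tweet IDs whose text is identical (hypothetical helper).

    Assumption: "identical" means an exact string match, with no
    trimming, case folding, or other normalization.
    """
    groups = defaultdict(list)
    for tweet in tweets:
        groups[tweet["text"]].append(tweet["id"])
    # Keep only groups that contain more than one Tweet.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data, not real Tweets.
tweets = [
    {"id": 1, "text": "Big news today"},
    {"id": 2, "text": "Big news today"},          # copied and pasted text
    {"id": 3, "text": "Wow! Big news today"},     # same text plus a comment
]

duplicate_groups = group_exact_duplicates(tweets)
print(duplicate_groups)
```

Under these assumptions, the copied-and-pasted Tweet is grouped with the original because the text matches exactly, while the Tweet with an added comment is not. That second case is the kind of item near-duplicates might capture instead.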

