Archives, buckets, and datasets


Principal Project Parts

Every project in DiscoverText has three fundamental components:

  • Archives: collections of raw data.
  • Buckets: subsets of raw data.
  • Datasets: data humans can code and machines can classify.

Raw data archives can come from a variety of sources, including uploaded spreadsheets, large email collections, and live social media feeds.

Buckets are produced using a variety of tools, including search, filters, coding, de-duplication, clustering, and machine classification. In effect, a bucket is a refined subset of an archive.
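
To make the archive-to-bucket relationship concrete, here is a minimal conceptual sketch in Python. It is not DiscoverText code or its API: the Item type and the search and deduplicate functions are hypothetical names invented for illustration, and the matching rules (case-insensitive substring search, whitespace-normalized duplicate detection) are assumptions. It only shows the idea that a bucket is a smaller set of items derived from an archive, while the archive itself is left unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One unit of raw data, e.g. a post or an email (hypothetical type)."""
    item_id: int
    text: str


def search(archive: list[Item], term: str) -> list[Item]:
    """Keep only items whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return [item for item in archive if needle in item.text.lower()]


def deduplicate(items: list[Item]) -> list[Item]:
    """Drop items whose normalized text was already seen, keeping the first."""
    seen: set[str] = set()
    unique: list[Item] = []
    for item in items:
        # Normalize case and whitespace so near-identical copies collapse.
        key = " ".join(item.text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


# The archive: a collection of raw data (sample items invented for the sketch).
archive = [
    Item(1, "Flood warning issued for the river valley"),
    Item(2, "flood warning issued for the  river valley"),
    Item(3, "Sunny skies expected this weekend"),
]

# The bucket: a refined subset produced by search followed by de-duplication.
bucket = deduplicate(search(archive, "flood"))
```

In this sketch the bucket is a new list, so the original archive stays available for building other buckets with different tools or search terms.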

Datasets can be coded (labeled or tagged) by one or more DiscoverText users. They can also be machine classified using our "sifter" technology.

