Archives, buckets, and datasets


Principal Project Parts

Every project in DiscoverText has three fundamental components:

  • Archives: collections of raw data.
  • Buckets: subsets of raw data.
  • Datasets: data humans can code and machines can classify.

Raw data archives can come from a variety of sources, including uploaded spreadsheets, large email collections, and live social media feeds.

Buckets are produced using a variety of tools, including search, filters, coding, de-duplication, clustering, and machine classification. Buckets are refined archives.

Anything that appears in a list view can be put in a bucket, as can any duplicate group or set of groups.

Datasets can be coded (labeled or tagged) by one or more DiscoverText users. They can also be machine classified using our "sifter" technology.
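
For readers who think in code, the relationship between the three parts can be sketched roughly as follows. This is a conceptual illustration only, not DiscoverText's API: the class names, the search_and_deduplicate and to_dataset helpers, and the sample items are all hypothetical, and the search and de-duplication logic is a simplified stand-in for the tools described above.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Hypothetical model: a collection of raw items."""
    name: str
    items: list[str]


@dataclass
class Bucket:
    """Hypothetical model: a refined subset of an archive's items."""
    name: str
    items: list[str]


@dataclass
class Dataset:
    """Hypothetical model: items that users can code (label or tag)."""
    name: str
    items: list[str]
    labels: dict[str, list[str]] = field(default_factory=dict)

    def code(self, item: str, label: str) -> None:
        """Attach a label to an item that belongs to this dataset."""
        if item not in self.items:
            raise ValueError(f"{item!r} is not in dataset {self.name!r}")
        self.labels.setdefault(item, []).append(label)


def search_and_deduplicate(archive: Archive, term: str, name: str) -> Bucket:
    """Keep items containing `term` (case-insensitive), dropping exact duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for item in archive.items:
        if term.lower() in item.lower() and item not in seen:
            seen.add(item)
            kept.append(item)
    return Bucket(name, kept)


def to_dataset(bucket: Bucket, name: str) -> Dataset:
    """Copy a bucket's items into a dataset that is ready for coding."""
    return Dataset(name, list(bucket.items))


# Illustrative data only: archive -> bucket -> dataset.
archive = Archive("feed", ["Vote today", "vote early", "Vote today", "Lunch plans"])
bucket = search_and_deduplicate(archive, "vote", "vote-mentions")
dataset = to_dataset(bucket, "vote-coding")
dataset.code("Vote today", "civic")
```

In this sketch the archive is left untouched, the bucket holds only the refined subset, and labels are attached at the dataset level, which mirrors the division of roles described above.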


