Every project in DiscoverText has three fundamental components:
- Archives: collections of raw data.
- Buckets: subsets of raw data.
- Datasets: data humans can code and machines can classify.
Raw data archives can come from a variety of sources, including uploaded spreadsheets, large email collections, and live social media feeds.
Buckets are refined archives. They are produced using a variety of tools, including search, filters, coding, de-duplication, clustering, and machine classification.
Datasets can be coded (labeled or tagged) by one or more DiscoverText users. They can also be machine classified using our "sifter" technology.
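The relationship between the three components can be pictured with a small conceptual sketch. This is not the DiscoverText API: every class and function name here (`Archive`, `Bucket`, `Dataset`, `make_bucket`, `make_dataset`) is hypothetical, and the keyword search with case-insensitive de-duplication stands in for only two of the tools listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """A collection of raw items (plain strings in this sketch)."""
    name: str
    items: list[str]


@dataclass
class Bucket:
    """A refined subset of an archive's items."""
    name: str
    items: list[str]


@dataclass
class Dataset:
    """Items that users can code; labels maps item -> {coder: label}."""
    name: str
    items: list[str]
    labels: dict[str, dict[str, str]] = field(default_factory=dict)

    def code(self, coder: str, item: str, label: str) -> None:
        """Record one coder's label for one item in the dataset."""
        if item not in self.items:
            raise ValueError(f"item not in dataset: {item!r}")
        self.labels.setdefault(item, {})[coder] = label


def make_bucket(archive: Archive, name: str, keyword: str) -> Bucket:
    """Keep items containing the keyword, dropping case-insensitive duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for item in archive.items:
        key = item.strip().lower()
        if keyword.lower() in key and key not in seen:
            seen.add(key)
            kept.append(item)
    return Bucket(name, kept)


def make_dataset(bucket: Bucket, name: str) -> Dataset:
    """Turn a bucket into a dataset that is ready for coding."""
    return Dataset(name, list(bucket.items))


# Hypothetical sample data, invented for illustration only.
archive = Archive(
    "feed",
    ["Great service!", "great service!", "Slow shipping", "Service was fine"],
)
bucket = make_bucket(archive, "service-mentions", "service")
dataset = make_dataset(bucket, "service-coding")
dataset.code("alice", "Great service!", "positive")
dataset.code("bob", "Great service!", "positive")
```

In this sketch the archive holds everything that was collected, the bucket narrows it to the items of interest, and the dataset is the unit that more than one coder labels.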