Classify a large volume of data quickly

In DiscoverText you can create large archives from a variety of social media sources. A search term will sometimes produce a high volume of results. In these cases, there are several ways to classify a large volume of data quickly and easily. Keep the following DiscoverText features in mind:

  1. Creating a first-level coding scheme (code set) will allow the easy and quick coding of a few hundred items at a time.
  2. The ActiveLearning engine continuously draws on the most recent human coding choices and training data when producing results. Up to a hundred items at a time can be classified in one click.
    • Right-click on a dataset folder and select View Details to reach the Dataset Details page. Then select the Classification Report option from the ActiveLearning section.
  3. Classification schemes become more precise and easier to implement over time through iteration and rebuilding the entire classification set. (See Tip #5 in Tips for Building Effective Classifiers.)
  4. The end goal is to obtain a high-level view of the classification of your data sets. By exploring the slices of the pie chart and by filtering on metadata values and timeline values, it is possible to find interesting information in what was originally a voluminous data set.
    • The Coding Summary Report (also viewable as a pie chart) can be accessed by selecting the Coding Report link in the Dataset Coder section of the Dataset Details page.  
  5. The Split feature allows data to be split by human coding, by the winning classifications, or by a variable, user-defined threshold. This tool is useful for iterating during the development of a social sifter.
    • The Split feature is available in the Dataset Options section on the right of the Dataset Details page.
  6. Core tools for measuring coder reliability are available to support both the research process and the development of classifiers.
    • These tools are available in the Analysis section on the right of the Dataset Details page.
  7. Central to the TextSifter approach to customized machine classifiers is the idea that only the items on which coding choices differ need to be adjudicated. Focusing on those items adds the most value to the human training factor in custom classification.
    • The Adjudication Report option is available in the Analysis section on the right of the Dataset Details page.
  8. Once a classifier has been trained and applied, the Classification Report option can be used as a threshold-sensitive filter, where the classifier is further refined by taking only the items that meet a certain threshold.
    • The Classification Report option is available in the ActiveLearning section of the Dataset Details page.
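
The ideas behind the threshold split (item 5), adjudicating only differences (item 7), and threshold-sensitive filtering (item 8) can be illustrated outside DiscoverText with a minimal Python sketch. This is not DiscoverText's API or its internal method. The Item class, its fields, and the functions split_by_threshold, disagreements, and percent_agreement are hypothetical names invented for illustration, and the sketch assumes each classified item carries a single winning label with a score between 0.0 and 1.0.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for one classified item (not a DiscoverText type)."""
    item_id: int
    text: str
    label: str    # winning classification assigned by the classifier
    score: float  # assumed classifier score for that label, 0.0 to 1.0


def split_by_threshold(items, threshold):
    """Split items into (kept, held_back) around a user-defined threshold.

    Items scoring at or above the threshold are kept; the rest are held back,
    for example for further human coding.
    """
    kept = [it for it in items if it.score >= threshold]
    held_back = [it for it in items if it.score < threshold]
    return kept, held_back


def disagreements(coder_a, coder_b):
    """Return the sorted ids of items that two coders coded differently.

    coder_a and coder_b map item id -> code. Only ids coded by both coders
    are compared; these are the items that would need adjudication.
    """
    shared = coder_a.keys() & coder_b.keys()
    return sorted(i for i in shared if coder_a[i] != coder_b[i])


def percent_agreement(coder_a, coder_b):
    """Return the share of jointly coded items on which both coders agree.

    This is simple percent agreement, the most basic reliability measure;
    it does not correct for agreement expected by chance.
    """
    shared = coder_a.keys() & coder_b.keys()
    if not shared:
        raise ValueError("no items were coded by both coders")
    agree = sum(1 for i in shared if coder_a[i] == coder_b[i])
    return agree / len(shared)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    items = [
        Item(1, "great service", "positive", 0.92),
        Item(2, "not sure about this", "positive", 0.55),
        Item(3, "terrible experience", "negative", 0.81),
    ]
    kept, held_back = split_by_threshold(items, 0.8)
    print([it.item_id for it in kept], [it.item_id for it in held_back])

    coder_a = {1: "positive", 2: "negative", 3: "negative"}
    coder_b = {1: "positive", 2: "positive", 3: "negative"}
    print(disagreements(coder_a, coder_b), percent_agreement(coder_a, coder_b))
```

Raising the threshold keeps fewer but higher-scoring items, while lowering it keeps more items at the cost of more likely misclassifications. Restricting adjudication to the ids returned by disagreements reflects the point in item 7 that only the differing items need a human decision.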