Classification is a complementary process to manual coding: after you code part of a dataset, you can run classification on it. The classifier, a machine-learning tool, calculates the probability that each item belongs to each code in the code set. An enterprise license is required to create and use classifiers, but you can experiment with the tool during the 30-day free trial.
A classifier is built from a code set and its training data. Training and developing a classifier is an iterative process that closely resembles training a spam filter. As you use the same code set on different datasets, the code set continuously accumulates training data, which the classifier then uses in its calculations.
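This article does not say which algorithm the classifier uses. Because the process is compared to spam filtering, the sketch below uses multinomial naive Bayes, a classic spam-filtering technique, as an illustrative stand-in: each manually coded item is added to the accumulated training data, and the model returns a probability for every code. The class `NaiveBayesClassifier`, its methods, and the sample texts are hypothetical and are not the product's API.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase the text and keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes that keeps absorbing newly coded items."""

    def __init__(self):
        self.doc_counts = Counter()              # number of items coded with each code
        self.word_counts = defaultdict(Counter)  # word frequencies per code
        self.vocab = set()

    def add_training_item(self, text, code):
        # Every manually coded item adds to the accumulated training data.
        tokens = tokenize(text)
        self.doc_counts[code] += 1
        self.word_counts[code].update(tokens)
        self.vocab.update(tokens)

    def probabilities(self, text):
        # Estimate P(code | text) for every code, using Laplace (add-one) smoothing.
        if not self.doc_counts:
            raise ValueError("no training data yet")
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for code, n_docs in self.doc_counts.items():
            counts = self.word_counts[code]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            log_scores[code] = score
        # Convert log scores to probabilities that sum to 1 (subtract the max for stability).
        top = max(log_scores.values())
        exps = {code: math.exp(s - top) for code, s in log_scores.items()}
        norm = sum(exps.values())
        return {code: v / norm for code, v in exps.items()}


clf = NaiveBayesClassifier()
clf.add_training_item("Flyers win in overtime", "Flyers hockey")
clf.add_training_item("Flyers goalie makes 40 saves", "Flyers hockey")
clf.add_training_item("Eagles sign a new quarterback", "Other")
print(clf.probabilities("Flyers win again"))
```

Calling `add_training_item` with more coded items and then calling `probabilities` again mirrors the iterative refinement described above. This sketch normalizes the probabilities across codes so they sum to 1; the real classifier may compute its scores differently.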
If you select a global code set for classification, you get a copy of the code set and its training data. Any training data you accumulate while using this copy is specific to your account and does not affect the global code set.
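The copy-on-select behavior can be pictured with a deep copy: training data added to your copy never reaches the global code set. The `CodeSet` structure and `adopt_global_code_set` function below are hypothetical and only model the behavior the paragraph describes, not how the product stores code sets.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CodeSet:
    # Hypothetical model: a name, the code names, and accumulated (text, code) training pairs.
    name: str
    codes: list
    training_data: list = field(default_factory=list)


def adopt_global_code_set(global_set):
    # Deep copy, so later training on the account's copy never modifies the global set.
    return copy.deepcopy(global_set)


global_set = CodeSet("Sports", ["Flyers hockey", "Other"],
                     [("Flyers win in overtime", "Flyers hockey")])
my_set = adopt_global_code_set(global_set)
my_set.training_data.append(("Eagles sign a new quarterback", "Other"))

print(len(global_set.training_data), len(my_set.training_data))  # 1 2
```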
Users can classify only datasets that they created and own. To classify a set of items, you should first code part or all of the dataset with the code set, even if the code set already has training data. For more accurate results, assign each code to at least 100 items. Then you can run classification on all of the dataset's items. When the process finishes, you can view the results in the item view.
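Before running classification, it can help to check which codes still fall short of the recommended 100 items. The sketch below assumes the manual coding is available as hypothetical `(item_id, code)` pairs; the function name `codes_below_minimum` is illustrative, not part of the product.

```python
from collections import Counter


def codes_below_minimum(coded_items, code_set, min_per_code=100):
    """Return {code: count} for every code assigned to fewer than min_per_code items.

    coded_items: iterable of (item_id, code) pairs from manual coding (hypothetical format).
    """
    counts = Counter(code for _, code in coded_items)
    return {code: counts[code] for code in code_set if counts[code] < min_per_code}


coded = [(i, "Flyers hockey") for i in range(120)] + [(i, "Other") for i in range(120, 160)]
print(codes_below_minimum(coded, ["Flyers hockey", "Other"]))  # {'Other': 40}
```

An empty result means every code has reached the threshold and classification is likely to give more reliable scores.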
The real power of text classification lies in using the classification score as a filter. In the list view of a classified dataset, you can use the advanced filtering capability to view items that are more or less likely to belong to a particular code. In the example below, the user is creating a filter to show only items that are more than 90% likely to be about Flyers hockey.
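Conceptually, a score filter keeps the items whose probability for one code is above (or below) a threshold. The sketch below assumes each classified item is a hypothetical dict with an `"id"` and a `"scores"` mapping from code to probability; it is not the product's data format or filter implementation.

```python
def filter_by_score(items, code, threshold=0.9, above=True):
    """Keep items whose score for `code` is above (or, if above=False, below) threshold."""
    if above:
        return [it for it in items if it["scores"].get(code, 0.0) > threshold]
    return [it for it in items if it["scores"].get(code, 0.0) < threshold]


items = [
    {"id": 1, "scores": {"Flyers hockey": 0.97, "Other": 0.03}},
    {"id": 2, "scores": {"Flyers hockey": 0.55, "Other": 0.45}},
    {"id": 3, "scores": {"Flyers hockey": 0.12, "Other": 0.88}},
]
likely_flyers = filter_by_score(items, "Flyers hockey", 0.9)
print([it["id"] for it in likely_flyers])  # [1]
```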
Another way to understand and use classification scores is to view the interactive custom machine classifier histogram. To generate the histogram, click the set classification boundary filter option in the advanced filters.
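A score histogram groups items by how likely they are to belong to a code, which makes it easier to see where a boundary such as 0.9 splits the dataset. The text-only sketch below only illustrates the binning idea; the product's histogram is interactive, and the function names and ten-bin layout here are assumptions.

```python
def score_histogram(scores, bins=10):
    """Count scores in [0.0, 1.0] into equal-width bins; a score of 1.0 lands in the last bin."""
    counts = [0] * bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


def print_histogram(counts, boundary=None):
    # Draw one row per bin and mark the bin that contains the boundary, if one is given.
    bins = len(counts)
    boundary_bin = None if boundary is None else min(int(boundary * bins), bins - 1)
    for i, c in enumerate(counts):
        marker = "  <- boundary" if i == boundary_bin else ""
        print(f"{i / bins:.1f}-{(i + 1) / bins:.1f} | {'#' * c}{marker}")


scores = [0.02, 0.05, 0.15, 0.5, 0.91, 0.95, 0.97, 1.0]
hist = score_histogram(scores)
print_histogram(hist, boundary=0.9)
```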