Text analytics are computer-assisted techniques for reaching valid and reliable insights about a collection or stream of unstructured (free-form) text. We think of all text analytics as filters, in some sense: they turn big piles of text into smaller, better organized, and more accessible piles of text. In particular, there are several major approaches that we embrace:
- Filtering on metadata
- De-duplication and near-duplicate clustering
- Human coding (labelling or tagging)
- Machine classification based on human coding
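To make the filtering idea concrete, here is a minimal sketch of the first two approaches in the list: filtering on metadata, then de-duplication and near-duplicate clustering. It is an illustration under stated assumptions, not a description of how any particular tool implements these steps. The document layout (dictionaries with `id`, `source`, and `text` keys), the function names, the use of word-shingle Jaccard similarity, and the `0.5` threshold are all hypothetical choices made for the example.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text, k=3):
    """Return the set of k-word shingles of the normalized text."""
    words = normalize(text).split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def filter_by_metadata(docs, **criteria):
    """Keep documents whose metadata fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def deduplicate(docs):
    """Drop documents whose normalized text has already been seen."""
    seen, unique = set(), []
    for d in docs:
        digest = hashlib.sha1(normalize(d["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(d)
    return unique


def cluster_near_duplicates(docs, threshold=0.5):
    """Greedy one-pass clustering: a document joins the first cluster whose
    seed document is at least `threshold` similar, else starts a new one."""
    clusters = []  # list of (seed_shingles, member_docs)
    for d in docs:
        s = shingles(d["text"])
        for seed, members in clusters:
            if jaccard(seed, s) >= threshold:
                members.append(d)
                break
        else:
            clusters.append((s, [d]))
    return [members for _, members in clusters]


# Hypothetical sample data, invented for this illustration.
docs = [
    {"id": 1, "source": "twitter", "text": "Vote yes on the new park proposal today"},
    {"id": 2, "source": "twitter", "text": "Vote  YES on the new park proposal today"},
    {"id": 3, "source": "twitter", "text": "Please vote yes on the new park proposal today"},
    {"id": 4, "source": "email", "text": "Vote yes on the new park proposal today"},
    {"id": 5, "source": "twitter", "text": "Traffic on Main Street is terrible this morning"},
]

filtered = filter_by_metadata(docs, source="twitter")
unique = deduplicate(filtered)
clusters = cluster_near_duplicates(unique)
cluster_ids = [[d["id"] for d in members] for members in clusters]
print(cluster_ids)
```

Each step hands a smaller, better organized pile to the next: five documents become four after the metadata filter, three after exact de-duplication, and two clusters after near-duplicate grouping. The greedy clustering is order-dependent and compares each document only to a cluster's seed, which keeps the sketch short; a production system would likely use a more robust method.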
The tools require software users (analysts, researchers, domain experts) who embrace the theory and practice of text classification. There is an indispensable role for a human in the loop in our form of supervised machine learning.
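The relationship between human coding and machine classification can be sketched as follows. This is a toy illustration, assuming a small multinomial Naive Bayes classifier with add-one smoothing written from scratch; the source does not say which learning algorithm is used, and the class name, the `triage` helper, the `0.8` confidence cutoff, and the sample texts are all hypothetical. The point is the loop: humans code a sample, the machine learns from those codes, and anything the machine is unsure about goes back to a human.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of coded items
        self.vocab = set()

    def train(self, coded_items):
        """Learn from (text, label) pairs produced by human coders."""
        for text, label in coded_items:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict_proba(self, text):
        """Return a dict mapping each label to its posterior probability."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


def triage(model, texts, min_confidence=0.8):
    """Accept confident machine labels; send the rest back for human coding."""
    auto, review = [], []
    for text in texts:
        probs = model.predict_proba(text)
        label = max(probs, key=probs.get)
        target = auto if probs[label] >= min_confidence else review
        target.append((text, label, probs[label]))
    return auto, review


# Hypothetical human-coded sample, invented for this illustration.
coded = [
    ("great service and friendly staff", "positive"),
    ("love the fast friendly support", "positive"),
    ("terrible service and rude staff", "negative"),
    ("slow support and rude replies", "negative"),
]
uncoded = ["friendly great love", "rude terrible slow", "service and staff"]

model = NaiveBayes()
model.train(coded)
auto, review = triage(model, uncoded)
for text, label, p in auto:
    print(f"auto   {label:<8} {p:.2f}  {text}")
for text, label, p in review:
    print(f"review {label:<8} {p:.2f}  {text}")
```

Items that land in `review` would be coded by a person and added to `coded` before retraining, which is where the human in the loop earns the word "indispensable": the classifier is only as good as the human codes it learns from.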