Measure inter-rater reliability


One of the core features of DiscoverText is a simple method for measuring inter-rater reliability when two or more coders complete the same coding task. This measurement is a building block of good research and an effective way to predict whether a custom machine classifier can be trained successfully. As a rule, if two or more humans cannot agree regularly on a text classification task, there is little chance a machine can be trained to do it effectively.

  1. Create a dataset with 50-100 items using the random sample tool.
  2. Assign two or more peers to code the set using the default standard coding style.
  3. Code the dataset yourself and brief the coders on what you expect. Usually a short code book is prepared off system and shared in an email or attachment.
  4. After the coders complete their work, go to the Dataset Details page and click Standard Comparisons in the Analysis section of the page.
  5. Add all the coders who completed the task (in the Available Coders box) to the Chosen Coders box.
  6. Select the method of calculation (Fleiss' Kappa is the default).
  7. Optional: Select the check box if you also want to see all the two-way coder comparisons when there are three or more coders.
  8. Click the Run Comparison button to see the results.
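DiscoverText runs the comparison for you, but it can help to see what the default metric actually measures. As a rough illustration (not DiscoverText's internal code), here is a minimal sketch of Fleiss' Kappa: it compares the observed per-item agreement among coders against the agreement expected by chance, yielding 1.0 for perfect agreement and values near 0 when coders agree no more often than chance.

```python
def fleiss_kappa(ratings):
    """Compute Fleiss' Kappa.

    ratings: list of lists, where ratings[i] holds the label each coder
    assigned to item i (every item must be coded by the same number of coders).
    """
    N = len(ratings)            # number of items
    n = len(ratings[0])         # number of coders per item
    categories = sorted({label for row in ratings for label in row})

    # n_ij: how many coders assigned item i to category j
    counts = [[row.count(c) for c in categories] for row in ratings]

    # Observed agreement: mean per-item agreement across all coder pairs
    P_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N

    # Chance agreement: based on the overall proportion of each category
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(len(categories))]
    P_e = sum(p * p for p in p_j)

    return (P_bar - P_e) / (1 - P_e)
```

For example, with two coders labeling four items as "yes"/"no" and disagreeing on one item, `fleiss_kappa([["yes", "yes"], ["no", "no"], ["yes", "no"], ["yes", "yes"]])` returns about 0.47, a moderate level of agreement. This is only a sketch to build intuition; rely on the Run Comparison results for your actual analysis.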