One of the core features of DiscoverText is a simple way to measure inter-rater reliability when two or more coders complete the same coding task. This measurement is a building block of good research and a useful predictor of whether a custom machine classifier can be trained successfully. As a rule, if two or more humans cannot regularly agree on a text classification task, there is little chance that a machine can be trained to perform it well.
- Create a dataset with 50-100 items using the random sample tool.
- Assign two or more peers to code the set using the default standard coding style.
- Code the dataset yourself and brief the coders on what you expect. Usually a short code book is prepared outside the system and shared by email or as an attachment.
- After the coders complete their work, go to the Dataset Details page and click Standard Comparisons in the Analysis section of the page.
- Move all the coders who completed the task from the Available Coders box to the Chosen Coders box.
- Select the method of calculation (Fleiss' Kappa is the default).
- Optional: Select the check box if you also want to see all the two-way coder comparisons when there are three or more coders.
- Click the Run Comparison button to see the results.
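
For readers who want to see what the default calculation measures, here is a minimal Python sketch of the textbook Fleiss' Kappa formula. It is an illustration only, not DiscoverText's own implementation. The function name `fleiss_kappa` and the input layout (one list of labels per item, one label per coder) are assumptions made for this example, and it assumes every coder coded every item.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Textbook Fleiss' Kappa (hypothetical helper, not a DiscoverText API).

    ratings: list of items; each item is a list with one label per coder.
    Every item must have the same number of coders (at least two).
    """
    if not ratings:
        raise ValueError("need at least one coded item")
    n = len(ratings[0])  # coders per item
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of coders")

    N = len(ratings)      # number of items
    totals = Counter()    # how often each code was used overall
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Share of coder pairs that agree on this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n) / (n * (n - 1))

    p_observed = agreement_sum / N
    # Agreement expected by chance, from the overall share of each code.
    p_expected = sum((c / (N * n)) ** 2 for c in totals.values())
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one code was ever used")
    return (p_observed - p_expected) / (1 - p_expected)


# Three coders, four items: full agreement on two items, a 2-1 split on two.
example = [
    ["pos", "pos", "pos"],
    ["neg", "neg", "neg"],
    ["pos", "pos", "neg"],
    ["neg", "neg", "pos"],
]
print(round(fleiss_kappa(example), 3))  # 0.333
```

A value of 1 means the coders always agreed, while values near 0 mean their agreement is about what chance alone would produce.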