Why is an inter-rater reliability kappa score so low?


I had my coders code a set of Twitter tweets. Unfortunately, Cohen's kappa is only 0.51 even though the coders agree on 99% of the items. I expected the kappa score to be higher. Isn't 99% agreement enough to raise the kappa above 0.8? Please see the table below for more information.


A high percentage of agreement does not guarantee a high kappa. Kappa corrects raw agreement for the agreement expected by chance, and chance agreement is itself very high when one category dominates the data. If nearly all of your tweets fall into a single category, two coders who simply favored that category would already agree most of the time by chance alone, so kappa discounts almost all of your 99% observed agreement. This is the well-known kappa paradox: skewed marginal distributions depress kappa even when raw agreement is near perfect. To reach a kappa above 0.8, the coded categories will need to occur in roughly equal numbers across your items.
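A minimal sketch of the effect, using a hypothetical confusion matrix (the counts below are illustrative, not your actual data): two coders label 1,000 tweets into two categories, one of which dominates heavily.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square inter-rater confusion matrix."""
    n = sum(sum(row) for row in matrix)
    # Observed agreement: proportion of items on the diagonal.
    p_o = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of the marginal proportions, summed over categories.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical counts: 985 tweets both coders call "A", 5 both call "B",
# and 10 disagreements -> 990/1000 = 99% raw agreement.
table = [[985, 5],
         [5,   5]]

print(cohen_kappa(table))  # ~0.49
```

With these counts, chance agreement alone is 98.02%, so the 99% observed agreement yields a kappa of only about 0.49. If the same 99% agreement were spread over two equally common categories, chance agreement would drop to 50% and kappa would rise to about 0.98.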



