Cohen's Kappa Calculation

Explains how Datasaur implements the Cohen's Kappa algorithm.

Cohen's Kappa is one of the algorithms Datasaur supports for calculating agreement while accounting for the possibility of chance agreement. This section explains how labels from labelers and reviewers are processed into an agreement matrix and used to compute Cohen's Kappa.

Sample data

Suppose there are 2 labelers: Labeler A and Labeler B, who labeled the same sentences.

[Screenshots: labels from Labeler A and Labeler B]

There is also a reviewer who labeled the same sentences.

[Screenshot: labels from the reviewer]

Calculating the data

Agreement records

Based on the screenshots above, we map those labels into the agreement records below:
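As a rough illustration of this mapping step, the sketch below pairs up two labelers' labels by position and fills in an empty label wherever one of them labeled nothing. The function name `build_agreement_records`, the `EMPTY` placeholder, the position keys, and the sample labels are all hypothetical; they are not Datasaur's internal data model or the data from the screenshots.

```python
# Hypothetical placeholder for a span that one labeler left unlabeled.
EMPTY = "<EMPTY>"

def build_agreement_records(labels_a, labels_b):
    """Return one (position, label_a, label_b) record per labeled span.

    Each input maps a span position to a label. A position labeled by
    only one labeler gets EMPTY on the other side.
    """
    positions = sorted(set(labels_a) | set(labels_b))
    return [
        (pos, labels_a.get(pos, EMPTY), labels_b.get(pos, EMPTY))
        for pos in positions
    ]

# Invented sample: keys are (sentence, start token, end token).
labeler_a = {(0, 0, 1): "PER", (0, 4, 4): "EVE", (1, 2, 3): "LOC"}
labeler_b = {(0, 0, 1): "PER", (1, 2, 3): "GPE"}

records = build_agreement_records(labeler_a, labeler_b)
for record in records:
    print(record)
```

In this invented sample, Labeler B has no label on the span that Labeler A marked as EVE, so that record pairs EVE with the empty label instead of being dropped.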

Agreement table/confusion matrix

The agreement records are then converted into a confusion matrix. For this example, the matrix is constructed using data from Labeler A and Labeler B.
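The conversion can be sketched in a few lines of Python. This is a minimal sketch, not Datasaur's implementation: the `confusion_matrix` helper, the `EMPTY` placeholder, and the label pairs are assumptions made up for illustration. The pairs were only chosen to match the facts stated in this section (7 records, 4 agreements, Labeler A using EVE once and Labeler B never).

```python
from collections import Counter

EMPTY = "<EMPTY>"  # hypothetical placeholder for a missing label

def confusion_matrix(pairs):
    """Count (label_a, label_b) pairs into a square matrix.

    Rows follow Labeler A's label, columns follow Labeler B's label,
    and both axes use the same sorted list of classes.
    """
    classes = sorted({label for pair in pairs for label in pair})
    counts = Counter(pairs)
    matrix = [[counts[(row, col)] for col in classes] for row in classes]
    return classes, matrix

# Invented agreement records as (Labeler A, Labeler B) pairs.
pairs = [
    ("PER", "PER"), ("PER", "PER"), ("LOC", "LOC"), ("EVE", EMPTY),
    ("LOC", "PER"), (EMPTY, "LOC"), (EMPTY, EMPTY),
]

classes, matrix = confusion_matrix(pairs)
print(classes)
for row_label, row in zip(classes, matrix):
    print(row_label, row)
```

The diagonal of the matrix holds the agreements, so summing it gives the number of records on which both labelers chose the same label.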

Calculating the Kappa

From the matrix above, there are 7 records with 4 agreements.

The observed proportionate agreement is the number of agreements divided by the number of records:

$$p_o = \frac{4}{7} \approx 0.571$$

To calculate the probability of random agreement, we note that:

  • Labeler A labeled EVE once and Labeler B did not label EVE at all. Therefore, the probability of random agreement on the label EVE is $p_{EVE} = \frac{1}{7} \times \frac{0}{7} = 0$.

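  • Compute the probability of random agreement for every other label in the same way: for a label $k$, multiply the proportion of records that Labeler A assigned to $k$ by the proportion that Labeler B assigned to $k$.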

The full random agreement probability is the sum of the probability of random agreement for all labels:

$$p_e = \sum_{k} p_k$$

where $p_k$ is the probability of random agreement on label $k$.

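Finally, we can calculate Cohen's Kappa from the observed agreement $p_o$ and the random agreement probability $p_e$:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$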

The Kappa value for Labeler A and Labeler B is 0.49.
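The whole calculation can be sketched as one small function using only the standard library. The function name `cohens_kappa`, the `EMPTY` placeholder, and the sample labels are hypothetical. The labels are invented, so the result (about 0.4) is not expected to match the 0.49 computed from the actual data above.

```python
from collections import Counter

EMPTY = "<EMPTY>"  # hypothetical placeholder for a missing label

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)

    # Observed agreement: share of records with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: per label, the product of each labeler's share.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    all_labels = count_a.keys() | count_b.keys()
    p_e = sum((count_a[k] / n) * (count_b[k] / n) for k in all_labels)

    if p_e == 1.0:
        # Both labelers used one identical label everywhere: undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)

# Invented labels: 7 records, 4 agreements, EVE used once by A only.
labeler_a = ["PER", "PER", "LOC", "EVE", "LOC", EMPTY, EMPTY]
labeler_b = ["PER", "PER", "LOC", EMPTY, "PER", "LOC", EMPTY]

kappa = cohens_kappa(labeler_a, labeler_b)
print(round(kappa, 3))
```

Here the observed agreement is 4/7, the same as in the worked example, but the chance agreement depends on how the invented labels are distributed, which is why the final value differs.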

Kappa for Labeler A and Reviewer

With the same calculation, the Kappa value for Labeler A and the reviewer is 0.36.

Kappa for Labeler B and Reviewer

With the same calculation, the Kappa value for Labeler B and the reviewer is 0.475.

Summary

  • Missing labels from a labeler are treated as empty labels.

  • Chance agreement depends on:

    • The number of labels in a project.

    • The number of label classes.

  • When both labelers agree but the reviewer rejects the labels:

    • The agreement between the two labelers increases.

    • The agreement between the labelers and the reviewer decreases.
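The last point can be checked with a small experiment. This sketch assumes that a label rejected by the reviewer shows up as an empty label on the reviewer's side; that assumption, the `kappa` helper, and the sample labels are illustrative only. It compares the pairwise Kappa values before and after adding one record that both labelers agree on and the reviewer rejects.

```python
from collections import Counter

EMPTY = "<EMPTY>"  # assumed stand-in for a label the reviewer rejected

def kappa(xs, ys):
    """Compact Cohen's Kappa (assumes chance agreement is below 1)."""
    n = len(xs)
    p_o = sum(x == y for x, y in zip(xs, ys)) / n
    cx, cy = Counter(xs), Counter(ys)
    p_e = sum(cx[k] * cy[k] for k in cx) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented baseline labels for four records.
a = ["PER", "LOC", "PER", "LOC"]
b = ["PER", "LOC", "LOC", "LOC"]
r = ["PER", "LOC", "PER", "LOC"]

before = {"A-B": kappa(a, b), "A-R": kappa(a, r), "B-R": kappa(b, r)}

# Add one record: both labelers choose EVE, the reviewer rejects it.
a2, b2, r2 = a + ["EVE"], b + ["EVE"], r + [EMPTY]

after = {"A-B": kappa(a2, b2), "A-R": kappa(a2, r2), "B-R": kappa(b2, r2)}

for pair in before:
    print(pair, round(before[pair], 3), "->", round(after[pair], 3))
```

In this example the Kappa between the two labelers rises, while both labeler-reviewer values fall, which matches the behavior described in the summary. Other label distributions may shift the values by different amounts.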
