Explains how Datasaur implements the Krippendorff's Alpha algorithm.
Krippendorff's Alpha is one of the algorithms supported by Datasaur to calculate agreement while taking into account the possibility of chance agreement. We will take a deep dive into how Datasaur collects all labels from labelers and reviewers in a project and processes them into an Inter-annotator Agreement matrix.
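As a quick reference (written here in conventional notation, since the original formula images are not reproduced on this page), a chance-corrected agreement coefficient such as Krippendorff's Alpha compares the observed agreement Pa against the agreement expected by chance Pe. The rest of this page computes each piece step by step.

```latex
\alpha = \frac{P_a - P_e}{1 - P_e}
```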
Sample Data
Suppose there are 2 labelers and 1 reviewer — Labeler A, Labeler B, and the Reviewer — who labeled the same spans. Labeler A's work is visualized in Image 1, Labeler B's work in Image 2, and the Reviewer's work in Image 3.
Calculating the Agreement
In this section, we will walk through the detailed calculation between Labeler A and the Reviewer.
1. Arranging the data
First, we need to arrange the sample data into Table 1 for better visualization.
Table 1. Sample Data
| Span | Labeler A | Reviewer |
| --- | --- | --- |
| The Tragedy of Hamlet | EVE | TITLE |
| Prince of Denmark | PER | (no label) |
| Hamlet | PER | PER |
| William Shakespeare | PER | PER |
| 1599 | YEAR | YEAR |
| 1601 | YEAR | YEAR |
| Shakespeare | ORG | PER |
| 30557 | QTY | (no label) |
2. Cleaning the data
Second, we need to remove spans that have only one label, i.e., Prince of Denmark and 30557. They must be removed because spans with a single label would introduce a calculation error. The calculation result will still reflect the agreement level between the two annotators. The cleaned data is shown in Table 2.
Table 2. Cleaned Data
| Span | Labeler A | Reviewer |
| --- | --- | --- |
| The Tragedy of Hamlet | EVE | TITLE |
| Hamlet | PER | PER |
| William Shakespeare | PER | PER |
| 1599 | YEAR | YEAR |
| 1601 | YEAR | YEAR |
| Shakespeare | ORG | PER |
3. Creating the agreement table
Third, we need to create an agreement table based on the cleaned data. The table is visualized in Table 3. Based on the table, 5 values are calculated: n, rᵢ, rₖ, r, and r′.
Total spans in the data
Total labels in each span
Here is the calculation result.
Total of each label
Here is the calculation result.
Total labels in the data
Here is the calculation result.
Average number of labels per span
Here is the calculation result.
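Since the original formula images are not reproduced here, the following is a reference sketch in the conventional multi-rater notation, where r_ik denotes the number of annotators who assigned label k to span i. The numbers come from applying these definitions to the cleaned data in Table 2 for Labeler A and the Reviewer.

```latex
% r_{ik}: the number of annotators who assigned label k to span i
\begin{align*}
n   &= 6                                                     &&\text{total spans in the cleaned data}\\
r_i &= \textstyle\sum_k r_{ik} = 2 \ \text{for every span}    &&\text{total labels in each span}\\
r_k &= \textstyle\sum_i r_{ik}:\ r_{\mathrm{EVE}} = 1,\ r_{\mathrm{TITLE}} = 1,\ r_{\mathrm{PER}} = 5,\ r_{\mathrm{YEAR}} = 4,\ r_{\mathrm{ORG}} = 1\\
r   &= \textstyle\sum_i r_i = 12                             &&\text{total labels in the data}\\
r'  &= r / n = 12 / 6 = 2                                    &&\text{average number of labels per span}
\end{align*}
```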
4. Choosing weight function
Fourth, we need a weight function to weight the labels. Every label is treated equally because one label is no different from another. Hence, the weight function that will be used is stated in Formula 5.
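Treating every label as equally distinct corresponds to identity (nominal) weights. Assuming this is what Formula 5 expresses, a sketch of the weight function is:

```latex
w_{kl} =
\begin{cases}
1 & \text{if } k = l\\
0 & \text{if } k \neq l
\end{cases}
```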
5. Calculating Pa
Fifth, the observed weighted percent agreement is calculated.
Weighted number of labels
We will start by calculating the weighted number of labels using Formula (6).
For example, we can apply Formula (6) to calculate the weighted number of the EVE label in span 1.
We need to calculate all of the span and label combinations. The complete calculation result is visualized in Table 4.
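Assuming Formula (6) follows the conventional multi-rater formulation, the weighted number of labels for span i and label k sums the label counts of that span weighted by their similarity to k; with identity weights it reduces to the plain count. For the EVE label in span 1, only Labeler A chose EVE, so:

```latex
r^{*}_{ik} = \sum_{l} w_{kl}\, r_{il}
\qquad\Rightarrow\qquad
r^{*}_{1,\mathrm{EVE}} = 1
```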
Agreement percentage
After we have the weighted numbers of labels, we need to calculate the agreement percentage for a single span and label using Formula (7).
For example, we can apply Formula (7) to calculate the agreement percentage of the EVE label in span 1.
We need to calculate all of the span and label combinations. The complete calculation result is visualized in Table 5.
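Assuming Formula (7) takes the conventional form, the agreement contribution of label k in span i is:

```latex
p_{a|ik} = \frac{r_{ik}\left(r^{*}_{ik} - 1\right)}{r'\,(r_i - 1)}
\qquad\Rightarrow\qquad
p_{a|1,\mathrm{EVE}} = \frac{1 \cdot (1 - 1)}{2 \cdot (2 - 1)} = 0
```

The EVE label in span 1 contributes nothing because only one annotator chose it.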
Agreement percentage of a single span
We can simplify the result by getting the agreement percentage of a single span using Formula (8).
For example, we can apply Formula (8) to calculate the agreement percentage of span 1.
We need to calculate the agreement percentage of all spans. The complete calculation result is visualized in Table 6.
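Assuming Formula (8) sums the per-label contributions, the agreement percentage of a span is:

```latex
p_{a|i} = \sum_{k} \frac{r_{ik}\left(r^{*}_{ik} - 1\right)}{r'\,(r_i - 1)}
\qquad\Rightarrow\qquad
p_{a|1} = 0,\qquad p_{a|\text{Hamlet}} = \frac{2\,(2 - 1)}{2\,(2 - 1)} = 1
```

Span 1 (The Tragedy of Hamlet) scores 0 because the two annotators disagree, while a span such as Hamlet, where both chose PER, scores 1.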
Average agreement percentage
From the previous calculation, we can calculate the average agreement percentage using Formula (9).
We can apply Formula (9) to calculate the average agreement percentage.
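With the cleaned sample data there are four fully agreeing spans (Hamlet, William Shakespeare, 1599, 1601) and two disagreements (The Tragedy of Hamlet, Shakespeare), so assuming Formula (9) is the plain average:

```latex
p'_a = \frac{1}{n}\sum_{i=1}^{n} p_{a|i} = \frac{1 + 1 + 1 + 1 + 0 + 0}{6} = \frac{2}{3} \approx 0.6667
```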
Calculating Pa
Finally, the observed weighted percent agreement is calculated using Formula (10).
We can apply Formula (10) to calculate the observed weighted agreement percentage.
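Assuming Formula (10) follows the conventional definition of Krippendorff's Alpha, the observed weighted percent agreement adds a small-sample correction of 1/(n r′) to the average above; with n = 6 and r′ = 2:

```latex
P_a = \left(1 - \frac{1}{n r'}\right) p'_a + \frac{1}{n r'}
    = \frac{11}{12}\cdot\frac{2}{3} + \frac{1}{12}
    = \frac{25}{36} \approx 0.6944
```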
6. Calculating Pe
Sixth, the chance weighted percent agreement is calculated.
Classification probability
We start by calculating the classification probability for each label using Formula (11).
Here is the calculation result.
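Assuming Formula (11) takes the conventional form, each label's classification probability is its share of all assigned labels; from the label totals in the sample data (r = 12):

```latex
\pi_k = \frac{1}{n}\sum_{i}\frac{r_{ik}}{r'} = \frac{r_k}{n r'}
\qquad\Rightarrow\qquad
\pi_{\mathrm{EVE}} = \pi_{\mathrm{TITLE}} = \pi_{\mathrm{ORG}} = \frac{1}{12},\quad
\pi_{\mathrm{PER}} = \frac{5}{12},\quad
\pi_{\mathrm{YEAR}} = \frac{4}{12}
```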
Calculating Pe
To calculate the chance weighted percent agreement, the result of Formula (11) is plugged into Formula (12).
Here is the chance weighted percent agreement calculation.
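With identity weights, the chance weighted percent agreement reduces to the sum of squared classification probabilities (assuming Formula (12) is the conventional weighted sum over label pairs):

```latex
P_e = \sum_{k}\sum_{l} w_{kl}\,\pi_k\,\pi_l = \sum_{k}\pi_k^{2}
    = \frac{1 + 1 + 25 + 16 + 1}{144} = \frac{44}{144} \approx 0.3056
```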
7. Calculating the Alpha
Finally, Krippendorff's alpha is calculated using Formula (13).
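Plugging the two results into the definition gives the agreement between Labeler A and the Reviewer for this sample (the value rests on the sketched formulas above and therefore carries the same assumptions):

```latex
\alpha = \frac{P_a - P_e}{1 - P_e}
       = \frac{25/36 - 11/36}{1 - 11/36}
       = \frac{14}{25} = 0.56
```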
Summary
We apply the same calculation to the agreement between labelers and to the agreement between each labeler and the reviewer.
Spans that received a label from only one annotator will be removed, as in the cleaning step.
The percentage of chance agreement will vary depending on:
The number of labels in a project.
The number of label options.
When both labelers agree but the reviewer rejects the labels:
The agreement between the two labelers increases.
The agreement between the labelers and the reviewer decreases.
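To tie the steps together, here is a minimal Python sketch of the same pairwise calculation. It assumes the conventional multi-rater formulation of Krippendorff's Alpha with identity weights, as used in the walkthrough above; it is an illustration, not Datasaur's actual implementation, and the function name and data layout are our own.

```python
from collections import Counter

def krippendorff_alpha(spans):
    """spans: list of per-span label lists, e.g. [["EVE", "TITLE"], ["PER", "PER"], ...]."""
    # Step 2: drop spans with fewer than two labels (single-label spans break the formula).
    spans = [labels for labels in spans if len(labels) >= 2]

    n = len(spans)                                  # total spans
    counts = [Counter(labels) for labels in spans]  # r_ik per span
    r_i = [sum(c.values()) for c in counts]         # labels in each span
    r = sum(r_i)                                    # total labels
    r_bar = r / n                                   # average labels per span (r')

    # Observed agreement P_a (identity weights, so the weighted count equals r_ik).
    p_a_prime = sum(
        sum(r_ik * (r_ik - 1) for r_ik in c.values()) / (r_bar * (ri - 1))
        for c, ri in zip(counts, r_i)
    ) / n
    eps = 1 / (n * r_bar)
    p_a = (1 - eps) * p_a_prime + eps

    # Chance agreement P_e from the classification probabilities pi_k.
    r_k = Counter()
    for c in counts:
        r_k.update(c)
    p_e = sum((rk / (n * r_bar)) ** 2 for rk in r_k.values())

    return (p_a - p_e) / (1 - p_e)

# Labeler A vs. Reviewer sample data from Table 1 (single-label spans included on purpose).
sample = [
    ["EVE", "TITLE"],    # The Tragedy of Hamlet
    ["PER"],             # Prince of Denmark (dropped by the cleaning step)
    ["PER", "PER"],      # Hamlet
    ["PER", "PER"],      # William Shakespeare
    ["YEAR", "YEAR"],    # 1599
    ["YEAR", "YEAR"],    # 1601
    ["ORG", "PER"],      # Shakespeare
    ["QTY"],             # 30557 (dropped by the cleaning step)
]
print(round(krippendorff_alpha(sample), 4))  # 0.56
```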