
# Krippendorff's Alpha Calculation

## Sample data

Suppose two labelers and one reviewer (Labeler A, Labeler B, and Reviewer) labeled the same text. Labeler A's work is shown in image 1, Labeler B's work in image 2, and Reviewer's work in image 3.

![Labeler A's work](/files/vzeXRZAUFjuwCMMLQN6C)

![Labeler B's work](/files/sxVnm8hCNDJBh3Fl2AcE)

![Reviewer's work](/files/YeKsPDlk3nry560V27aB)

## Calculate the agreement

This section walks through the calculation of the agreement between Labeler A and Reviewer.

### 1. Arrange the data

First, arrange the sample data into a table with one row per span and one column per annotator. An empty cell means that the annotator did not label that span.

| Span                  | Labeler A | Reviewer |
| --------------------- | --------- | -------- |
| The Tragedy of Hamlet | EVE       | TITLE    |
| Prince of Denmark     | PER       |          |
| Hamlet                | PER       | PER      |
| William Shakespeare   | PER       | PER      |
| 1599                  | YEAR      | YEAR     |
| 1601                  | YEAR      | YEAR     |
| Shakespeare           | ORG       | PER      |
| 30557                 |           | QTY      |

### 2. Clean the data

Second, remove the spans that have only one label (`Prince of Denmark` and `30557`). A span labeled by only one annotator gives no pair of labels to compare, and with $$r\_i=1$$ the denominator $$r'(r\_i-1)$$ in Formula (7) would be zero. After cleaning, the remaining spans still reflect the agreement level between the two annotators. The cleaned data is shown in the table below.

| Span                  | Labeler A | Reviewer |
| --------------------- | --------- | -------- |
| The Tragedy of Hamlet | EVE       | TITLE    |
| Hamlet                | PER       | PER      |
| William Shakespeare   | PER       | PER      |
| 1599                  | YEAR      | YEAR     |
| 1601                  | YEAR      | YEAR     |
| Shakespeare           | ORG       | PER      |
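The cleaning rule can be sketched in Python as below. This is an illustrative sketch, not Datasaur's implementation: it assumes each span is keyed by its text and maps to a `(Labeler A, Reviewer)` pair, with `None` standing for a missing label. The helper name `clean` is hypothetical.

```python
# Raw annotations from the first table: span -> (Labeler A, Reviewer).
# None marks a span that the annotator did not label.
raw = {
    "The Tragedy of Hamlet": ("EVE", "TITLE"),
    "Prince of Denmark": ("PER", None),
    "Hamlet": ("PER", "PER"),
    "William Shakespeare": ("PER", "PER"),
    "1599": ("YEAR", "YEAR"),
    "1601": ("YEAR", "YEAR"),
    "Shakespeare": ("ORG", "PER"),
    "30557": (None, "QTY"),
}


def clean(data):
    """Keep only the spans that received at least two labels."""
    return {
        span: labels
        for span, labels in data.items()
        if sum(label is not None for label in labels) >= 2
    }


cleaned = clean(raw)
print(list(cleaned))
# ['The Tragedy of Hamlet', 'Hamlet', 'William Shakespeare', '1599', '1601', 'Shakespeare']
```

In a real project, spans would more likely be identified by their position in the document than by their text, since the same text (for example, `Hamlet`) can appear more than once.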

### 3. Create the agreement table

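Third, create an agreement table from the cleaned data. It has one row per span and one column per label, and each cell holds the number of annotators who assigned that label to that span. The table is shown in Table 3.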

<figure><img src="/files/XxDX3m71bKluxUSJDsCG" alt=""><figcaption><p>Table 3. Agreement table</p></figcaption></figure>
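Because Table 3 is only available as an image, here is a minimal Python sketch that rebuilds it from the cleaned data. It assumes spans are keyed by their text and hold the labels from Labeler A and Reviewer; each cell is the count $$r\_{ik}$$ of annotators who chose label $$k$$ for span $$i$$. Sorting the labels alphabetically gives the column order EVE, ORG, PER, TITLE, YEAR used in the formulas below.

```python
# Cleaned annotations: span -> [Labeler A's label, Reviewer's label].
cleaned = {
    "The Tragedy of Hamlet": ["EVE", "TITLE"],
    "Hamlet": ["PER", "PER"],
    "William Shakespeare": ["PER", "PER"],
    "1599": ["YEAR", "YEAR"],
    "1601": ["YEAR", "YEAR"],
    "Shakespeare": ["ORG", "PER"],
}

# Label options, sorted alphabetically: EVE, ORG, PER, TITLE, YEAR.
labels = sorted({label for span_labels in cleaned.values() for label in span_labels})

# One row per span; cell k counts the annotators who chose label k (r_ik).
table = [
    [span_labels.count(label) for label in labels]
    for span_labels in cleaned.values()
]

for span, row in zip(cleaned, table):
    print(f"{span:<22} {row}")
```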

Based on the table, 5 values are calculated: $$n$$, $$r\_i$$, $$r\_k$$, $$r$$, and $$r'$$.

#### Total spans in the data

* $$n$$ is the total number of spans in the data.
  * Here, $$n=6$$ because 6 spans remain after cleaning.

#### Total labels in each span

$$
r\_i=\sum\limits\_{k=1}^{m}r\_{ik} \tag{1}
$$

* $$r\_i$$ is the total number of labels assigned to span $$i$$.
* $$m$$ is the number of distinct labels.
  * Here, $$m=5$$ because there are 5 distinct labels: EVE, ORG, PER, TITLE, and YEAR.
* $$r\_{ik}$$ is the number of times label $$k$$ is assigned to span $$i$$.

Here is the calculation result.

* $$r\_1=r\_{1,EVE}+r\_{1,ORG}+r\_{1,PER}+r\_{1,TITLE}+r\_{1,YEAR}=1+0+0+1+0=2$$
* $$r\_2=r\_{2,EVE}+r\_{2,ORG}+r\_{2,PER}+r\_{2,TITLE}+r\_{2,YEAR}=0+0+2+0+0=2$$
* $$r\_3=r\_{3,EVE}+r\_{3,ORG}+r\_{3,PER}+r\_{3,TITLE}+r\_{3,YEAR}=0+0+2+0+0=2$$
* $$r\_4=r\_{4,EVE}+r\_{4,ORG}+r\_{4,PER}+r\_{4,TITLE}+r\_{4,YEAR}=0+0+0+0+2=2$$
* $$r\_5=r\_{5,EVE}+r\_{5,ORG}+r\_{5,PER}+r\_{5,TITLE}+r\_{5,YEAR}=0+0+0+0+2=2$$
* $$r\_6=r\_{6,EVE}+r\_{6,ORG}+r\_{6,PER}+r\_{6,TITLE}+r\_{6,YEAR}=0+1+1+0+0=2$$
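Formula (1) is a row sum over the agreement table. A minimal Python sketch, assuming the table is stored as a list of rows with columns in the order EVE, ORG, PER, TITLE, YEAR:

```python
# Agreement table r_ik (rows: spans 1-6; columns: EVE, ORG, PER, TITLE, YEAR).
table = [
    [1, 0, 0, 1, 0],  # The Tragedy of Hamlet
    [0, 0, 2, 0, 0],  # Hamlet
    [0, 0, 2, 0, 0],  # William Shakespeare
    [0, 0, 0, 0, 2],  # 1599
    [0, 0, 0, 0, 2],  # 1601
    [0, 1, 1, 0, 0],  # Shakespeare
]

# Formula (1): r_i sums the counts of all m labels in span i.
r_i = [sum(row) for row in table]
print(r_i)  # [2, 2, 2, 2, 2, 2]
```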

#### Total of each label

$$
r\_k=\sum\limits\_{i=1}^{n}r\_{ik} \tag{2}
$$

* $$r\_k$$ is the total number of times label $$k$$ is assigned in the data.
* $$n$$ is the total number of spans in the data.
* $$r\_{ik}$$ is the number of times label $$k$$ is assigned to span $$i$$.

Here is the calculation result.

* $$r\_{EVE}=r\_{1,EVE}+r\_{2,EVE}+r\_{3,EVE}+r\_{4,EVE}+r\_{5,EVE}+r\_{6,EVE}=1+0+0+0+0+0=1$$
* $$r\_{ORG}=r\_{1,ORG}+r\_{2,ORG}+r\_{3,ORG}+r\_{4,ORG}+r\_{5,ORG}+r\_{6,ORG}=0+0+0+0+0+1=1$$
* $$r\_{PER}=r\_{1,PER}+r\_{2,PER}+r\_{3,PER}+r\_{4,PER}+r\_{5,PER}+r\_{6,PER}=0+2+2+0+0+1=5$$
* $$r\_{TITLE}=r\_{1,TITLE}+r\_{2,TITLE}+r\_{3,TITLE}+r\_{4,TITLE}+r\_{5,TITLE}+r\_{6,TITLE}=1+0+0+0+0+0=1$$
* $$r\_{YEAR}=r\_{1,YEAR}+r\_{2,YEAR}+r\_{3,YEAR}+r\_{4,YEAR}+r\_{5,YEAR}+r\_{6,YEAR}=0+0+0+2+2+0=4$$
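Formula (2) is the matching column sum. The sketch below uses the same hypothetical list-of-rows layout, with the label names kept in a separate list for readability:

```python
LABELS = ["EVE", "ORG", "PER", "TITLE", "YEAR"]

# Agreement table r_ik (rows: spans 1-6; columns in LABELS order).
table = [
    [1, 0, 0, 1, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 2],
    [0, 1, 1, 0, 0],
]

# Formula (2): r_k sums the counts of label k over all n spans.
r_k = {label: sum(row[k] for row in table) for k, label in enumerate(LABELS)}
print(r_k)  # {'EVE': 1, 'ORG': 1, 'PER': 5, 'TITLE': 1, 'YEAR': 4}
```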

#### Total labels in the data

$$
r=\sum\limits\_{i=1}^{n}r\_i \tag{3}
$$

* $$r$$ is the total number of labels in the data.
* $$n$$ is the total number of spans in the data.
* $$r\_i$$ is the total number of labels assigned to span $$i$$.

Here is the calculation result.

* $$r=r\_1+r\_2+r\_3+r\_4+r\_5+r\_6=12$$

#### Average number of labels per span

$$
r'=\frac{r}{n} \tag{4}
$$

* $$r'$$ is the average number of labels per span.
* $$r$$ is the total number of labels in the data.
* $$n$$ is the total number of spans in the data.

Here is the calculation result.

* $$r'=\frac{r}{n}=\frac{12}{6}=2$$
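Formulas (3) and (4) only need the number of rows and the grand total of the agreement table. A minimal Python sketch, assuming the same list-of-rows layout as above:

```python
# Agreement table r_ik (rows: spans 1-6; columns: EVE, ORG, PER, TITLE, YEAR).
table = [
    [1, 0, 0, 1, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 2],
    [0, 1, 1, 0, 0],
]

n = len(table)                      # total spans in the data
r = sum(sum(row) for row in table)  # Formula (3): total labels in the data
r_prime = r / n                     # Formula (4): average labels per span
print(n, r, r_prime)  # 6 12 2.0
```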

### 4. Choose a weight function

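Fourth, choose a weight function that decides how much credit a pair of labels receives. The labels here are nominal categories: no label is closer to one label than to another, so only identical labels count as agreement. Hence, the identity weight function in Formula (5) is used.

$$
w\_{kl}=1 \text{ if } k=l,\quad w\_{kl}=0 \text{ otherwise} \tag{5}
$$

* $$w\_{kl}$$ is the weight given to a pair of labels $$k$$ and $$l$$.
* With these weights, the weighted number of labels in Formula (6) equals the unweighted count, that is, $$r\_{ik+}=r\_{ik}$$.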
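For nominal labels, the weights form an identity matrix over the label options, which matches the worked example for Formula (6), where only $$w\_{EVE,EVE}$$ is 1. A Python sketch, assuming the label order EVE, ORG, PER, TITLE, YEAR (the `weights` name is illustrative):

```python
LABELS = ["EVE", "ORG", "PER", "TITLE", "YEAR"]
m = len(LABELS)

# Identity weights: full credit (1) only when both labels are the same,
# no credit (0) for any pair of different labels.
weights = [[1 if k == l else 0 for l in range(m)] for k in range(m)]

for label, row in zip(LABELS, weights):
    print(f"{label:<5}", row)
```

An ordinal or interval label scheme would use a different weight matrix with partial credit off the diagonal; that case is not covered by this walkthrough.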

### 5. Calculate Pa

Fifth, the observed weighted percent agreement is calculated.

#### Weighted number of labels

We will start by calculating the weighted number of labels using Formula (6).

$$
r\_{ik+}=\sum\limits\_{l=1}^{m} w\_{kl}r\_{il} \tag{6}
$$

* $$r\_{ik+}$$ is the weighted number of label $$k$$ in span $$i$$.
* $$m$$ is the number of distinct labels.
* $$w\_{kl}$$ is the weight between label $$k$$ and label $$l$$.
* $$r\_{il}$$ is the number of times label $$l$$ is assigned to span $$i$$.

For example, applying Formula (6) to the EVE label in span 1, with the labels in the order EVE, ORG, PER, TITLE, and YEAR, gives:

$$
r\_{1,EVE+}=\sum\limits\_{l=1}^{5} w\_{EVE,l}r\_{1,l}=1\times1+0\times0+0\times0+0\times1+0\times0=1
$$

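Repeat this for every span and label combination. Because the weight is 1 only when the two labels are identical, the weighted number of labels equals the unweighted count, $$r\_{ik+}=r\_{ik}$$, in every cell. The complete result is shown in Table 4.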

<figure><img src="/files/6EDNotBvzOLbOXXOzEQ9" alt=""><figcaption><p>Table 4. Weighted number of labels</p></figcaption></figure>
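Formula (6) multiplies each span's row by the weight matrix. The Python sketch below computes it directly, assuming identity weights (1 for identical labels, 0 otherwise, as in the example above) and the same list-of-rows table; it confirms that the weighted table equals the agreement table.

```python
LABELS = ["EVE", "ORG", "PER", "TITLE", "YEAR"]
m = len(LABELS)

# Agreement table r_ik (rows: spans 1-6; columns in LABELS order).
table = [
    [1, 0, 0, 1, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 2],
    [0, 1, 1, 0, 0],
]

# Identity weights: 1 for identical labels, 0 otherwise.
weights = [[1 if k == l else 0 for l in range(m)] for k in range(m)]

# Formula (6): r_ik+ = sum over l of w_kl * r_il, for every span i and label k.
weighted = [
    [sum(weights[k][l] * row[l] for l in range(m)) for k in range(m)]
    for row in table
]

print(weighted[0])        # [1, 0, 0, 1, 0]
print(weighted == table)  # True
```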

#### Agreement percentage

Once the weighted number of labels is known, calculate the agreement percentage for each span and label using Formula (7).

$$
p\_{a|ik}=\frac{r\_{ik}(r\_{ik+}-1)}{r'(r\_i-1)} \tag{7}
$$

* $$p\_{a|ik}$$ is the agreement percentage of label $$k$$ in span $$i$$.
* $$r\_{ik}$$ is the number of times label $$k$$ is assigned to span $$i$$.
* $$r\_{ik+}$$ is the weighted number of label $$k$$ in span $$i$$.
* $$r'$$ is the average number of labels per span.
* $$r\_i$$ is the total number of labels assigned to span $$i$$.

For example, applying Formula (7) to the EVE label in span 1 gives:

$$
p\_{a|1,EVE}=\frac{r\_{1,EVE}(r\_{1,EVE+}-1)}{r'(r\_1-1)}=\frac{1(1-1)}{2(2-1)}=0
$$

Repeat this for every span and label combination. The complete result is shown in Table 5.

<figure><img src="/files/0vK5n1lNG6khypF8318n" alt=""><figcaption><p>Table 5. Agreement percentage</p></figcaption></figure>
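A Python sketch of Formula (7) applied to every cell. It assumes identity weights, so the weighted table equals the agreement table ($$r\_{ik+}=r\_{ik}$$), and uses $$r'=2$$ from Formula (4). The function name `p_a_ik` is hypothetical.

```python
# Agreement table r_ik (rows: spans 1-6; columns: EVE, ORG, PER, TITLE, YEAR).
table = [
    [1, 0, 0, 1, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 2],
    [0, 1, 1, 0, 0],
]
# With identity weights, the weighted table (r_ik+) equals the agreement table.
weighted = table
r_prime = 2  # average labels per span, Formula (4)


def p_a_ik(r_ik, r_ik_plus, r_i, r_prime):
    """Formula (7): agreement percentage of label k in span i."""
    return r_ik * (r_ik_plus - 1) / (r_prime * (r_i - 1))


p_table = [
    [p_a_ik(r_ik, r_plus, sum(row), r_prime) for r_ik, r_plus in zip(row, w_row)]
    for row, w_row in zip(table, weighted)
]
print(p_table[0])  # span 1 (EVE vs. TITLE): [0.0, 0.0, 0.0, 0.0, 0.0]
print(p_table[1])  # span 2 (PER vs. PER): [0.0, 0.0, 1.0, 0.0, 0.0]
```

A label chosen by only one annotator contributes $$r\_{ik}(r\_{ik+}-1)=1\times0=0$$, so only labels shared by both annotators add to the agreement.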

#### Agreement percentage of a single span

Summing the agreement percentages over all labels gives the agreement percentage of a single span, as in Formula (8).

$$
p\_{a|i}=\sum\limits\_{k=1}^{m} p\_{a|ik} \tag{8}
$$

* $$p\_{a|i}$$ is the agreement percentage of span $$i$$.
* $$m$$ is the number of distinct labels.
* $$p\_{a|ik}$$ is the agreement percentage of label $$k$$ in span $$i$$.

For example, we can apply Formula (8) to calculate the agreement percentage of span 1.

$$
p\_{a|1}=\sum\limits\_{k=1}^{5} p\_{a|1,k}=0+0+0+0+0=0
$$

Repeat this for every span. The complete result is shown in Table 6.

<figure><img src="/files/vssP8gna0OkrKmWK9Gyu" alt=""><figcaption><p>Table 6. Agreement percentage of each span</p></figcaption></figure>
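Formulas (7) and (8) can be folded into one helper that returns the agreement percentage of a span. A Python sketch, assuming identity weights (so $$r\_{ik+}=r\_{ik}$$), $$r'=2$$, and the same list-of-rows table; the helper name `p_a_i` is hypothetical.

```python
# Agreement table r_ik (rows: spans 1-6; columns: EVE, ORG, PER, TITLE, YEAR).
table = [
    [1, 0, 0, 1, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 2],
    [0, 1, 1, 0, 0],
]
r_prime = 2  # average labels per span, Formula (4)


def p_a_i(row, r_prime):
    """Formula (8): sum of Formula (7) over all labels of one span.

    Assumes identity weights, so r_ik+ equals r_ik.
    """
    r_i = sum(row)
    return sum(r_ik * (r_ik - 1) / (r_prime * (r_i - 1)) for r_ik in row)


p_span = [p_a_i(row, r_prime) for row in table]
print(p_span)  # [0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
```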

#### Average agreement percentage

From the previous calculation, we can calculate the average agreement percentage using Formula (9).

$$
p\_a'=\frac{1}{n}\sum\limits\_{i=1}^{n}p\_{a|i} \tag{9}
$$

* $$p\_a'$$ is the average agreement percentage.
* $$n$$ is the total number of spans in the data.
* $$p\_{a|i}$$ is the agreement percentage of span $$i$$.

We can apply Formula (9) to calculate the average agreement percentage.

$$
p\_a'=\frac{1}{6}\sum\limits\_{i=1}^{6}p\_{a|i}=\frac{1}{6}(0+1+1+1+1+0)=0.6666
$$
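Formula (9) is a plain mean. The Python sketch below keeps it as an exact fraction using the standard-library `fractions` module, which shows that $$p\_a'$$ is exactly 2/3; the 0.6666 above is that value cut to four decimal places. The per-span values are the ones listed in the calculation above.

```python
from fractions import Fraction

# Agreement percentage of each span, p_a|i, as listed in the calculation above.
p_span = [0, 1, 1, 1, 1, 0]
n = len(p_span)

# Formula (9): average over the n spans, kept exact.
p_a_prime = Fraction(sum(p_span), n)
print(p_a_prime, float(p_a_prime))  # 2/3 0.6666666666666666
```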

#### Calculate Pa

Finally, the observed weighted percent agreement is calculated using Formula (10).

$$
p\_a=p\_a'\left(1-\frac{1}{nr'}\right)+\frac{1}{nr'} \tag{10}
$$

* $$p\_a$$ is the observed weighted percent agreement.
* $$p\_a'$$ is the average agreement percentage.
* $$n$$ is the total number of spans in the data.
* $$r'$$ is the average number of labels per span.

We can apply Formula (10) to calculate the observed weighted percent agreement.

$$
p\_a=p\_a'(1-\frac{1}{nr'})+\frac{1}{nr'}=0.6666(1-\frac{1}{6\times2})+\frac{1}{6\times2}=0.6944
$$
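The same step in exact arithmetic, as a Python sketch using the standard-library `fractions` module. It takes $$p\_a'=2/3$$, $$n=6$$, and $$r'=2$$ from the previous steps; the variable name `correction` for the $$\frac{1}{nr'}$$ term is illustrative.

```python
from fractions import Fraction

p_a_prime = Fraction(2, 3)  # average agreement percentage, Formula (9)
n = 6                       # spans after cleaning
r_prime = 2                 # average labels per span, Formula (4)

# Formula (10): combine p_a' with the 1/(n r') term.
correction = Fraction(1, n * r_prime)
p_a = p_a_prime * (1 - correction) + correction
print(p_a, round(float(p_a), 4))  # 25/36 0.6944
```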

### 6. Calculate Pe

Sixth, the chance weighted percent agreement is calculated.

#### Classification probability

We start by calculating the classification probability for each label using Formula (11).

$$
\pi\_k=\frac{r\_k}{r} \tag{11}
$$

* $$\pi\_k$$ is the classification probability of label $$k$$.
* $$r\_k$$ is the total number of times label $$k$$ is assigned in the data.
* $$r$$ is the total number of labels in the data.

Here is the calculation result.

* $$\pi\_{EVE}=\frac{r\_{EVE}}{r}=\frac{1}{12}=0.0833$$
* $$\pi\_{ORG}=\frac{r\_{ORG}}{r}=\frac{1}{12}=0.0833$$
* $$\pi\_{PER}=\frac{r\_{PER}}{r}=\frac{5}{12}=0.4166$$
* $$\pi\_{TITLE}=\frac{r\_{TITLE}}{r}=\frac{1}{12}=0.0833$$
* $$\pi\_{YEAR}=\frac{r\_{YEAR}}{r}=\frac{4}{12}=0.3333$$
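Formula (11) divides each label total by the grand total. A Python sketch with exact fractions, using the label totals $$r\_k$$ from Formula (2); the printed decimals are rounded to four places, so PER shows 0.4167 rather than the truncated 0.4166.

```python
from fractions import Fraction

# Total of each label, r_k, from Formula (2).
r_k = {"EVE": 1, "ORG": 1, "PER": 5, "TITLE": 1, "YEAR": 4}
r = sum(r_k.values())  # Formula (3): 12

# Formula (11): share of all labels that belong to label k.
pi = {label: Fraction(count, r) for label, count in r_k.items()}
for label, p in pi.items():
    print(f"{label:<5} {p}  ~ {float(p):.4f}")
```

The probabilities always sum to 1, which is a quick sanity check on the label totals.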

#### Calculate Pe

To calculate the chance weighted percent agreement, substitute the classification probabilities from Formula (11) into Formula (12).

$$
p\_e=\sum\limits\_{k=1}^{m}{\pi\_k}^2 \tag{12}
$$

* $$p\_e$$ is the chance weighted percent agreement.
* $$m$$ is the number of distinct labels.
* $$\pi\_k$$ is the classification probability of label $$k$$.

Here is the chance weighted percent agreement calculation.

$$p\_e=\sum\limits\_{k=1}^{m}{\pi\_k}^2$$

$$p\_e={\pi\_{EVE}}^2+{\pi\_{ORG}}^2+{\pi\_{PER}}^2+{\pi\_{TITLE}}^2+{\pi\_{YEAR}}^2$$

$$p\_e=0.0833^2+0.0833^2+0.4166^2+0.0833^2+0.3333^2$$

$$p\_e=0.3055$$
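Formulas (11) and (12) together reduce to a sum of squared label shares. In exact arithmetic the result is 44/144 = 11/36, which the value above shows truncated to four decimals. A Python sketch using the standard-library `fractions` module:

```python
from fractions import Fraction

# Total of each label, r_k, from Formula (2).
r_k = {"EVE": 1, "ORG": 1, "PER": 5, "TITLE": 1, "YEAR": 4}
r = sum(r_k.values())

# Formulas (11) and (12): sum of squared classification probabilities.
p_e = sum(Fraction(count, r) ** 2 for count in r_k.values())
print(p_e, round(float(p_e), 4))  # 11/36 0.3056
```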

### 7. Calculate the Alpha

Finally, Krippendorff's alpha is calculated using Formula (13).

$$
\alpha=\frac{p\_a-p\_e}{1-p\_e} \tag{13}
$$

* $$\alpha$$ is Krippendorff's alpha between Labeler A and Reviewer.
* $$p\_a$$ is the observed weighted percent agreement.
* $$p\_e$$ is the chance weighted percent agreement.

We get $$\alpha$$ by substituting $$p\_a$$ and $$p\_e$$ into Formula (13).

$$
\alpha=\frac{p\_a-p\_e}{1-p\_e}=\frac{0.6944-0.3055}{1-0.3055}=0.56
$$
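Putting Steps 2 to 7 together, the whole calculation for two annotators can be sketched as one Python function. This is a minimal sketch, not Datasaur's implementation: it assumes nominal labels with identity weights, spans keyed by text, and `None` for a missing label. The function name `krippendorff_alpha_nominal` is hypothetical, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def krippendorff_alpha_nominal(annotations):
    """Sketch of Steps 2-7 for nominal labels (identity weights).

    `annotations` maps each span to the labels it received; None is a missing label.
    """
    # Step 2: drop missing labels and keep spans with at least two labels.
    units = [[x for x in labels if x is not None] for labels in annotations.values()]
    units = [u for u in units if len(u) >= 2]

    # Step 3: label options and table totals, Formulas (3) and (4).
    categories = sorted({x for u in units for x in u})
    n = len(units)
    r = sum(len(u) for u in units)
    r_prime = Fraction(r, n)

    # Step 5: observed agreement, Formulas (7) to (10), with r_ik+ = r_ik.
    p_a_prime = Fraction(0)
    for u in units:
        r_i = len(u)
        p_a_prime += sum(
            Fraction(u.count(k) * (u.count(k) - 1)) / (r_prime * (r_i - 1))
            for k in categories
        )
    p_a_prime /= n
    correction = 1 / (n * r_prime)
    p_a = p_a_prime * (1 - correction) + correction

    # Step 6: chance agreement, Formulas (11) and (12).
    p_e = sum(Fraction(sum(u.count(k) for u in units), r) ** 2 for k in categories)

    # Step 7: Formula (13).
    return (p_a - p_e) / (1 - p_e)


raw = {
    "The Tragedy of Hamlet": ["EVE", "TITLE"],
    "Prince of Denmark": ["PER", None],
    "Hamlet": ["PER", "PER"],
    "William Shakespeare": ["PER", "PER"],
    "1599": ["YEAR", "YEAR"],
    "1601": ["YEAR", "YEAR"],
    "Shakespeare": ["ORG", "PER"],
    "30557": [None, "QTY"],
}
alpha = krippendorff_alpha_nominal(raw)
print(alpha, float(alpha))  # 14/25 0.56
```

The sketch does not guard against edge cases: if no span has two labels, or if every label is the same (so $$p\_e=1$$), it raises `ZeroDivisionError`.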

## Summary

* The same calculation applies to the agreement between two labelers and between the reviewer and each labeler.
* Spans labeled by only one annotator are removed before the calculation.
* Chance agreement depends on:
  * How many times each label is used in the project.
  * The number of label options.
* When both labelers agree but the reviewer rejects the labels:
  * The agreement between the two labelers increases.
  * The agreement between the labelers and the reviewer decreases.


