# Inter-Annotator Agreement for Data Programming

**Data programming** lets you evaluate the performance of your models and final answers using an inter-annotator agreement (IAA) calculation.

To learn more about inter-annotator agreement, please visit [the following link](https://datasaurai.gitbook.io/datasaur/basics/workforce-management/analytics/inter-annotator-agreement).
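As a quick intuition for what an IAA score measures, the sketch below computes Cohen's kappa, one common pairwise agreement statistic, between the outputs of two hypothetical labeling functions. The label sequences and the choice of statistic are illustrative only; Datasaur's own IAA calculation is described in the linked documentation.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of records where both agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two labeling functions voting on the same five records (made-up data)
lf1 = ["positive", "negative", "positive", "positive", "negative"]
lf2 = ["positive", "negative", "negative", "positive", "negative"]
print(round(cohens_kappa(lf1, lf2), 2))  # → 0.62
```

Kappa corrects raw percent agreement for agreement expected by chance, which is why two functions that agree on 4 of 5 records score 0.62 rather than 0.80.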

Follow the steps below to evaluate performance.

1. Create a project.
2. Activate **Data programming** through the **Manage extensions** dialog.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-374e899187897bc2fe2348652b51769f0fc330a2%2FExtension%20-%20Manage%20extensions%20-%20Data%20programming.png?alt=media" alt=""><figcaption><p>Activate Data Programming</p></figcaption></figure>
3. Create labeling functions for the selected question. You need a minimum of two labeling functions to obtain the IAA value.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-563aa7a93a5ed01585c26e18e24d18f629222b02%2FExtension%20-%20Data%20programming%20-%20manage%20LF%20-%20filled%20-%20prelabeled%20columns%20as%20models.png?alt=media" alt=""><figcaption><p>Data Programming Editor</p></figcaption></figure>

**Special notes**: If you use prelabeled columns to represent your models, you can create the labeling functions as follows:

* If you are using the Snorkel provider, use this code:

  ```python
  # labeling_function, LABELS, ABSTAIN, and re are assumed to be
  # provided by the Data programming editor environment.
  @labeling_function()
  def prelabeled_column_lf(x) -> int:
    # Read the text from the prelabeled column
    text = x.columns[x.column_name_to_index['column_name']]
    # Return the first label whose keyword matches the text
    for key, value in LABELS.items():
      if re.search(key, text, re.IGNORECASE):
        return value

    return ABSTAIN
  ```
* If you are using the Stegosaurus provider, use this code and activate the **Multiple-label template**:

  ```python
  # target_label, LABELS, and re are assumed to be provided by the editor.
  ABSTAIN = -1

  @target_label()
  def label_function(sample):
    text = sample['column_name']
    # text = sample[COLUMN_NAME] if you only want content from a certain column

    # Keywords that map the column's content to each label
    DICT_KEYWORDS = {
      'positive': ['positive'],
      'negative': ['negative']
    }
    for label, target_keywords in DICT_KEYWORDS.items():
      for keyword in target_keywords:
        if re.search(keyword, text, re.IGNORECASE):
          return LABELS[label]
    return ABSTAIN
    # return False instead of ABSTAIN to produce an empty result
  ```
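Outside the editor, the keyword-matching pattern shared by both providers can be tried as plain Python. The sketch below is self-contained; the label values and keywords are made up for illustration and are not part of the extension's API.

```python
import re

# Hypothetical label mapping and abstain value; in the extension these
# constants are provided for you.
LABELS = {"positive": 0, "negative": 1}
ABSTAIN = -1

# Keywords that map a record's text to a label
DICT_KEYWORDS = {
    "positive": ["great", "excellent"],
    "negative": ["terrible", "awful"],
}

def keyword_label(text):
    """Return the first matching label, or ABSTAIN when nothing matches."""
    for label, keywords in DICT_KEYWORDS.items():
        for keyword in keywords:
            if re.search(keyword, text, re.IGNORECASE):
                return LABELS[label]
    return ABSTAIN

print(keyword_label("An EXCELLENT result"))  # matches "excellent" → 0
print(keyword_label("nothing relevant"))     # no match → -1
```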

4. Click **Predict labels**.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-3ad91437dda15d0bfa035e84c40d972eebb24cf0%2FExtension%20-%20Data%20programming%20-%20highlight%20-%20predicted.png?alt=media" alt=""><figcaption><p>Data programming extension</p></figcaption></figure>
5. You can now see the final answer from the labeling functions for your targeted question, ready for your review.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-66f00f84a2a5ad58a8a82082accf066e6b94914d%2FExtension%20-%20Data%20programming%20-%20project%20-%20IAA%20-%20prelabeled%20columns%20as%20models.png?alt=media" alt=""><figcaption><p>Labeled Final Answer</p></figcaption></figure>
6. You can also see the IAA result in the **Manage functions** dialog. A score above 80% can be considered a good agreement level.

   <figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-ebbe4d3851188509185b68e97d96dcb9e8379dd3%2FExtension%20-%20Data%20programming%20-%20manage%20LF%20-%20IAA.png?alt=media" alt=""><figcaption><p>Inter-annotator agreement tab</p></figcaption></figure>
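The extension computes the final answer from your labeling functions for you. As a rough mental model only, and not Datasaur's actual aggregation algorithm, combining several functions' votes per record can be pictured as a simple majority vote that ignores abstentions:

```python
from collections import Counter

ABSTAIN = -1

def majority_vote(votes):
    """Combine one record's labeling-function votes; ignore abstentions."""
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    return counted.most_common(1)[0][0]

# Rows: one record each; columns: votes from three labeling functions
records = [
    [0, 0, -1],    # two functions agree on label 0
    [1, -1, -1],   # only one function fires
    [-1, -1, -1],  # every function abstains
]
print([majority_vote(r) for r in records])  # → [0, 1, -1]
```

Records where every function abstains stay unlabeled, which is why adding more, and more diverse, labeling functions tends to improve coverage of the final answer.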
