# Labeling Function Analysis

The labeling function analysis is only available after you define labeling functions and generate predicted labels.

### Labeling Function Analysis Window

The labeling function analysis window can be accessed in two ways: by clicking the **Labeling functions** button and navigating to the respective tab, or by clicking the **See labeling function analysis** button after predicting the labels.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-3ad91437dda15d0bfa035e84c40d972eebb24cf0%2FExtension%20-%20Data%20programming%20-%20highlight%20-%20predicted.png?alt=media" alt=""><figcaption><p>Data programming extension</p></figcaption></figure>

If you haven't predicted the labels, the labeling function analysis page will show an empty state.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-c71975e4207a6c48913a2e29ad7cad4fefc2406b%2FExtension%20-%20Data%20programming%20-%20manage%20LF%20-%20LF%20analysis%20-%20empty.png?alt=media" alt=""><figcaption><p>Empty labeling function analysis</p></figcaption></figure>

After predicting labels, the analysis is shown. It reports three metrics: coverage, overlaps, and conflicts.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-08385b942f402550a6fa91baf11536f151977fde%2FExtension%20-%20Data%20programming%20-%20manage%20LF%20-%20LF%20analysis%20-%20result.png?alt=media" alt=""><figcaption><p>Labeling function analysis</p></figcaption></figure>

1. Coverage is the fraction of the dataset that each labeling function labels.
2. Overlaps is the fraction of the dataset where a labeling function and at least one other labeling function both assign a label.
3. Conflicts is the fraction of the dataset where a labeling function and at least one other labeling function assign different labels (see the sketch below).
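To make the definitions concrete, here is a minimal sketch of how these metrics can be computed from a label matrix with NumPy. It assumes the common data-programming convention where each row is a data point, each column is a labeling function, and `-1` means the function abstained; the matrix `L` and the helper `lf_metrics` are illustrative, not part of the extension.

```python
import numpy as np

ABSTAIN = -1  # convention: -1 means the LF abstained on a data point

def lf_metrics(L: np.ndarray) -> dict:
    """Compute per-LF coverage, overlaps, and conflicts from a label
    matrix L of shape (num_data_points, num_labeling_functions)."""
    n, m = L.shape
    labeled = L != ABSTAIN                 # which LFs labeled which points
    num_labels = labeled.sum(axis=1)       # how many LFs labeled each point

    # Coverage: fraction of all data points each LF labels.
    coverage = labeled.mean(axis=0)

    # Overlaps: LF j overlaps on a point if it labeled the point and
    # at least one other LF labeled it too.
    overlaps = (labeled & (num_labels > 1)[:, None]).mean(axis=0)

    # Conflicts: LF j conflicts on a point if another LF assigned a
    # different (non-abstain) label to the same point.
    conflicts = np.zeros(m)
    for j in range(m):
        others = np.delete(L, j, axis=1)
        disagree = ((others != ABSTAIN) & (others != L[:, [j]])).any(axis=1)
        conflicts[j] = (labeled[:, j] & disagree).mean()

    return {"coverage": coverage, "overlaps": overlaps, "conflicts": conflicts}

# Toy example: 4 data points, 3 LFs, classes {0, 1}
L = np.array([[ 0,  0, -1],
              [ 1,  0, -1],
              [-1,  1,  1],
              [-1, -1,  0]])
print(lf_metrics(L))
```

If you use the open-source Snorkel library, `LFAnalysis(L).lf_summary()` reports the same three columns.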

If you add a new labeling function or modify an existing one, you need to re-predict the labels to update the labeling function analysis.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-1081c371d8cccb64d25b9ac2f3f4768d94887554%2FExtension%20-%20Data%20programming%20-%20manage%20LF%20-%20LF%20analysis%20-%20warning.png?alt=media" alt=""><figcaption><p>Outdated labeling function analysis</p></figcaption></figure>

### How to Improve Labeling Function Performance

The ideal situation for a labeling function is to have high coverage, high overlaps, and low conflicts. Below are example cases of labeling function performance and how to address them:

#### Fairly high coverage, high overlaps, and high conflicts.

This means our labeling functions (LFs) label many data points, and most of those data points receive labels from more than one LF, often with disagreeing labels. One example of such metric values:

1. Coverage = 50%
2. Overlaps = 30%
3. Conflicts = 27%

These numbers show that even though coverage and overlaps are high, the labeling functions disagree on more than half of the covered data points (27% conflicts against 50% coverage). To improve this, train a label model to estimate how well each labeling function performs. The label model estimates the accuracies of, and correlations between, labeling functions, since some labeling functions give a stronger or weaker signal about the true label, and it weights their votes accordingly.
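As a sketch of this step, the open-source Snorkel library (whose terminology these metrics follow) trains such a model with `LabelModel`. The label matrix `L_train` and `cardinality=2` (a two-class problem) below are assumptions for illustration; the extension's own label model may work differently.

```python
import numpy as np
from snorkel.labeling.model import LabelModel

# Label matrix: rows are data points, columns are LFs, -1 = abstain.
L_train = np.array([[ 0,  0, -1],
                    [ 1,  0,  1],
                    [-1,  1,  1],
                    [ 1, -1,  0]])

# Fit a generative label model; cardinality = number of classes.
label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, seed=123)

# Learned per-LF weights (estimated accuracies): a low weight marks an
# LF whose votes the model trusts least when resolving conflicts.
print(label_model.get_weights())

# Probabilistic labels that reconcile the conflicting votes.
preds = label_model.predict(L_train)
```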

#### Low coverage, high overlaps, and high conflicts.

Here we need to add new labeling functions to raise coverage, and identify which existing labeling function creates the most conflicts by experimenting with them one by one. After identifying it, we can refine or remove that labeling function and re-evaluate the set, as in the sketch below.
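One way to run that one-by-one experiment is a leave-one-out ablation, sketched below under the same label-matrix convention as before (`-1` = abstain): drop each labeling function in turn and recompute the overall conflict fraction; the LF whose removal lowers conflicts the most is the main source of disagreement. The helpers `conflict_fraction` and `rank_lfs_by_conflict` are hypothetical names, not part of the extension.

```python
import numpy as np

ABSTAIN = -1

def conflict_fraction(L: np.ndarray) -> float:
    """Fraction of data points on which two or more LFs assign
    different (non-abstain) labels."""
    conflicted = 0
    for row in L:
        votes = row[row != ABSTAIN]
        if votes.size > 1 and np.unique(votes).size > 1:
            conflicted += 1
    return conflicted / len(L)

def rank_lfs_by_conflict(L: np.ndarray) -> list:
    """Leave-one-out ablation: return (lf_index, conflict_drop) pairs,
    sorted so the LF responsible for the most conflicts comes first."""
    base = conflict_fraction(L)
    drops = [(j, base - conflict_fraction(np.delete(L, j, axis=1)))
             for j in range(L.shape[1])]
    return sorted(drops, key=lambda pair: pair[1], reverse=True)
```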
