Labeling Function Analysis
Lets users view the results of their labeling functions, including coverage, overlaps, and conflicts, and improve performance by training the label model.
The labeling function analysis is only available after you define labeling functions and generate predicted labels.
Labeling Function Analysis Window
The labeling function analysis window can be accessed in two ways: by clicking the Labeling functions button and navigating to the respective tab, or by clicking the See labeling function analysis button after predicting the labels.

If you haven't predicted the labels, the labeling function analysis page will show an empty state.

After predicting labels, the analysis will be shown. There are three metrics: coverage, overlaps, and conflicts.

Coverage is the fraction of the dataset that each labeling function labels.
Overlaps are the fraction of the dataset where a labeling function and at least one other labeling function both assign a label.
Conflicts are the fraction of the dataset where a labeling function and at least one other labeling function assign a label, and those labels disagree.
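To make these definitions concrete, here is a minimal sketch that computes all three metrics from a label matrix with NumPy. The matrix values, variable names, and the convention that -1 means abstain are illustrative assumptions, not part of the product:

```python
import numpy as np

# Label matrix: one row per data point, one column per labeling
# function (LF); -1 means the LF abstained on that data point.
# The values here are made up for illustration.
L = np.array([
    [ 1, -1,  1],
    [ 0,  0, -1],
    [ 1,  0,  1],
    [-1, -1, -1],
    [ 0, -1,  1],
])

labeled = L != -1                 # True where an LF emitted a label
n_points, n_lfs = L.shape

# Coverage: fraction of data points each LF labels.
coverage = labeled.mean(axis=0)

# Overlaps: fraction where an LF labels and at least one other LF also labels.
others_labeled = labeled.sum(axis=1, keepdims=True) - labeled
overlaps = (labeled & (others_labeled >= 1)).mean(axis=0)

# Conflicts: fraction where an LF labels and at least one other LF
# labels the same data point with a different value.
conflicts = np.zeros(n_lfs)
for j in range(n_lfs):
    other = np.delete(L, j, axis=1)
    disagree = (other != -1) & (other != L[:, [j]])
    conflicts[j] = (labeled[:, j] & disagree.any(axis=1)).mean()

print("coverage :", coverage)    # [0.8 0.4 0.6]
print("overlaps :", overlaps)    # [0.8 0.4 0.6]
print("conflicts:", conflicts)   # [0.4 0.2 0.4]
```

Each metric is reported per labeling function, matching the per-function rows in the analysis window.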
If you add a new labeling function or change an existing one, you need to re-predict the labels to update the labeling function analysis.

How to Improve Labeling Function Performance
Ideally, labeling functions have high coverage, high overlaps, and low conflicts. Below are common labeling function performance scenarios and how to improve each:
Fairly high coverage, high overlaps, and high conflicts.
This means our labeling functions can label many data points, but the majority of the covered data points are assigned labels by more than one labeling function, and those labels often disagree. One example of metric values:
Coverage = 50%
Overlaps = 30%
Conflicts = 27%
These values show that even with large coverage and overlaps, the labeling functions disagree on more than half of the covered data points (conflicts of 27% against coverage of 50%). To improve this, we need to train the label model. The label model estimates the accuracies of, and correlations between, the labeling functions, and uses them to weight each function's votes, since some labeling functions give stronger signals about the true label than others.
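The product trains the label model for you, but the same idea can be reproduced outside the UI with the open-source Snorkel library, which popularized these metrics. A minimal sketch, assuming a binary task and a toy label matrix (the data and settings are illustrative, not the product's internals):

```python
import numpy as np
from snorkel.labeling.model import LabelModel  # pip install snorkel

# Toy label matrix (rows = data points, columns = LFs, -1 = abstain);
# in practice this would be your full matrix of LF outputs.
L_train = np.array([
    [ 1, -1,  1],
    [ 0,  0, -1],
    [ 1,  0,  1],
    [ 0, -1,  1],
    [ 1,  0,  1],
])

# The label model learns per-LF accuracies and correlations from how
# the LFs agree and disagree, then reweights their votes accordingly.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=500, seed=123)

print(label_model.get_weights())     # estimated accuracy of each LF
print(label_model.predict(L_train))  # denoised labels for each data point
```

Functions with low estimated accuracy are the ones to rewrite or drop before re-predicting the labels.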
Low coverage, high overlaps, and high conflicts.
This scenario means our labeling functions label only a small fraction of the dataset, and the data points they do label often receive disagreeing labels. We need to add several new labeling functions to raise coverage, and identify which labeling function creates the most conflicts by experimenting with them one by one, as in the sketch below. After identifying it, we can re-evaluate the labeling functions.
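One way to narrow down the culprit is to compute pairwise conflict rates between labeling functions; the function with the highest total is the first candidate to rewrite or remove. A hypothetical sketch, reusing the label-matrix convention from the earlier example:

```python
import numpy as np

# Same convention as above: rows = data points, columns = LFs, -1 = abstain.
# Toy matrix in which LF1 frequently disagrees with the other two.
L = np.array([
    [ 1,  0,  1],
    [ 0,  1,  0],
    [ 1,  0,  1],
    [ 0, -1,  0],
    [ 1,  1,  1],
])

n_lfs = L.shape[1]
pairwise = np.zeros((n_lfs, n_lfs))
for i in range(n_lfs):
    for j in range(n_lfs):
        if i == j:
            continue
        both = (L[:, i] != -1) & (L[:, j] != -1)       # both LFs labeled
        pairwise[i, j] = (both & (L[:, i] != L[:, j])).mean()

# The LF with the largest total pairwise conflict is the one to revisit.
print(pairwise)
print("LF to revisit first:", pairwise.sum(axis=1).argmax())  # prints 1
```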