# ML-Assisted Labeling

## Introduction

ML-assisted labeling automates data labeling for NLP projects. It supports span, row, bounding box, and document labeling using open-source models, large language models (LLMs), and custom models. Generating labels automatically reduces manual effort and improves labeling speed and consistency.

<figure><img src="/files/XOOb1YuaxLp89HKDxsfF" alt="Service provider for ML-assisted Labeling"><figcaption></figcaption></figure>

## Key features

1. **Batch labeling**: Label multiple items at once instead of labeling each item individually.
2. **Model integrations**: Use models for tasks such as named entity recognition (NER), part-of-speech (POS) tagging, and sentiment analysis, as well as LLMs and external providers.
3. **Automation**: Generate labels automatically, then review them to ensure quality.

## Quick start guide

To generate labels with the **ML-assisted labeling** extension:

1. Go to the **Manage extensions** dialog and enable the **ML-assisted labeling** extension.

   <figure><img src="/files/sML60IzBG0yL64WuhaZ1" alt="Image of ML Assisted Labeling Menu"><figcaption><p>ML-assisted labeling extension with spaCy</p></figcaption></figure>
2. Select a service provider.
3. Click **Predict labels** to generate labels.

{% hint style="info" %}
Row labeling projects have some additional steps:

1. **Select rows**: Choose which rows to include in the prediction.
2. **Target text**: Select the input column to use as context.
3. **Target question**: Select the output field to predict.
4. **Faster prediction speed**: Run predictions via the backend.
{% endhint %}
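The row labeling options above can be pictured with a small sketch. This is an illustration only, not Datasaur's implementation: `predict_rows`, `toy_sentiment`, and the column names `review` and `sentiment` are hypothetical, and the keyword rule stands in for whichever service provider you select.

```python
from typing import Callable

Row = dict[str, str]


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for a real service provider."""
    lowered = text.lower()
    if any(word in lowered for word in ("great", "love", "good")):
        return "positive"
    if any(word in lowered for word in ("bad", "hate", "poor")):
        return "negative"
    return "neutral"


def predict_rows(
    rows: list[Row],
    selected: list[int],
    target_text: str,
    target_question: str,
    model: Callable[[str], str],
) -> list[Row]:
    """Return a copy of rows with predictions written to the selected rows only."""
    labeled = [dict(row) for row in rows]
    for index in selected:
        # Read the context column, write the prediction to the answer column.
        labeled[index][target_question] = model(labeled[index][target_text])
    return labeled


rows = [
    {"review": "I love this phone", "sentiment": ""},
    {"review": "Battery life is bad", "sentiment": ""},
    {"review": "It arrived on Tuesday", "sentiment": ""},
]
result = predict_rows(
    rows,
    selected=[0, 1],
    target_text="review",
    target_question="sentiment",
    model=toy_sentiment,
)
```

In this sketch, only the two selected rows receive a value in the `sentiment` column; the third row is left untouched for manual labeling.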

## Supported model providers

<table><thead><tr><th width="198">Row labeling</th><th width="211">Span labeling</th><th width="162">Bounding box labeling</th><th>Document labeling</th></tr></thead><tbody><tr><td><a href="/pages/dkML1ZdqssIofZToDj5S">Sentiment Analysis</a></td><td><a href="/pages/BrBWWbOHGFcNt9hfNMsK">spaCy</a></td><td><a href="/pages/-MeZJGoKXi7YCcpatk_V#custom-api-for-bounding-box-labeling">Custom Model</a></td><td><a href="/pages/mlRgrTnnUPGHy3cYrhih">Datasaur LLM Labs</a></td></tr><tr><td><a href="/pages/yZUvKXEeM76sPwrVexDC">LLM Assisted Labeling</a></td><td><a href="/pages/yZUvKXEeM76sPwrVexDC">LLM Assisted Labeling</a></td><td></td><td></td></tr><tr><td><a href="/pages/2k1y4pzryCiWqXSVTLZa">Amazon Comprehend</a></td><td><a href="/pages/TgBLB4kGTpd0ZIsniq6K">NLTK</a></td><td></td><td></td></tr><tr><td><a href="/pages/50Dl8QuZ6QKStHKiPV4Q">Google Vertex AI</a></td><td><a href="/pages/TWiOjQaBjvdWcby3IDId">CoreNLP</a> and <a href="/pages/U3pzcETUXue0xHfNvsY1">SparkNLP</a> NER</td><td></td><td></td></tr><tr><td><a href="/pages/K4osyaI7xXEDqWDHxGOF">Amazon SageMaker</a></td><td><a href="/pages/TfNBnDsNCYMOZLWmYWsD">CoreNLP</a> and <a href="/pages/tAP3mYMszNDCMr8FsVBp">SparkNLP</a> POS</td><td></td><td></td></tr><tr><td><a href="/pages/mlRgrTnnUPGHy3cYrhih">Datasaur LLM Labs</a></td><td><a href="/pages/mlRgrTnnUPGHy3cYrhih">Datasaur LLM Labs</a></td><td></td><td></td></tr><tr><td><a href="/pages/AO9Zrn7dnYyFRQm3QHVE">Azure</a></td><td><a href="/pages/veRx8KlGRhIdPsV1Jwf3">FewNERD</a></td><td></td><td></td></tr><tr><td><a href="/pages/-Me_8WrEuconSYRE8rtv">Hugging Face</a></td><td><a href="/pages/-Me_8WrEuconSYRE8rtv">Hugging Face</a></td><td></td><td></td></tr><tr><td><a href="/pages/-MeZJGoKXi7YCcpatk_V#custom-api-for-row-based">Custom Model</a></td><td><a href="/pages/-MeZJGoKXi7YCcpatk_V#custom-api-for-span-based">Custom Model</a></td><td></td><td></td></tr></tbody></table>

Model providers are grouped into the following categories:

* **Datasaur hosted**
  * Prebuilt models hosted by Datasaur for common NLP tasks such as NER, sentiment analysis, POS tagging, and dependency parsing.
  * Examples: spaCy, CoreNLP, and FewNERD.
* **Cloud providers**
  * Models hosted on external platforms. You can use pre-trained or fine-tuned models via API.
  * Examples: Hugging Face Inference API, Azure ML, Google Vertex AI, and Amazon SageMaker.
* **LLM Assisted Labeling**
  * Models from LLM providers that require an API key.
  * Examples: OpenAI (GPT models), Azure OpenAI, Anthropic (Claude models), Gemini (Google AI), Cohere, and other custom models that can be connected via API.
* **LLM Labs**
  * Models deployed through Datasaur LLM Labs, providing access to multiple providers through a single endpoint.
  * You can switch between models without manually configuring each provider separately.
* **Custom models**
  * Connect your own model using a custom REST API. The API must follow the required request format.
  * This option suits organizations with internally trained models or self-hosted LLMs.
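
To give a feel for the **Custom models** category, here is a minimal sketch of a self-hosted REST endpoint built with Python's standard library. The request and response shapes are placeholders, not the format Datasaur requires: the `text` and `labels` fields, the `ENTITY` label, and the names `predict_spans`, `handle_payload`, `ModelHandler`, and `run` are all assumptions made for illustration. Check the Custom Model pages linked in the table above for the actual format before building against it.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_spans(text: str) -> list[dict]:
    """Toy model: tag every capitalized word as a hypothetical ENTITY span."""
    return [
        {"start": m.start(), "end": m.end(), "label": "ENTITY"}
        for m in re.finditer(r"\b[A-Z][a-z]+\b", text)
    ]


def handle_payload(payload: dict) -> dict:
    """Map a request body to a response body (both shapes are placeholders)."""
    return {"labels": predict_spans(payload.get("text", ""))}


class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read the JSON body, run the toy model, and reply with JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle_payload(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8000) -> None:
    """Serve on localhost until interrupted; call this yourself to start."""
    HTTPServer(("127.0.0.1", port), ModelHandler).serve_forever()
```

Keeping the model logic in `handle_payload`, separate from the HTTP handler, makes it possible to adapt the payload mapping to the required format without touching the server code.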

<figure><img src="/files/OL15JG9ESYofpuaORQiU" alt="Tree-diagram image of ML-assisted Labeling provider in Datasaur"><figcaption><p>ML-assisted labeling providers in Datasaur</p></figcaption></figure>

<table><thead><tr><th width="201.75390625">Type</th><th>Examples</th></tr></thead><tbody><tr><td>Datasaur hosted</td><td>spaCy, CoreNLP, SparkNLP, NLTK, Sentiment Analysis, FewNERD</td></tr><tr><td>Cloud provider</td><td>Hugging Face, Azure ML, Google Vertex AI, Amazon SageMaker</td></tr><tr><td>LLM Assisted Labeling</td><td>OpenAI, Azure OpenAI, Anthropic, Gemini, Cohere, custom</td></tr><tr><td>LLM Labs</td><td>100+ LLM providers</td></tr><tr><td>Custom</td><td>Depends on your internal API</td></tr></tbody></table>

## Restrict ML-assisted labeling settings

Admins or reviewers can restrict ML-assisted labeling settings so that every labeler works with the same configuration. When the restriction is enabled, labelers use the configuration set by the admin or reviewer, which keeps results consistent across the project.

### Steps

1. Click the three-dot menu next to the **ML-assisted labeling** header.
2. In **Modify service provider setting**, choose one of the following options:
   1. **All assignees**: Allows all labelers to modify their own settings.
   2. **Admin or reviewer only**: Restricts changes to admins or reviewers.

<figure><img src="/files/5iAT1g7eJ4xJlDRgEh9C" alt="Image of Enabling Admin or Reviewer ML Assisted Labeling Settings to Labeler"><figcaption></figcaption></figure>

### Behaviors

* When **Admin or reviewer only** is selected, labelers cannot change the service provider or settings. Admins and reviewers can still update the configuration.
* In ongoing projects, labelers must refresh the page to apply updated settings.
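
The permission rule described above can be summarized in a few lines. This is a sketch of the documented behavior, not Datasaur's code: the `Role` and `ModifyPolicy` enums and `can_modify_settings` are hypothetical names, and it assumes admins and reviewers can also change settings when **All assignees** is selected.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    REVIEWER = "reviewer"
    LABELER = "labeler"


class ModifyPolicy(Enum):
    ALL_ASSIGNEES = "All assignees"
    ADMIN_OR_REVIEWER_ONLY = "Admin or reviewer only"


def can_modify_settings(role: Role, policy: ModifyPolicy) -> bool:
    """Return True if the role may change the service provider settings."""
    # Admins and reviewers can update the configuration under either policy
    # (assumed for "All assignees"; documented for "Admin or reviewer only").
    if role in (Role.ADMIN, Role.REVIEWER):
        return True
    # Labelers may change settings only when the policy allows all assignees.
    return policy is ModifyPolicy.ALL_ASSIGNEES
```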

