
ML Assisted Labeling

The ML Assisted Labeling extension enables you to call open-source models, LLMs, or your own model to automatically return labels and help you label faster.


Introduction

Datasaur's ML Assisted Labeling extension enhances data labeling efficiency and accuracy for NLP projects. It integrates open-source models, Large Language Models (LLMs), and custom models, providing automatic labeling for Span-based, Row-based, Bounding Box, and Document-based projects. By automating your labeling workflow, it saves time and improves data quality.

Key Features

  1. Labeling multiple labels at once: Datasaur ML Assisted Labeling lets you apply multiple labels from a label set in a single run, eliminating the need to label each item individually and streamlining your span-based projects.

  2. Integration with Popular Models: Seamlessly integrates with widely used models such as spaCy, NER and POS taggers, and Sentiment Analysis. It also supports various LLMs, Hugging Face, and other model platforms.

  3. Time-Saving Efficiency: Save valuable time and resources by automating the labeling process with Datasaur ML Assisted Labeling. Focus on critical tasks while quickly reviewing the automated labels to make sure you deliver good-quality data.

Quick Start Guide

So how do you set up the ML Assisted Labeling extension? It only takes a few steps:

  1. Go to Manage Extension and Enable ML Assisted Labeling.

  2. The ML Assisted Labeling Extension should appear on the right side.

  3. Select a service provider for assisted labeling.

  4. Provide any additional information the provider requires; most providers don't need any.

  5. Click Predict labels to apply labels to your document.

For Row-based projects, several additional options and features are available:

  1. Select specific rows for prediction: Choose which rows to include in the prediction.

  2. Target text: Pick which text to use as the input for prediction.

  3. Target question: Decide which output column to predict.

  4. Faster prediction speed: Toggle this option to run predictions faster via the backend.

Supported Model Providers

Provider availability is broken down by labeling type: Row Labeling, Span Labeling, BBox Labeling, and Doc Labeling.

We categorize model providers into several key categories to help you better understand their capabilities and use cases.

  • Datasaur Hosted

    • These models are hosted by Datasaur.

    • They are ready-to-use models for common NLP tasks like Named Entity Recognition (NER), Sentiment Analysis, Part-of-Speech (POS) Tagging, and Dependency Parsing.

    • Examples include spaCy, CoreNLP, and FewNERD, which provide pre-trained models for various NLP tasks.

  • Cloud Provider

    • These models require access to external AI services hosted on various cloud providers.

    • Users can leverage pretrained or fine-tuned models deployed on cloud platforms, providing scalability and access to large models.

    • Examples include the Hugging Face Inference API, Azure ML, Google Vertex AI, and Amazon SageMaker, where models can be hosted and queried via API (see the sample request after this list).

  • LLM Assisted Labeling

    • These models integrate directly with LLM providers and require the user to input an API key for access.

    • They are typically used for generating labels, suggestions, or annotations to assist with labeling tasks.

    • Examples include OpenAI (GPT models), Azure OpenAI, Anthropic (Claude models), Gemini (Google AI), Cohere, and custom models that can be connected via API.

  • LLM Labs

    • These models are deployed through Datasaur’s LLM Labs, which provides access to 100+ LLM providers through an integrated endpoint.

    • This allows users to switch between different models without manually configuring each provider separately.

  • Custom Model

    • This option allows users to connect their own models via a custom REST API.

    • The API must follow a specific request format to ensure compatibility with Datasaur; a minimal sketch of such an endpoint appears after the table below.

    • It offers flexibility for organizations that have internally trained models or want to use self-hosted LLMs.
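
To illustrate what "queried via API" means for the Cloud Provider category, here is a minimal sketch of calling a hosted NER model through the Hugging Face Inference API outside of Datasaur. The endpoint URL, the dslim/bert-base-NER model, and the token placeholder are illustrative assumptions, not part of Datasaur's configuration; inside Datasaur you only select the provider and enter your credentials in the extension settings.

```python
# Minimal sketch: querying a hosted token-classification model via the
# Hugging Face Inference API. The endpoint URL, model name, and token below
# are illustrative assumptions.
import requests

HF_TOKEN = "hf_xxx"  # hypothetical access token
MODEL = "dslim/bert-base-NER"  # any hosted token-classification model
API_URL = f"https://api-inference.huggingface.co/models/{MODEL}"

def predict_entities(text: str) -> list[dict]:
    """Send text to the hosted model and return predicted entity spans."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {HF_TOKEN}"},
        json={"inputs": text},
        timeout=30,
    )
    response.raise_for_status()
    # Typical token-classification output: a list of entities with
    # entity_group, score, word, start, and end offsets.
    return response.json()

if __name__ == "__main__":
    for entity in predict_entities("Datasaur is headquartered in California."):
        print(entity)
```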

Here are the models available for each provider category:

  • Datasaur Hosted: spaCy, CoreNLP, SparkNLP, NLTK, Sentiment Analysis, FewNERD
  • Cloud Provider: Hugging Face, Azure ML, Google Vertex AI, Amazon SageMaker
  • LLM Assisted Labeling: OpenAI, Azure OpenAI, Anthropic, Gemini, Cohere, Custom
  • LLM Labs: 100+ LLM providers
  • Custom Model: depends on the user's internal API
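
For the Custom Model option, Datasaur calls an HTTP endpoint that you host. The authoritative request and response contract is documented on the Custom API page; the route name (/predict) and the field names below (text, spans, label, start, end) are hypothetical placeholders used only to show the general shape of such a service.

```python
# Hypothetical sketch of a self-hosted prediction endpoint for the Custom Model
# option. The route and the request/response fields are illustrative only;
# follow the request format documented on the Custom API page for the real contract.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    text: str  # hypothetical: the text to label

class Span(BaseModel):
    label: str  # hypothetical: a label name from your label set
    start: int  # hypothetical: character offset where the span starts
    end: int    # hypothetical: character offset where the span ends

class PredictResponse(BaseModel):
    spans: list[Span]

@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest) -> PredictResponse:
    # Replace this stub with a call to your internally trained model
    # or self-hosted LLM.
    spans = []
    keyword = "Datasaur"
    index = request.text.find(keyword)
    if index != -1:
        spans.append(Span(label="ORG", start=index, end=index + len(keyword)))
    return PredictResponse(spans=spans)
```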

Enforce ML Assisted Labeling Settings from Admin or Reviewer to Labeler

This feature ensures consistency by allowing Admins or Reviewers to enforce their ML Assisted Labeling settings for Labelers. When enabled, Labelers will follow the exact setup specified by the Admin or Reviewer, ensuring uniformity in results.

By clicking the three dots button next to the ML Assisted Labeling Extension, Admins or Reviewers can access the option to "Modify service provider settings." Selecting "All assignees" allows everyone in the project to modify their own ML Assisted Labeling settings. Choosing "Admin or reviewer only" enables the enforcement feature, restricting changes to the Admin or Reviewer.

Once activated, Labelers will not be able to switch to a different service provider. However, Admins or Reviewers retain the ability to modify the settings as needed.

If the "Admin or reviewer only" option is chosen in an ongoing project, please make sure the labeler refreshes their page to sync with the latest settings.
