Datasaur
Visit our websitePricingBlogPlaygroundAPI Docs
  • Welcome to Datasaur
    • Getting started with Datasaur
  • Data Studio Projects
    • Labeling Task Types
      • Span Based
        • OCR Labeling
        • Audio Project
      • Row Based
      • Document Based
      • Bounding Box
      • Conversational
      • Mixed Labeling
      • Project Templates
        • Test Project
    • Creating a Project
      • Data Formats
      • Data Samples
      • Split Files
      • Consensus
      • Dynamic Review Capabilities
    • Pre-Labeled Project
    • Let's Get Labeling!
      • Span Based
        • Span + Line Labeling
      • Row & Document Based
      • Bounding Box Labeling
      • Conversational Labeling
      • Label Sets / Question Sets
        • Dynamic Question Set
      • Multiple Label Sets
    • Reviewing Projects
      • Review Sampling
    • Adding Documents to an Ongoing Project
    • Export Project
  • LLM Projects
    • LLM Labs Introduction
    • Sandbox
      • Direct Access LLMs
      • File Attachment
      • Conversational Prompt
    • Deployment
      • Deployment API
    • Knowledge base
      • External Object Storage
      • File Properties
    • Models
      • Amazon SageMaker JumpStart
      • Amazon Bedrock
      • Open AI
      • Azure OpenAI
      • Vertex AI
      • Custom model
      • Fine-tuning
      • LLM Comparison Table
    • Evaluation
      • Automated Evaluation
        • Multi-application evaluation
        • Custom metrics
      • Ranking (RLHF)
      • Rating
      • Performance Monitoring
    • Dataset
    • Pricing Plan
  • Workspace Management
    • Workspace
    • Role & Permission
    • Analytics
      • Inter-Annotator Agreement (IAA)
        • Cohen's Kappa Calculation
        • Krippendorff's Alpha Calculation
      • Custom Report Builder
      • Project Report
      • Evaluation Metrics
    • Activity
    • File Transformer
      • Import Transformer
      • Export Transformer
      • Upload File Transformer
      • Running File Transformer
    • Label Management
      • Label Set Management
      • Question Set Management
    • Project Management
      • Self-Assignment
        • Self-Unassign
      • Transfer Assignment Ownership
      • Reset Labeling Work
      • Mark Document as Complete
      • Project Status Workflow
        • Read-only Mode
      • Comment Feature
      • Archive Project
    • Automation
      • Action: Create Projects
  • Assisted Labeling
    • ML Assisted Labeling
      • Amazon Comprehend
      • Amazon SageMaker
      • Azure ML
      • CoreNLP NER
      • CoreNLP POS
      • Custom API
      • FewNERD
      • Google Vertex AI
      • Hugging Face
      • LLM Assisted Labeling
        • Prompt Examples
        • Custom Provider
      • LLM Labs (beta)
      • NLTK
      • Sentiment Analysis
      • spaCy
      • SparkNLP NER
      • SparkNLP POS
    • Data Programming
      • Example of Labeling Functions
      • Labeling Function Analysis
      • Inter-Annotator Agreement for Data Programming
    • Predictive Labeling
  • Assisted Review
    • Label Error Detection
  • Building Your Own Model
    • Datasaur Dinamic
      • Datasaur Dinamic with Hugging Face
      • Datasaur Dinamic with Amazon SageMaker Autopilot
  • Advanced
    • Script-Generated Question
    • Shortcuts
    • Extensions
      • Labels
      • Review
      • Document and Row Labeling
      • Bounding Box Labels
      • List of Files
      • Comments
      • Analytics
      • Dictionary
      • Search
      • Labeling Guidelines
      • Metadata
      • Grammar Checker
      • ML Assisted Labeling
      • Data Programming
      • Datasaur Dinamic
      • Predictive Labeling
      • Label Error Detection
      • LLM Sandbox
    • Tokenizers
  • Integrations
    • External Object Storage
      • AWS S3
        • With IRSA
      • Google Cloud Storage
      • Azure Blob Storage
      • Dropbox
    • SAML
      • Okta
      • Microsoft Entra ID
    • SCIM
      • Okta
      • Microsoft Entra ID
    • Webhook Notifications
      • Webhook Signature
      • Events
      • Custom Headers
    • Robosaur
      • Commands
        • Create Projects
        • Apply Project Tags
        • Export Projects
        • Generate Time Per Task Report
        • Split Document
      • Storage Options
  • API
    • Datasaur APIs
    • Credentials
    • Create Project
      • New mutation (createProject)
      • Python Script Example
    • Adding Documents
    • Labeling
      • Create Label Set
      • Add Label Sets into Existing Project
      • Get List of Label Sets in a Project
      • Add Label Set Item into Project's Label Set
      • Programmatic API Labeling
      • Inserting Span and Arrow Label into Document
    • Export Project
      • Custom Webhook
    • Get Data
      • Get List of Projects
      • Get Document Information
      • Get List of Tags
      • Get Cabinet
      • Export Team Overview
      • Check Job
    • Custom OCR
      • Importable Format
    • Custom ASR
    • Run ML-Assisted Labeling
  • Security and Compliance
    • Security and Compliance
      • 2FA
  • Compatibility & Updates
    • Common Terminology
    • Recommended Machine Specifications
    • Supported Formats
    • Supported Languages
    • Release Notes
      • Version 6
        • 6.111.0
        • 6.110.0
        • 6.109.0
        • 6.108.0
        • 6.107.0
        • 6.106.0
        • 6.105.0
        • 6.104.0
        • 6.103.0
        • 6.102.0
        • 6.101.0
        • 6.100.0
        • 6.99.0
        • 6.98.0
        • 6.97.0
        • 6.96.0
        • 6.95.0
        • 6.94.0
        • 6.93.0
        • 6.92.0
        • 6.91.0
        • 6.90.0
        • 6.89.0
        • 6.88.0
        • 6.87.0
        • 6.86.0
        • 6.85.0
        • 6.84.0
        • 6.83.0
        • 6.82.0
        • 6.81.0
        • 6.80.0
        • 6.79.0
        • 6.78.0
        • 6.77.0
        • 6.76.0
        • 6.75.0
        • 6.74.0
        • 6.73.0
        • 6.72.0
        • 6.71.0
        • 6.70.0
        • 6.69.0
        • 6.68.0
        • 6.67.0
        • 6.66.0
        • 6.65.0
        • 6.64.0
        • 6.63.0
        • 6.62.0
        • 6.61.0
        • 6.60.0
        • 6.59.0
        • 6.58.0
        • 6.57.0
        • 6.56.0
        • 6.55.0
        • 6.54.0
        • 6.53.0
        • 6.52.0
        • 6.51.0
        • 6.50.0
        • 6.49.0
        • 6.48.0
        • 6.47.0
        • 6.46.0
        • 6.45.0
        • 6.44.0
        • 6.43.0
        • 6.42.0
        • 6.41.0
        • 6.40.0
        • 6.39.0
        • 6.38.0
        • 6.37.0
        • 6.36.0
        • 6.35.0
        • 6.34.0
        • 6.33.0
        • 6.32.0
        • 6.31.0
        • 6.30.0
        • 6.29.0
        • 6.28.0
        • 6.27.0
        • 6.26.0
        • 6.25.0
        • 6.24.0
        • 6.23.0
        • 6.22.0
        • 6.21.0
        • 6.20.0
        • 6.19.0
        • 6.18.0
        • 6.17.0
        • 6.16.0
        • 6.15.0
        • 6.14.0
        • 6.13.0
        • 6.12.0
        • 6.11.0
        • 6.10.0
        • 6.9.0
        • 6.8.0
        • 6.7.0
        • 6.6.0
        • 6.5.0
        • 6.4.0
        • 6.3.0
        • 6.2.0
        • 6.1.0
        • 6.0.0
      • Version 5
        • 5.63.0
        • 5.62.0
        • 5.61.0
        • 5.60.0
  • Deployment
    • Self-Hosted
      • AWS Marketplace
        • Data Studio
        • LLM Labs
Powered by GitBook
On this page
  • Introduction
  • Key Features
  • Terminologies
  • Quick Start Guide
  • Step 1: Enable the Feature
  • Step 2: Configure Settings
  • Step 3: Run Detection
  • Step 4: Review and Adjust
  • Troubleshooting Common Issues
  1. Assisted Review

Label Error Detection

Enhance Your Dataset Quality with Datasaur's Label Error Detection

Last updated 1 month ago

Introduction

Datasaur's Label Error Detection feature revolutionizes the way you approach data labeling by automating the identification of inaccuracies in your dataset. It flags potential label errors and suggests alternatives, significantly improving data integrity and model performance.

Key Features

Manual data review is notoriously time-consuming and prone to human error. Leveraging automated error detection:

  1. Boosts Efficiency: Seamlessly pinpoint label errors without sifting through your entire dataset.

  2. Enhances Focus: Apply a specific error threshold to zero in on the most questionable labels, optimizing your review process.

  3. Improves Accuracy: Receive intelligent label suggestions, enabling quick, precise corrections.

Terminologies

  1. Label Errors: Discrepancies or inaccuracies in the assigned labels of your dataset, which can mislead your model's learning process.

  2. Error Possibility: A probability score that indicates the likelihood of a label being incorrect, allowing for prioritized review.

  3. Label Correction Suggestions: Offering alternative labels suggested by the model for accurate and refined dataset curation.

Quick Start Guide

Unlock the full potential of label error detection in your row-based project with four simple steps:

Step 1: Enable the Feature

Step 2: Configure Settings

In the Label Error Detection extensions (on the right side), configure:

  1. The Target Column for input text.

  2. The Target Question for labels.

Step 3: Run Detection

Ensure all rows are labeled before initiating the 'Find Label Errors' process. This might take a moment.

Detected errors in rows will appear in the Label Errors section. Adjust the error possibility threshold to fine-tune your review focus.

Step 4: Review and Adjust

Click the problematic rows from Label Errors section and make necessary corrections by editing labels directly or accepting suggested changes.

Troubleshooting Common Issues

  • Feature Not Working: Ensure all rows are labeled, and the extension is properly enabled.

  • Incorrect Suggestions: Adjust the error possibility threshold for more accurate detection.

Happy labeling!

Activate the Label Error Detection extension from the section in the right sidebar of your project dashboard.

Manage Extensions
Activating label error detection.
Setting up your detection parameters.
Initiating error detection.
Identifying errors in your dataset.
Focusing on label errors for correction.