Datasaur

Pre-Labeled Project

Last updated 9 months ago

Overview

The Pre-labeled Project feature lets you start a new labeling project from a file that already contains labels. Building on existing annotations jumpstarts the labeling process, simplifies project setup, and removes the need to add every label manually. To use this feature, upload a pre-labeled file or, for media-based projects such as document and bounding box labeling, upload each media file together with its pre-labeled answer file.

Use Cases

Pre-labeled projects are especially useful in the following scenarios:

  • Streamlined Onboarding: When starting a new project that shares the same labeling schema as a previously completed project, you can use a pre-labeled file to quickly set up the new project.

  • Consistency in Labeling: When you have a set of standard labels that should be consistently applied across multiple projects, pre-labeled projects help ensure uniformity.

  • Efficiency: Save time by using pre-defined labels for projects with known labeling requirements.

  • Data Preparation: Import data with preliminary labels from external sources directly into Datasaur projects.

How to Create a Pre-Labeled Project

Span Labeling

This project is for labeling specific parts of text within a document. To get started, follow these steps:

  1. Prepare the pre-labeled file. Supported formats for importing pre-labeled span projects include:

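    • Span: IOB TSV, TSV non-IOB, JSON Simplified, JSON Advanced, CoNLL-U, CoNLL 2003, Datasaur Schema (.json)

    • Span with arrows: TSV non-IOB, JSON Advanced, CoNLL-U, Datasaur Schema (.json)

    • Span with character-based labeling: JSON Simplified, JSON Advanced, Datasaur Schema (.json)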

  2. Open Project Creation Wizard to start the project creation.

  3. Upload the pre-labeled file.

  4. Complete the project creation process.
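To illustrate what a pre-labeled span file can contain, here is a minimal Python sketch that renders token-level spans as IOB tags, one token<TAB>tag pair per line, with sentences separated by blank lines. That two-column layout is a common IOB TSV convention and is an assumption here; check Datasaur's Data Formats page for the exact layout it expects. The functions `to_iob_tsv` and `to_iob_document` and the sample labels `PER` and `LOC` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical span representation: (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def to_iob_tsv(tokens: List[str], spans: List[Span]) -> str:
    """Render one sentence as IOB-tagged lines of the form token<TAB>tag."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span {(start, end, label)} is out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span {(start, end, label)} overlaps another span")
        tags[start] = f"B-{label}"  # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens inside the span
    return "\n".join(f"{token}\t{tag}" for token, tag in zip(tokens, tags))


def to_iob_document(sentences: List[Tuple[List[str], List[Span]]]) -> str:
    """Join sentences with a blank line, as IOB TSV files conventionally do."""
    return "\n\n".join(to_iob_tsv(tokens, spans) for tokens, spans in sentences) + "\n"


if __name__ == "__main__":
    doc = to_iob_document([
        (["Barack", "Obama", "visited", "Jakarta", "."], [(0, 2, "PER"), (3, 4, "LOC")]),
        (["It", "rained", "."], []),
    ])
    print(doc, end="")
```

Rejecting overlapping spans keeps the output valid IOB, which cannot represent nested spans; if your labels overlap or use arrows, one of the richer formats listed above (such as JSON Advanced or Datasaur Schema) is likely a better fit.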

Row Labeling

This project type is for assigning pre-labeled answers to rows of data. To get started, follow these steps:

  1. Prepare the pre-labeled file. Supported formats for importing pre-labeled row projects include: CSV, JSON Tabular, TSV, XLS and XLSX, JSON Lines, Datasaur Schema (.json).

    • Note: Include a column in this file containing the answers to the questions that you will configure in Step 3 of the Project Creation Wizard.

  2. Open Project Creation Wizard to start the project creation.

  3. Upload the pre-labeled file.

  4. In Step 3 of the Project Creation Wizard, provide the questions. Then, link the answers to the questions by using the “Refer answer to table column…” option. Select the column that will serve as the answer to each question.

  5. Complete the project creation process.
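As a rough illustration of preparing a row file with an answer column, the Python sketch below uses the standard `csv` module and refuses rows whose answer cell is empty. The column names `text` and `sentiment`, their values, and the function `build_pre_labeled_csv` are hypothetical; in Step 3 you would then point a `sentiment` question at the `sentiment` column through “Refer answer to table column…”.

```python
import csv
import io
from typing import Dict, List


def build_pre_labeled_csv(rows: List[Dict[str, str]], answer_column: str) -> str:
    """Serialize rows to CSV text, refusing rows whose answer cell is empty."""
    if not rows:
        raise ValueError("no rows to write")
    missing = [i for i, row in enumerate(rows) if not row.get(answer_column, "").strip()]
    if missing:
        raise ValueError(f"rows {missing} have no value in column {answer_column!r}")
    buffer = io.StringIO()
    # Column order follows the keys of the first row.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"text": "The delivery was fast and the packaging was intact.", "sentiment": "positive"},
        {"text": "The item arrived broken.", "sentiment": "negative"},
    ]
    # newline="" keeps the \r\n endings written by csv from being translated again.
    with open("pre_labeled.csv", "w", newline="", encoding="utf-8") as f:
        f.write(build_pre_labeled_csv(rows, answer_column="sentiment"))
```

Validating the answer column before upload is a design choice of this sketch, not a documented requirement; it simply surfaces rows that would otherwise arrive without a pre-labeled answer.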

Document Labeling

This project is for assigning labels to whole documents based on their content. To get started, follow these steps:

  1. Prepare the media and pre-labeled answer files. Pre-labeling is supported as long as each media file is uploaded together with its answer file.

    • Notes:

      • Answer file: This is a JSON file containing the answers to the given question set. The filename should end with .answer.json. Below is an example of an accepted answer file for a question set with a caption question:

        {
          "caption": "A realistic photograph of a white sedan parked on an asphalt road, facing the camera at a front and slightly right angle. The car is centered slightly to the left, with a visible license plate reading 'HZ20 SBV'. The natural daylight provides bright and clear lighting across the scene. In the midground, the asphalt road extends horizontally, flanked by green grassy areas with scattered bushes. The background features a clear, blue sky and a line of four wind turbines with white blades and pale orange towers positioned along a grassy landscape with a body of water visible behind them. The entire composition centers on the car with the wind turbines providing a modern, eco-friendly backdrop."
        }
      • Media and answer file naming: The media file and its corresponding pre-labeled answer file should have the same name. For example, if the media file is named a.jpg, its answer file should be named a.answer.json.

      • Multiple media/documents: If you have multiple media files or documents to be pre-labeled, prepare your files as follows:

        • a.jpg, a.answer.json

        • b.jpg, b.answer.json

        • c.jpg, c.answer.json

  2. Open Project Creation Wizard to start the project creation.

  3. Upload the pre-labeled answer file with its media file.

  4. In Step 3 of the Project Creation Wizard, provide the questions. Ensure that every question answered in the pre-labeled answer file (such as caption in the example above) is configured in this step.

  5. Complete the project creation process.
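The naming rule above (a.jpg pairs with a.answer.json) is easy to automate. The following Python sketch writes an answer file next to each media file and lists media files that still lack one. The helper names and the `MEDIA_SUFFIXES` tuple are hypothetical, and the `caption` key simply mirrors the example answer file; use the question names from your own question set.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

MEDIA_SUFFIXES = (".jpg", ".jpeg", ".png")  # hypothetical: adjust to your media types


def answer_path_for(media_path: Path) -> Path:
    """a.jpg -> a.answer.json: same base name, .answer.json suffix."""
    return media_path.with_name(media_path.stem + ".answer.json")


def write_answer_file(media_path: Path, answers: Dict[str, object]) -> Path:
    """Write the answers for one media file next to it and return the new path."""
    target = answer_path_for(media_path)
    target.write_text(json.dumps(answers, ensure_ascii=False, indent=2), encoding="utf-8")
    return target


def unpaired_media(folder: Path) -> List[str]:
    """List media files in `folder` that have no matching .answer.json file."""
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.suffix.lower() in MEDIA_SUFFIXES and not answer_path_for(p).exists()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for name in ("a.jpg", "b.jpg", "c.jpg"):
            (folder / name).write_bytes(b"")  # empty placeholder media for the demo
        write_answer_file(folder / "a.jpg", {"caption": "A white sedan parked on an asphalt road."})
        write_answer_file(folder / "b.jpg", {"caption": "Caption for b.jpg"})
        print(unpaired_media(folder))  # c.jpg still needs c.answer.json
```

Running the pairing check before upload catches media files that would otherwise be imported without pre-labels.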

Bounding Box Labeling

This project is for drawing boxes around objects or text in images or documents to identify them. To get started, follow these steps:

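  1. Prepare the media and pre-labeled answer files. Pre-labeling is supported as long as each media file is uploaded together with its answer file. Notes:

    • Answer file format: Supported answer file formats for importing pre-labeled bounding box projects include YOLO (.txt), LabelMe (.xml), Pascal VOC (.xml), and Datasaur Schema (.json).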

    • Media and answer file naming: The media file and its corresponding pre-labeled answer file should have the same name. For example, if the media file is named a.jpg, its answer file in YOLO format should be named a.txt.

    • Multiple media/documents: If you have multiple media files or documents to be pre-labeled, prepare your files as follows:

      • a.jpg, a.txt

      • b.jpg, b.txt

      • c.jpg, c.txt

  2. Open Project Creation Wizard to start the project creation.

  3. Upload the pre-labeled answer file with its media file.

    • Notes:

      • Labels in your pre-labeled file are automatically matched to the labels in your label set.

      • Labels in the file that do not exist in your label set are automatically created as new classes during setup.

  4. In Step 3 of the Project Creation Wizard, provide the labels.

  5. Complete the project creation process.
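For the YOLO (.txt) option, each line of the answer file conventionally holds `class_id x_center y_center width height`, with all four coordinates normalized by the image width and height. The Python sketch below converts pixel-space boxes into that form. The input tuple layout and function names are hypothetical, and how numeric class IDs map to label names in your label set is not described on this page, so treat the IDs as an assumption.

```python
from typing import Iterable, Tuple

# Hypothetical input: (class_id, x_min, y_min, x_max, y_max) in pixel coordinates.
PixelBox = Tuple[int, float, float, float, float]


def to_yolo_line(box: PixelBox, img_w: int, img_h: int) -> str:
    """Convert one pixel-space box to a YOLO line: class x_center y_center width height."""
    class_id, x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} does not fit inside a {img_w}x{img_h} image")
    x_center = (x_min + x_max) / 2 / img_w  # all four values are normalized to [0, 1]
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def to_yolo_file(boxes: Iterable[PixelBox], img_w: int, img_h: int) -> str:
    """Contents of a.txt for a.jpg: one line per box."""
    return "".join(to_yolo_line(b, img_w, img_h) + "\n" for b in boxes)


if __name__ == "__main__":
    # A 640x480 image with two boxes (class IDs are hypothetical).
    print(to_yolo_file([(0, 100, 120, 300, 360), (1, 0, 0, 64, 48)], 640, 480), end="")
```

For the first box, the center (200, 240) divided by the 640x480 image size gives 0.3125 and 0.5, so the line reads `0 0.312500 0.500000 0.312500 0.500000`.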

Supported Pre-Labeled File Formats

  • Span: IOB TSV, TSV non-IOB, JSON Simplified, JSON Advanced, CoNLL-U, CoNLL 2003, Datasaur Schema (.json)

  • Span with arrows: TSV non-IOB, JSON Advanced, CoNLL-U, Datasaur Schema (.json)

  • Span with character-based labeling: JSON Simplified, JSON Advanced, Datasaur Schema (.json)

  • Row: CSV, JSON Tabular, TSV, XLS and XLSX, JSON Lines, Datasaur Schema (.json)

  • Bounding box (answer files): YOLO (.txt), LabelMe (.xml), Pascal VOC (.xml), Datasaur Schema (.json)