
Export Project

Last updated 5 months ago

The available formats depend on the task type, as described in the table below. Click on any format to see a detailed explanation of the file structure Datasaur expects.

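Task Type: Export Formats

  • Span Labeling: Datasaur Schema (.json), .csv, .tsv, .json, .json_advanced, .conll_2003, .tsv_non_iob

  • Span Labeling with arrows: Datasaur Schema (.json), .json_advanced, .conllu, .tsv_non_iob

  • Span Labeling with character-based labeling: Datasaur Schema (.json), .json_advanced

  • Row Labeling: Datasaur Schema (.json), JSON Lines (.jsonl), .csv, .tsv, .xlsx, .json_tabular

  • Document Labeling: Datasaur Schema (.json), .csv, .tsv, .xlsx

  • Audio Labeling: .json_advanced, Datasaur Schema (.json), .tsv

  • Span + Document Labeling (Mixed Label Set): Datasaur Schema (.json)

  • Bounding Box Labeling: Datasaur Schema (.json)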

How to Export the Project

Both export options described below are also supported through an API call; see the API documentation for a more detailed explanation.

  1. Open a project and click the File menu.

  2. Select either Export file... or Export all files...

  • Export file exports only the file that is currently open. The export result contains only the latest state of that file, so it is less complete than the result of Export all files.

  • Export all files exports all the files in a project. For projects with multiple assignees, each assignee's labeled version is exported as a separate file. The output is a .zip archive that contains three folders:

    • DOCUMENT-Labeler-name is a folder containing the version of the file as labeled by that Labeler.

    • REVIEW is a folder containing the final copy of all labels, including labels auto-accepted by Datasaur and labels applied by the Reviewer.

    • ROOT is a folder containing only the original raw text, with no labels and no edits.
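The folder layout described above can be inspected programmatically once the .zip archive is downloaded. The following is a minimal sketch, not an official tool: it assumes each archive member sits under a top-level folder, and the sample folder name `DOCUMENT-alice` and the file name `doc.txt` are hypothetical placeholders.

```python
import io
import zipfile
from collections import defaultdict


def group_export_entries(archive: zipfile.ZipFile) -> dict[str, list[str]]:
    """Group archive members by their top-level folder name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in archive.namelist():
        if name.endswith("/"):
            continue  # skip explicit directory entries
        top, _, rest = name.partition("/")
        if not rest:
            continue  # skip files that are not inside a folder
        groups[top].append(rest)
    return dict(groups)


def build_sample_archive() -> zipfile.ZipFile:
    """Build an in-memory archive with hypothetical names, for illustration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("DOCUMENT-alice/doc.txt", "version labeled by alice")
        archive.writestr("REVIEW/doc.txt", "final reviewed labels")
        archive.writestr("ROOT/doc.txt", "original raw text")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

With a real export, `zipfile.ZipFile("path/to/export.zip")` would replace `build_sample_archive()`; the path is a placeholder.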

Include Unresolved Labels / Answers in the Export Result

Users can include unresolved labels or answers in the export result.

This option is available for Span, Row, and Document Labeling projects in both the Datasaur Schema (.json) and comma-separated values (.csv) formats.

Enabling the Option

When you select a supported format, a checkbox appears. If it is checked, the export result includes the unresolved labels or answers.

Export Result

Several things are added to the export result when you choose to include unresolved (conflicted) labels or answers:

  • Comma-separated values (.csv)

    • New column: Label Status

      This column will indicate whether the corresponding line is conflicted or resolved.

    • New column: Line

      This column will indicate the line number.

  • Datasaur Schema (.json)

    • New values in rowAnswers, documentAnswers, spanLabels, and arrowLabels

      The conflicted values are added to rowAnswers, documentAnswers, spanLabels, or arrowLabels.

      You can differentiate between resolved and unresolved answers by looking at the labeledBy attribute. Unresolved labels have CONFLICT as their labeledBy value.
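To illustrate the labeledBy check, here is a minimal sketch that separates resolved from unresolved entries in a parsed export. It does not assume the exact nesting of the Datasaur Schema: it walks the whole JSON structure and inspects any list stored under one of the four keys named above. The sample input, its `data` wrapper, the `id` field, and the `REVIEWER` value are hypothetical.

```python
from typing import Any

# Keys named in the documentation as holding resolved and conflicted entries.
LABEL_KEYS = ("rowAnswers", "documentAnswers", "spanLabels", "arrowLabels")


def split_by_conflict(export: Any) -> tuple[list[dict], list[dict]]:
    """Return (resolved, unresolved) entries found anywhere in the export."""
    resolved: list[dict] = []
    unresolved: list[dict] = []
    stack = [export]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for key, value in node.items():
                if key in LABEL_KEYS and isinstance(value, list):
                    for item in value:
                        if not isinstance(item, dict):
                            continue
                        # Unresolved entries carry CONFLICT as labeledBy.
                        if item.get("labeledBy") == "CONFLICT":
                            unresolved.append(item)
                        else:
                            resolved.append(item)
                else:
                    stack.append(value)  # keep descending into nested data
        elif isinstance(node, list):
            stack.extend(node)
    return resolved, unresolved


# Hypothetical sample; a real export would be loaded with json.load().
SAMPLE_EXPORT = {
    "data": {
        "spanLabels": [
            {"id": 1, "labeledBy": "REVIEWER"},
            {"id": 2, "labeledBy": "CONFLICT"},
        ]
    }
}
```

Calling `split_by_conflict(SAMPLE_EXPORT)` puts the entry with `id` 1 in the resolved list and the entry with `id` 2 in the unresolved list.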

The following section illustrates the result for each project type.

  • Span Labeling

    • Datasaur Schema (.json)

      Conflicted labels will be added to spanLabels or arrowLabels.

    • Comma-separated values (.csv)

      This format is similar to the Amazon Comprehend CSV export format, but with an additional column titled "Label Status".

  • Row and Document Labeling

    • Datasaur Schema (.json)

      Unresolved answers are not added to the answer set (rowAnswerSets for Row Labeling, documentAnswerSets for Document Labeling).

      However, they are added to rowAnswers for Row Labeling, or to documentAnswers for Document Labeling, along with the resolved answers.

    • Comma-separated values (.csv)

      Adds the "Label Status" and "Line" columns.

      Because of consensus, a single line may contain both resolved and unresolved answers. In such cases, the answers are split into two lines: the first for the resolved answers and the second for the unresolved ones.
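As an illustration of working with the extra CSV columns, the sketch below groups exported rows by their "Label Status" value. The column names "Label Status" and "Line" come from the description above; the "Answer" column and the status values `RESOLVED` and `CONFLICT` in the sample are hypothetical, since the exact strings are not specified here.

```python
import csv
import io
from collections import defaultdict


def rows_by_label_status(csv_text: str) -> dict[str, list[dict[str, str]]]:
    """Group CSV rows by the value of their "Label Status" column."""
    grouped: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Label Status"]].append(row)
    return dict(grouped)


# Hypothetical sample: line 1 has both a resolved and an unresolved answer,
# so it appears twice, as described for the consensus case.
SAMPLE_CSV = (
    "Line,Answer,Label Status\n"
    "1,positive,RESOLVED\n"
    "1,negative,CONFLICT\n"
    "2,neutral,RESOLVED\n"
)
```

Grouping by whatever value appears in the column avoids hard-coding the status strings, so the same function works if the real export uses different wording.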

Export Methods

Download

  • The export result is uploaded to Datasaur's bucket, and you download it directly to your device through a link.

  • Keep in mind that the time needed to generate the link will be directly proportional to the size of the project.

Email

  • Datasaur generates a link and sends it to the email address of the account that is currently logged in. The link can then be used to download the export result.

  • Note: the link will expire in 6 hours.

Webhook

  • The export result will be sent as a payload of the webhook request.

  • Note: the link will expire in 6 hours.

External Object Storage

  • The export result will be directly uploaded to your bucket based on the External Object Storage that you choose.

  • You can also add a prefix, which is prepended to the name of the export result. Please note that no / separator is inserted between the prefix and the name. So, if the prefix is test and the fileName is name.json, the export result will be testname.json.
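The prefix rule amounts to plain string concatenation. A minimal sketch, where the function name `export_object_name` is made up for illustration:

```python
def export_object_name(prefix: str, file_name: str) -> str:
    """Join prefix and file name exactly as given, with no separator added."""
    return f"{prefix}{file_name}"


# The example from the text: prefix "test" and fileName "name.json".
print(export_object_name("test", "name.json"))  # testname.json
```

Under this rule, a folder-style name would presumably require including the separator in the prefix yourself (for example `test/`); whether your storage provider treats that as a folder is an assumption to verify.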

Export Multiple Projects from the Project Dashboard

You can export multiple projects that share the same project setting by selecting the corresponding checkboxes in the project list.

After clicking the Export button, you can choose the desired export format and method. The output will be in .zip format.

💡 For performance reasons, we recommend exporting at most 10 projects at once.
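If you need to export more than 10 projects, one way to follow this recommendation is to split the work into batches. The sketch below only does the splitting; the function name and the project IDs are hypothetical, and how each batch is exported (dashboard or API) is left out.

```python
from typing import Iterator, Sequence


def batch_projects(project_ids: Sequence[str], batch_size: int = 10) -> Iterator[list[str]]:
    """Yield consecutive batches of at most batch_size project IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(project_ids), batch_size):
        yield list(project_ids[start:start + batch_size])
```

For example, 23 project IDs are split into batches of 10, 10, and 3.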


When exporting a file, you can choose from the export methods described above. All methods are also supported through our API.

For a full explanation of this method, please refer to this page.
