Export Project

The available formats depend on the task type, as described in the table below. Click on any format to see a detailed explanation of the file structure Datasaur expects.

Task Type

Export Formats

Span Labeling

Span Labeling with arrows

Span Labeling with character-based labeling

Row Labeling

Document Labeling

Audio Labeling

.json_advanced, Datasaur Schema (.json), .tsv

Span + Document Labeling (Mixed Label Set)

Bounding Box Labeling

How to Export the Project

Both export options described below are also supported through the API. Click here for a more detailed explanation.

  1. Open a project and click the File menu.

  2. Select either Export file... or Export all files...

  • Export file exports only the file that is currently open. The result contains only the latest state of the project and is less complete than the output of Export all files.

  • Export all files exports every file in the project. For projects with multiple assignees, each assignee's labeled version is exported as a separate file. The output is a .zip archive that contains three kinds of folders:

    • DOCUMENT-Labeler-name is a folder containing the version of the file as labeled by Labeler.

    • REVIEW is a folder containing the final copy of all labels, including Datasaur auto-accepted labels and Reviewer applied labels.

    • ROOT is a folder containing only the original raw text, with no labels and no edits.
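As a rough illustration of the folder layout described above, the following Python sketch groups the entries of an exported archive by their top-level folder. It assumes that DOCUMENT-Labeler-name, REVIEW, and ROOT appear as the first path component inside the .zip; the function name `group_export_entries` and the group keys are hypothetical and are not part of Datasaur.

```python
import io
import zipfile
from collections import defaultdict


def group_export_entries(zip_bytes: bytes) -> dict[str, list[str]]:
    """Group file paths in an export archive by their top-level folder.

    Assumption: the folders described above are the first path component.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            top_level = name.split("/", 1)[0]
            if top_level.startswith("DOCUMENT-"):
                key = "labelers"  # one folder per labeler
            elif top_level == "REVIEW":
                key = "review"    # final, reviewed labels
            elif top_level == "ROOT":
                key = "root"      # original raw text
            else:
                key = "other"     # anything this sketch does not recognize
            groups[key].append(name)
    return dict(groups)
```

Reading the archive from bytes keeps the sketch independent of where the .zip was downloaded to; pass `open(path, "rb").read()` for a file on disk.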

Include Unresolved Labels / Answers in the Export Result

Users can include unresolved labels or answers in the export result.

This option is available for Span, Row, and Document Labeling projects in both the Datasaur Schema (.json) and Comma-separated values (.csv) formats.

Enabling the Option

When you select a supported format, a checkbox appears. If it is checked, the export result includes the unresolved labels or answers.

Export Result

Several things are added to the result when you choose to include unresolved (conflicted) labels or answers:

  • Comma-separated values (.csv)

    • New column: Label Status

      This column will indicate whether the corresponding line is conflicted or resolved.

    • New column: Line

      This column will indicate the line number.

  • Datasaur Schema (.json)

    • New value: rowAnswers, documentAnswers, spanLabels, arrowLabels

      The conflicted values will be added to rowAnswers, documentAnswers, spanLabels, or arrowLabels.

      You can differentiate between resolved and unresolved answers by looking at the labeledBy attribute. An unresolved label has CONFLICT as its labeledBy value.
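The labeledBy check described above can be sketched in Python as follows. This is a minimal sketch, not Datasaur's own tooling: it assumes each entry under rowAnswers, documentAnswers, spanLabels, or arrowLabels is a JSON object carrying a labeledBy attribute. Because the exact nesting of these arrays is not shown on this page, the hypothetical helper `split_by_resolution` searches the parsed JSON recursively.

```python
import json

# Keys named on this page as receiving conflicted values.
LABEL_KEYS = ("rowAnswers", "documentAnswers", "spanLabels", "arrowLabels")


def split_by_resolution(node):
    """Return (resolved, unresolved) items found under LABEL_KEYS.

    Assumption: each item is a dict, and unresolved items have
    labeledBy == "CONFLICT".
    """
    resolved, unresolved = [], []

    def walk(value):
        if isinstance(value, dict):
            for key, child in value.items():
                if key in LABEL_KEYS and isinstance(child, list):
                    for item in child:
                        is_conflict = (
                            isinstance(item, dict)
                            and item.get("labeledBy") == "CONFLICT"
                        )
                        (unresolved if is_conflict else resolved).append(item)
                else:
                    walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(node)
    return resolved, unresolved
```

To use it, parse an exported file with `json.load` and pass the result to `split_by_resolution`.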

The following section illustrates the result for each task type.

  • Span Labeling

    • Datasaur Schema (.json)

      Conflicted labels will be added to spanLabels or arrowLabels.

    • Comma-separated values (.csv)

      This format is similar to the Amazon Comprehend CSV export format, but with an additional column titled "Label Status".

  • Row and Document Labeling

    • Datasaur Schema (.json)

      Unresolved answers are not added to the answer set (rowAnswerSets for Row Labeling, documentAnswerSets for Document Labeling).

      However, they are added, along with the resolved answers, to rowAnswers for Row Labeling or documentAnswers for Document Labeling.

    • Comma-separated values (.csv)

      Adds the "Label Status" and "Line" columns.

      A single line may contain both resolved and unresolved answers due to consensus. In such cases, the answers are split into two lines: the first for resolved answers and the second for unresolved ones.
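Since one source line can appear twice in the .csv export (once for resolved answers, once for unresolved ones), it can help to regroup rows by line number when post-processing. The Python sketch below assumes the column headers are exactly "Line" and "Label Status"; the literal status values are not listed on this page, so the hypothetical helper `rows_by_line` groups by whatever values appear.

```python
import csv
import io
from collections import defaultdict


def rows_by_line(csv_text: str) -> dict[str, dict[str, list[dict]]]:
    """Group export rows by "Line", then by "Label Status".

    Assumption: both headers are spelled exactly as on this page.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Line"]][row["Label Status"]].append(row)
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {line: dict(statuses) for line, statuses in grouped.items()}
```

A line whose inner dictionary has two keys is one where consensus left both resolved and unresolved answers.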

Export Methods

When exporting a file, you can choose among several delivery methods. All of them are also supported through our API.

Download

  • The export result is uploaded to Datasaur's bucket, and you download it directly to your device through a link.

  • Keep in mind that the time needed to generate the link grows with the size of the project.

Email

  • Datasaur generates a link and sends it to the email address of the currently logged-in account. The link can then be used to download the export result.

  • Note: the link will expire in 6 hours.

Webhook

  • The export result will be sent as a payload of the webhook request.

  • For a full explanation of this method, please refer to this page.

  • Note: the link will expire in 6 hours.

External Object Storage

  • The export result is uploaded directly to your bucket, based on the External Object Storage that you choose.

  • You can also add a prefix, which is prepended to the name of the export result. Note that no / separator is inserted between the prefix and the name. So, if the prefix is test and the fileName is name.json, the export result will be testname.json.
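The prefix rule above amounts to plain string concatenation. The short Python sketch below mirrors it; `export_object_name` is a hypothetical helper for illustration, not a Datasaur function.

```python
def export_object_name(prefix: str, file_name: str) -> str:
    """Join prefix and file name exactly as given, with no separator added."""
    return f"{prefix}{file_name}"


# Matches the example on this page.
print(export_object_name("test", "name.json"))  # testname.json
```

Under this rule, a prefix that should act like a folder presumably needs its own trailing slash, for example `test/`, which would yield `test/name.json`.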

Export Multiple Projects from the Project Dashboard

You can export multiple projects that share the same project setting by selecting their checkboxes in the project list.

After clicking the Export button, choose the desired export format and method. The output is a .zip file.

💡 For performance reasons, we recommend exporting at most 10 projects at once.
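If you have more than 10 projects to export, one way to follow this recommendation is to split them into groups of at most 10 and export each group separately. The Python sketch below shows only the grouping step; the helper `batches` and the project identifiers are hypothetical, and the export itself is left out.

```python
from typing import Iterator, Sequence


def batches(project_ids: Sequence[str], size: int = 10) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` project IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(project_ids), size):
        yield list(project_ids[start:start + size])


# Hypothetical IDs, for illustration only.
ids = [f"project-{n}" for n in range(23)]
for group in batches(ids):
    print(len(group), "projects in this export batch")
```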
