Export Project
The available formats depend on the task type, as described in the table below. Click on any format to see a detailed explanation of the file structure Datasaur expects.
Task Type | Export Formats |
Span Labeling | |
Span Labeling with arrows | |
Span Labeling with character-based labeling | |
Row Labeling | |
Document Labeling | |
Audio Labeling | .json_advanced, Datasaur Schema (.json), .tsv |
Span + Document Labeling (Mixed Label Set) | |
Bounding Box Labeling | |
Both features are also supported through API calls. Click here for a more detailed explanation.
Open a project and click the File menu.
Select either Export file... or Export all files...
Export file exports only the file that is currently open. The result contains only the latest state of the project, so it is less complete than the output of Export all files.
Export all files exports every file in the project. For projects with multiple assignees, each assignee's labeled version is exported as a separate file. The output is a .zip archive that contains three folders:
DOCUMENT-Labeler-name is a folder containing the version of the file as labeled by Labeler.
REVIEW is a folder containing the final copy of all labels, including labels auto-accepted by Datasaur and labels applied by the Reviewer.
ROOT is a folder containing only the original raw text, with no labels and no edits.
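The folder layout described above can be inspected programmatically. The following is a minimal sketch, assuming the archive stores each file under its top-level folder; the file names and the labeler name "alice" are hypothetical, and the sample archive is built in memory only to keep the example self-contained.

```python
import io
import zipfile
from collections import defaultdict


def group_export_entries(zip_file):
    """Group file paths in an export archive by their top-level folder."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            top_level = name.split("/", 1)[0]
            groups[top_level].append(name)
    return dict(groups)


# Build a small in-memory archive that mimics the described layout.
# The file names below are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("DOCUMENT-alice/sample.json", "{}")
    archive.writestr("REVIEW/sample.json", "{}")
    archive.writestr("ROOT/sample.txt", "raw text")
buffer.seek(0)

groups = group_export_entries(buffer)
for folder, files in sorted(groups.items()):
    print(folder, files)
```

With a real export, pass the path of the downloaded .zip file to group_export_entries instead of the in-memory buffer.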
Users can now include unresolved labels or answers in the export result.
This option is available for Span, Row, and Document Labeling projects in both the Datasaur Schema and comma-separated values formats.
When you select a supported format, a checkbox appears. If it is checked, the result includes the unresolved labels or answers.
Several things are added when you choose to include conflicted labels or answers:
Comma-separated values (.csv)
New column: Label Status
This column will indicate whether the corresponding line is conflicted or resolved.
New column: Line
This column will indicate the line number.
Datasaur Schema (.json)
New value: rowAnswers, documentAnswers, spanLabels, arrowLabels
The conflicted values will be added to rowAnswers, documentAnswers, spanLabels, or arrowLabels.
You can differentiate between resolved and unresolved answers by looking at the labeledBy attribute. Unresolved labels have CONFLICT as their labeledBy value.
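As an illustration of the labeledBy check, here is a minimal sketch that separates resolved from unresolved entries. Only the labeledBy attribute and the CONFLICT value come from the description above; the id field, the REVIEWER value, and the flat list shape are assumptions made for the example and may not match the actual export.

```python
def split_by_conflict(labels):
    """Split label entries into (resolved, unresolved) using labeledBy."""
    resolved = [item for item in labels if item.get("labeledBy") != "CONFLICT"]
    unresolved = [item for item in labels if item.get("labeledBy") == "CONFLICT"]
    return resolved, unresolved


# Hypothetical entries: only "labeledBy" and "CONFLICT" are taken from the text.
span_labels = [
    {"id": "label-1", "labeledBy": "REVIEWER"},
    {"id": "label-2", "labeledBy": "CONFLICT"},
    {"id": "label-3", "labeledBy": "CONFLICT"},
]

resolved, unresolved = split_by_conflict(span_labels)
print(len(resolved), "resolved,", len(unresolved), "unresolved")
```

The same function could be applied to rowAnswers, documentAnswers, or arrowLabels, provided their entries carry labeledBy in the same way.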
The following sections illustrate the result for each project type.
Span Labeling
Datasaur Schema (.json)
Conflicted labels will be added to spanLabels or arrowLabels.
Comma-separated values (.csv)
This format is similar to the Amazon Comprehend CSV export format, but with an additional column titled "Label Status".
Row and Document Labeling
Datasaur Schema (.json)
Unresolved answers are not added to the answer set (rowAnswerSets for Row Labeling, documentAnswerSets for Document Labeling).
However, they are added, along with the resolved answers, to rowAnswers for Row Labeling or documentAnswers for Document Labeling.
Comma-separated values (.csv)
Adds the "Label Status" and "Line" columns.
There may be a case where a single line contains both resolved and unresolved answers due to consensus. In such cases, the answers will be separated into two lines: the first for resolved answers, and the second line for unresolved ones.
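To show how the two extra columns could be used, here is a minimal sketch that finds the lines with at least one unresolved answer. Only the "Label Status" and "Line" column names come from the description above; the "Answer" column, the column order, and the RESOLVED and CONFLICT status values are assumptions and should be checked against a real export.

```python
import csv
import io

# Hypothetical export content. Line 2 appears twice: once for its resolved
# answers and once for its unresolved ones, as described above.
SAMPLE_CSV = """Line,Answer,Label Status
1,positive,RESOLVED
2,neutral,RESOLVED
2,negative,CONFLICT
"""


def lines_with_conflicts(csv_text, conflict_value="CONFLICT"):
    """Return the sorted line numbers that have at least one conflicted row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(
        {int(row["Line"]) for row in reader if row["Label Status"] == conflict_value}
    )


conflicted_lines = lines_with_conflicts(SAMPLE_CSV)
print(conflicted_lines)
```

If the real export uses different status strings, pass the appropriate one as conflict_value.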
When exporting a file, you can choose from several methods. All of them are also supported through our API.
The export result is uploaded to Datasaur's bucket, and you download it directly to your device through a link.
Keep in mind that the time needed to generate the link increases with the size of the project.
Datasaur generates a link and sends it to the email address of the currently logged-in account. The link can then be used to download the export result.
Note: the link will expire in 6 hours.
The export result will be sent as a payload of the webhook request.
For a full explanation of this method, please refer to this page.
Note: the link will expire in 6 hours.
The export result will be directly uploaded to your bucket based on the External Object Storage that you choose.
You can also add a prefix, which is prepended to the name of the export result. Please note that no / separator is inserted between the prefix and the name. So, if the prefix is test and the fileName is name.json, the export result will be testname.json.
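The naming rule amounts to plain string concatenation. A minimal sketch follows; the function name export_object_name is hypothetical and is not part of any Datasaur API.

```python
def export_object_name(prefix, file_name):
    """Join prefix and file name exactly as given, with no separator added."""
    return f"{prefix}{file_name}"


flat_name = export_object_name("test", "name.json")
nested_name = export_object_name("exports/", "name.json")
print(flat_name)
print(nested_name)
```

To place the result under a folder-like path in the bucket, include the trailing / in the prefix itself, as in the second call.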
You can export multiple projects with the same project setting by selecting their checkboxes in the project list.
After clicking the Export button, choose the desired format and method. The output will be in .zip format.
💡 For performance reasons, we recommend exporting at most 10 projects at once.