Data Samples

This page provides sample datasets so you can immediately create a project and test the labeling interface. As mentioned, Datasaur has many different project types. Sample datasets are provided here for the following project types: span-based, span-based with arrows, row-based, bounding-box, and document-based. If you would like to create an Audio or LLM project, select their respective links. Both of those pages contain sample data for you to upload.

Span-based

The following zip files include the sample datasets and label sets. The .txt files in each zip contain the dataset to be labeled; upload them in Step 1 of the project creation wizard (PCW). The .csv is the label set (taxonomy) to be applied to the dataset; upload it in Step 3 of the PCW.
Datasaur sample - Token (NER).zip (5KB): NER samples
Datasaur sample - Token (POS).zip (4KB): POS samples
Datasaur sample - Token (OCR).zip (973KB): OCR samples
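
Before uploading, it can help to see which files in a zip belong to which wizard step. The following is a minimal sketch, assuming the zip has already been downloaded locally. The helper name `split_sample_zip` is hypothetical and not part of Datasaur; it only groups archive members by extension, following the description above (.txt for Step 1, .csv for Step 3).

```python
import zipfile
from pathlib import PurePosixPath


def split_sample_zip(zip_path):
    """Return (dataset_files, labelset_files) found in a sample zip.

    Hypothetical helper: it groups archive members by extension only
    and does not inspect or validate the file contents.
    """
    dataset_files = []
    labelset_files = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            suffix = PurePosixPath(name).suffix.lower()
            if suffix == ".txt":
                dataset_files.append(name)   # dataset: upload in Step 1
            elif suffix == ".csv":
                labelset_files.append(name)  # label set: upload in Step 3
    return sorted(dataset_files), sorted(labelset_files)
```

Calling `split_sample_zip("Datasaur sample - Token (NER).zip")` would return two lists of member names, which you can then extract and upload in the corresponding steps.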

Span-based with arrows

The following zip files include the sample datasets and label sets. The .tsv files in each zip contain the dataset to be labeled; upload them in Step 1 of the project creation wizard (PCW). The .csv is the label set (taxonomy) to be applied to the dataset; upload it in Step 3 of the PCW.
Datasaur sample - Token (DEP).zip (194KB): Dependency samples
Datasaur sample - Token (COR).zip (832B): Coreference samples
Datasaur sample - Token (with arrow).zip (940B): Relation samples

Row-based (textual classification)

Upload this .csv in Step 1 of the PCW. In Step 3, you can create your question set either through the UI or by uploading a .csv. The task in this example is sentiment analysis.
Sentiment Analysis Sample - Sheet1.csv (817B): Multiple files samples

Document-based (document/image classification)

The following zip files include images and PDFs for creating a document-based project. Make sure to choose only one file type when uploading the dataset in Step 1 of the PCW. You can create your question set either in the UI or by uploading a .csv.
imagesamplefiles.zip (1MB): Image sample files
pdfsamplefiles.zip (3MB): PDF samples
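
If you combine files from several sources, you may want to confirm that a folder holds only one file type before uploading it. The sketch below is one way to check this locally, assuming the files have been extracted to a folder. The helper names `file_types_in` and `is_single_type` are hypothetical and not part of Datasaur.

```python
from collections import Counter
from pathlib import Path


def file_types_in(folder):
    """Count files per lowercase extension under `folder` (recursively)."""
    return Counter(
        path.suffix.lower()
        for path in Path(folder).rglob("*")
        if path.is_file()
    )


def is_single_type(folder):
    """True when every file under `folder` shares one extension.

    Extensions are compared literally, so .jpg and .jpeg count as
    two different types in this sketch.
    """
    return len(file_types_in(folder)) == 1
```

For a mixed folder, `file_types_in` shows the count per extension, which makes it easy to see which files to move out before Step 1.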

Bounding-box based

If you would like to create a Bounding-box project, you can use the datasets below. They include PDFs and .jpg images; please upload only one file type in Step 1 of the PCW for your project. In Step 3 of the PCW, you can either upload your labelset (taxonomy) as a .csv or create it in the UI.
Datasaur sample - Bbox (image).zip (2MB)
Datasaur sample - Bbox (pdf).zip (3MB)