# Data Formats

Datasaur supports a wide range of data import formats. The available formats depend on the [task type](https://datasaurai.gitbook.io/datasaur/overview/task-type), as described in the table below. Click on any format to see a detailed explanation of the file structure Datasaur expects.

{% hint style="info" %}
If you don’t see your preferred file format below, you can use [file transformers](https://datasaurai.gitbook.io/datasaur/workspace-management/file-transformer/upload-file-transformer) to upload a custom format.
{% endhint %}

## Available formats

| Task type | Supported formats |
| --- | --- |
| [**Span-based**](/data-studio-projects/nlp-task-types/span-based.md) | [.txt](/compatibility-and-updates/supported-formats.md#txt), [.tsv](/compatibility-and-updates/supported-formats.md#iob-specialized-tsv), [.json](/compatibility-and-updates/supported-formats.md#json) |
| [**Span-based with arrows**](/data-studio-projects/lets-get-labeling/span-based.md#draw-arrows)    | [.txt](/compatibility-and-updates/supported-formats.md#txt), [.tsv](/compatibility-and-updates/supported-formats.md#iob-specialized-tsv), [.tsv-non-iob](/compatibility-and-updates/supported-formats.md#tsv_non_iob), [.json](/compatibility-and-updates/supported-formats.md#json), [.conllu](/compatibility-and-updates/supported-formats.md#conll-u)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| [**Span-based with audio**](/data-studio-projects/nlp-task-types/span-based/audio-project.md)      | <p><strong>Media:</strong> <a href="/pages/hnBPIPVkJZS5fLhyUjKO#mp3">.mp3</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#m4a">.m4a</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#aac">.aac</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#flac">.flac</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#wav">.wav</a><br><strong>Transcription:</strong> <a href="/pages/hnBPIPVkJZS5fLhyUjKO#srt">.srt</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#txt">.txt</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#vtt">.vtt</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#json">.json</a></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| [**Span-based with document**](/data-studio-projects/nlp-task-types/document-based.md)             | <p><strong>Media:</strong> <a href="/pages/hnBPIPVkJZS5fLhyUjKO#bmp">.bmp</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#docx-doc">.doc</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#docx-doc">.docx</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#pdf">.pdf</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#pptx-ppt">.ppt</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#pptx-ppt">.pptx</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#jpeg-and-jpg">.jpeg</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#jpeg-and-jpg">.jpg</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#png">.png</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#tiff-and-tif">.tiff</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#tiff-and-tif">.tif</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#webp">.webp</a></p><p><strong>Transcription:</strong> <a href="/pages/hnBPIPVkJZS5fLhyUjKO#json">.json</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#txt">.txt</a>, <a href="/pages/hnBPIPVkJZS5fLhyUjKO#iob-specialized-tsv">.tsv</a></p>                                                                                                                                                                                                                                                                                                                                                                                           |
| [**Bounding box labeling**](/data-studio-projects/lets-get-labeling/bounding-box-labeling.md)      | [.bmp](/compatibility-and-updates/supported-formats.md#bmp), [.gif](https://datasaurai.gitbook.io/datasaur/supported-formats#gif), [.jpeg](/compatibility-and-updates/supported-formats.md#jpeg-and-jpg), [.jpg](/compatibility-and-updates/supported-formats.md#jpeg-and-jpg), [.pdf](/compatibility-and-updates/supported-formats.md#pdf), [.png](/compatibility-and-updates/supported-formats.md#png), [.svg](/compatibility-and-updates/supported-formats.md#svg), [.tiff](/compatibility-and-updates/supported-formats.md#tiff-and-tif), [.tif](/compatibility-and-updates/supported-formats.md#tiff-and-tif), [.webp](/compatibility-and-updates/supported-formats.md#webp)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| [**Conversational labeling**](/data-studio-projects/nlp-task-types/conversational.md)              | [.txt](/compatibility-and-updates/supported-formats.md#txt), [.json](/compatibility-and-updates/supported-formats.md#json)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| [**Row-based (text classification)**](/data-studio-projects/nlp-task-types/row-based.md)           | [.csv](/compatibility-and-updates/supported-formats.md#csv), [.json](/compatibility-and-updates/supported-formats.md#json), [.jsonl](/compatibility-and-updates/supported-formats.md#jsonl-json-lines), [.tsv](/compatibility-and-updates/supported-formats.md#tsv), [.txt](/compatibility-and-updates/supported-formats.md#txt), [.xls](/compatibility-and-updates/supported-formats.md#xls-and-xlsx), [.xlsx](/compatibility-and-updates/supported-formats.md#xls-and-xlsx)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| [**Document-based\***](/advanced/extensions/document-and-row-labeling.md)                          | [.bmp](/compatibility-and-updates/supported-formats.md#bmp), [.csv](/compatibility-and-updates/supported-formats.md#csv), [.gif](/compatibility-and-updates/supported-formats.md#gif), [.html](/compatibility-and-updates/supported-formats.md#html), [.jpeg](/compatibility-and-updates/supported-formats.md#jpeg-and-jpg), [.jpg](/compatibility-and-updates/supported-formats.md#jpeg-and-jpg), [.json](/compatibility-and-updates/supported-formats.md#json), [.md](/compatibility-and-updates/supported-formats.md#md-markdown), [.mp4](/compatibility-and-updates/supported-formats.md#mp4), [.pdf](/compatibility-and-updates/supported-formats.md#pdf), [.png](/compatibility-and-updates/supported-formats.md#png), [.svg](/compatibility-and-updates/supported-formats.md#svg), [.tiff](/compatibility-and-updates/supported-formats.md#tiff-and-tif), [.tif](/compatibility-and-updates/supported-formats.md#tiff-and-tif), [.txt](/compatibility-and-updates/supported-formats.md#txt), [.tsv](/compatibility-and-updates/supported-formats.md#iob-specialized-tsv), [.uri](/compatibility-and-updates/supported-formats.md#url-urls-uri), [.url](/compatibility-and-updates/supported-formats.md#url-urls-uri), [.urls](/compatibility-and-updates/supported-formats.md#url), [.webp](/compatibility-and-updates/supported-formats.md#webp) |
| [**LLM Evaluation (fine tuning)**](https://datasaurai.gitbook.io/datasaur/llm-projects/evaluation) | [.csv](https://datasaurai.gitbook.io/datasaur/getting-started/lets-get-labeling/llm-project-type#creating-an-llm-evaluation-project-in-datasaur-a-4-step-guide)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| [**LLM Ranking (RLHF)**](https://datasaurai.gitbook.io/datasaur/llm-projects/ranking-rlhf)         | [.csv](https://datasaurai.gitbook.io/datasaur/getting-started/lets-get-labeling/llm-project-type#creating-an-llm-ranking-project-in-datasaur-a-4-step-guide)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |

## Size limit

* Text-based files: 50 MB per file
  * Examples: [.txt](/compatibility-and-updates/supported-formats.md#txt), [.tsv](/compatibility-and-updates/supported-formats.md#iob-specialized-tsv), [.json](/compatibility-and-updates/supported-formats.md#json), and [.csv](/compatibility-and-updates/supported-formats.md#csv)
* Multimedia and image files: 500 MB per file
  * Examples: [Video](/compatibility-and-updates/supported-formats.md#mp4), [Image](/compatibility-and-updates/supported-formats.md#jpeg-and-jpg), [Audio](/compatibility-and-updates/supported-formats.md#mp3), and [PDF](/compatibility-and-updates/supported-formats.md#pdf)
* Project size: 1.5 GB

{% hint style="info" %}
To create projects with larger files, use [Robosaur](/integrations/robosaur.md). For assistance, contact us at **<support@datasaur.ai>**.
{% endhint %}
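
The size limits above can be checked locally before uploading. The following is a minimal sketch, not part of Datasaur's tooling: the function name `check_sizes` is hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes (the limits above do not say which convention applies), and only the four text extensions listed as examples are treated as text-based, with everything else falling under the multimedia limit.

```python
from pathlib import Path

# Assumption: binary units (1 MB = 1024 * 1024 bytes); the docs do not specify.
MB = 1024 * 1024
TEXT_LIMIT = 50 * MB        # per text-based file
MEDIA_LIMIT = 500 * MB      # per multimedia or image file
PROJECT_LIMIT = 1536 * MB   # 1.5 GB per project

# Assumption: only the extensions given as examples count as text-based.
TEXT_EXTENSIONS = {".txt", ".tsv", ".json", ".csv"}


def check_sizes(files):
    """Return a list of problems for an iterable of (filename, size_in_bytes)."""
    problems = []
    total = 0
    for name, size in files:
        total += size
        is_text = Path(name).suffix.lower() in TEXT_EXTENSIONS
        limit = TEXT_LIMIT if is_text else MEDIA_LIMIT
        if size > limit:
            problems.append(
                f"{name}: {size} bytes exceeds the per-file limit of {limit} bytes"
            )
    if total > PROJECT_LIMIT:
        problems.append(
            f"project: {total} bytes exceeds the project limit of {PROJECT_LIMIT} bytes"
        )
    return problems


if __name__ == "__main__":
    # Sizes for real files could come from Path(name).stat().st_size.
    for problem in check_sizes([("notes.txt", 60 * MB), ("scan.pdf", 120 * MB)]):
        print(problem)
```

An empty result means every file and the project total fit within the assumed limits. Files that exceed them are candidates for the Robosaur route mentioned above.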

## Important notes

When uploading files for OCR labeling or audio labeling, make sure each media file (image, document, or audio) and its corresponding transcription file share the same base name and differ only in extension. For example, `unicef.jpg` and `unicef.txt`.
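
The naming rule can be checked before upload by comparing base names. This is a minimal sketch under stated assumptions: `find_unpaired` is a hypothetical helper, the extension sets are a subset taken from the format table above, and names are compared exactly (case-sensitive), which may or may not match how Datasaur pairs files.

```python
from pathlib import Path

# Assumption: a subset of the media and transcription extensions from the table.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".mp3", ".wav"}
TRANSCRIPT_EXTENSIONS = {".txt", ".json", ".srt", ".vtt", ".tsv"}


def find_unpaired(filenames):
    """Return (media base names without a transcription,
    transcription base names without a media file)."""
    media_stems = set()
    transcript_stems = set()
    for name in filenames:
        path = Path(name)
        ext = path.suffix.lower()
        if ext in MEDIA_EXTENSIONS:
            media_stems.add(path.stem)
        elif ext in TRANSCRIPT_EXTENSIONS:
            transcript_stems.add(path.stem)
    return (
        sorted(media_stems - transcript_stems),
        sorted(transcript_stems - media_stems),
    )


if __name__ == "__main__":
    media_only, transcript_only = find_unpaired(
        ["unicef.jpg", "unicef.txt", "report.pdf", "notes.txt"]
    )
    print("Media without transcription:", media_only)
    print("Transcription without media:", transcript_only)
```

A media file without a transcription may be intentional. A transcription without a matching media file usually points to a naming mismatch worth fixing before upload.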

