# Import Transformer

With the **import** transformer, you can import almost any text-based format into Datasaur. Currently, the transformer only accepts files with the `.csv`, `.txt`, or `.json` extension.

Your new import transformer will have this template:

```typescript
/**
 * This function must follow this template and correctly implement the ImportFunction interface.
 */
(fileContent: string): SimpleDocument => {
  /// Implement import function here
  return {
    cells: [],
    labels: [],
  };
};
```

The import transformer is a function that takes the file content as a string (`fileContent`), decoded as **UTF-8**, and returns a `SimpleDocument` that Datasaur can process.
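
As a minimal sketch of such a function, the example below turns each non-empty line of a plain-text file into one single-column cell. The `LineCell` and `LineDocument` types and the `importPlainText` name are assumptions made so the sketch compiles on its own; the template above uses `SimpleDocument` without importing it, so the editor presumably supplies the real types, and a real transformer stays an anonymous arrow function as in the template.

```typescript
// Local stand-ins (assumptions) that mirror only the fields used in this sketch.
type LineCell = { line: number; index: number; content: string; tokens: string[] };
type LineDocument = { cells: LineCell[]; labels: unknown[] };

// Hypothetical importer: one cell per non-empty line of a plain-text file.
const importPlainText = (fileContent: string): LineDocument => {
  const cells: LineCell[] = fileContent
    .split(/\r?\n/)                      // accept both LF and CRLF line endings
    .map((text) => text.trim())
    .filter((text) => text.length > 0)   // drop blank lines
    .map((content, line) => ({
      line,                              // zero-based row
      index: 0,                          // single column
      content,
      tokens: content.split(/\s+/),      // naive whitespace tokenization
    }));
  return { cells, labels: [] };          // no pre-applied labels
};

const plainTextDoc = importPlainText("Hello there\r\n\r\nGeneral Kenobi arrives\n");
console.log(plainTextDoc.cells.map((c) => c.content).join(" / "));
// Hello there / General Kenobi arrives
```

Whitespace tokenization is the simplest choice here; a different tokenizer can be swapped in as long as `tokens` stays consistent with `content`.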

`SimpleDocument` is an object that represents a document in Datasaur. It is a combined type that supports span labeling and row labeling. The structure of `SimpleDocument` is shown below:

* **cells:** An array of cells. Datasaur documents are stored in a tabular structure. Each cell represents a single table cell. For span-based projects, only a single-column table is currently supported. Each row in the document must have the same number of columns.
  * **line:** A zero-based number indicating the row.
  * **index:** A zero-based number indicating the column. For span-based projects, this value can only be set to `0`.
  * **content**: The original content of a cell.
  * **tokens**: A tokenized version of the content. This field is only used for token-based projects.
  * **metadata**: An optional array of key-value data to be stored per cell. You can find the structure and configuration options for metadata [here](/advanced/extensions/metadata.md).
    * **key**: Identifier for the metadata item, represented as a string. Example: `author`.
    * **value**: Content or data of the metadata item, represented as a string. Example: `John Doe`.
    * **type**: Optional field indicating the type of the value in [MIME type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types).
      * **Default**: `text/plain`.
      * **Supported types**:
        * **text/plain:** Displays metadata as plain text.
        * **text/html**: Displays metadata as HTML.
        * **image/\*:** Displays metadata as an image. [Supported image formats](https://en.wikipedia.org/wiki/Comparison_of_browser_engines_\(graphics_support\)) depend on your browser.
        * **audio/\*:** Displays metadata as an audio player. [Supported audio formats](https://en.wikipedia.org/wiki/HTML5_audio#Supported_audio_coding_formats) depend on your browser.
    * **pinned**: Boolean that specifies whether metadata is shown at the top of each cell. Metadata that isn’t pinned remains available in the **Metadata** extension.
    * **config**: Customizes the appearance of `text/plain` metadata.
      * **color:** Sets the text color of the metadata, as a string. Accepts any HTML color code or name.
      * **backgroundColor:** Sets the background color of the metadata, as a string. Accepts any HTML color code or name.
      * **borderColor:** Sets the border color of the metadata, as a string. Accepts any HTML color code or name.
* **labels:** An array of labels.
  * Common fields:
    * **id:** A unique number that identifies the label. Arrow labels refer to it through `originId` and `destinationId`.
    * **startCellLine**: Line (row) of the cell where the label starts.
    * **startCellIndex**: Column of the cell where the label starts.
    * **startTokenIndex**: Index of the first labeled token, relative to the cell.
    * **startCharIndex**: Index of the first labeled character, relative to the token.
    * **endCellLine**: Line (row) of the cell where the label ends.
    * **endCellIndex**: Column of the cell where the label ends.
    * **endTokenIndex**: Index of the last labeled token, relative to the cell.
    * **endCharIndex**: Index of the last labeled character, relative to the token. In the sample case below, a label that ends at the end of a token uses `token.length - 1`.
    * **type:** Type of the labels. Must be one of: `"SPAN"`, `"ARROW"`, `"BOUNDING_BOX"`, `"TIMESTAMP"`.
  * Type-specific fields:
    * `"SPAN"` or `"ARROW"`
      * **labelSetIndex**: Replaces **layer**. Configures how the label set items are grouped.
      * **labelName**: Replaces **labelSetItemId**. The text provided here is displayed in the web UI.
    * `"ARROW"`
      * **originId:** ID of a span label as the arrow's origin.
      * **destinationId:** ID of a span label as the arrow's destination.
    * `"BOUNDING_BOX"`
      * **pageIndex**: Page index for multi-page files, such as `.pdf` and `.tiff`. Set this field to `0` for common image formats, such as `.jpg`, `.png`, `.bmp`, etc.
      * **nodeCount**: Number of nodes, reserved for future polygon support. Currently, only **4** nodes forming a rectangle are supported.
      * **x0:** The first node's x value in the screen coordinate system.
      * **y0:** The first node's y value in the screen coordinate system.
      * **x1:** The second node's x value in the screen coordinate system.
      * **y1:** The second node's y value in the screen coordinate system.
      * **x2:** The third node's x value in the screen coordinate system.
      * **y2:** The third node's y value in the screen coordinate system.
      * **x3:** The fourth node's x value in the screen coordinate system.
      * **y3:** The fourth node's y value in the screen coordinate system.
    * `"TIMESTAMP"`
      * **startTimestampMillis:** The starting timestamp in milliseconds.
      * **endTimestampMillis:** The ending timestamp in milliseconds.
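
The span-label fields above can be sketched as TypeScript types with a small helper that fills them in. The `SketchCell` and `SketchSpanLabel` interfaces are shapes inferred from this page, not Datasaur's official typings, and `spanOverTokens` is a hypothetical helper. It follows the convention used in the sample case on this page, where `endCharIndex` points at the last character of the last token.

```typescript
// Assumed shapes, inferred from the field list above (not official typings).
interface SketchCell {
  line: number;     // zero-based row
  index: number;    // zero-based column
  content: string;
  tokens: string[];
}

interface SketchSpanLabel {
  id: number;
  type: "SPAN";
  startCellLine: number;
  startCellIndex: number;
  startTokenIndex: number;
  startCharIndex: number;
  endCellLine: number;
  endCellIndex: number;
  endTokenIndex: number;
  endCharIndex: number;
  labelSetIndex: number;
  labelName: string;
}

// Hypothetical helper: label tokens firstToken..lastToken (inclusive) of one cell.
const spanOverTokens = (
  id: number,
  cell: SketchCell,
  firstToken: number,
  lastToken: number,
  labelName: string,
): SketchSpanLabel => {
  const lastTokenText = cell.tokens[lastToken];
  if (firstToken < 0 || lastToken < firstToken || lastTokenText === undefined) {
    throw new RangeError("token range is outside the cell");
  }
  return {
    id,
    type: "SPAN",
    startCellLine: cell.line,
    startCellIndex: cell.index,
    startTokenIndex: firstToken,
    startCharIndex: 0,
    endCellLine: cell.line,
    endCellIndex: cell.index,
    endTokenIndex: lastToken,
    endCharIndex: lastTokenText.length - 1, // last character of the last token
    labelSetIndex: 0,                        // first label set
    labelName,
  };
};

const sketchCell: SketchCell = {
  line: 0,
  index: 0,
  content: "Sherlock Holmes arrives",
  tokens: ["Sherlock", "Holmes", "arrives"],
};
const nameLabel = spanOverTokens(1, sketchCell, 0, 1, "Person's name");
console.log(nameLabel.endTokenIndex, nameLabel.endCharIndex); // 1 5
```

Only `"SPAN"` labels are modeled here; arrow, bounding-box, and timestamp labels add the type-specific fields listed above.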

## Sample case

This example shows how to label a `.srt` subtitle file and display timestamps as metadata. The file transformer script is shown below.

```typescript
/**
 * This function must follow this template and correctly implement the ImportFunction interface.
 */
(fileContent: string): SimpleDocument => {
    /// Implement import function here
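    // Normalize Windows (\r\n) line endings so both styles parse the same way,
    // then split on blank lines: each block is one subtitle entry.
    const blocks = fileContent.replace(/\r\n/g, '\n').trim().split(/\n{2,}/);
    let currLine: number = 0;
    const cells: Cell[] = [];
    blocks.forEach((block) => {
      // Skip the sequence number, take the timestamp, and keep the rest as subtitle lines.
      const [, timestamp, ...subtitles] = block.split('\n');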
      subtitles.forEach((subtitle) => {
        cells.push({
          index: 0,
          line: currLine,
          content: subtitle,
          tokens: subtitle.split(' '),
          metadata: [
            {key: "timestamp", value: timestamp, pinned: true, config: { color: "#3399cc", backgroundColor: "", borderColor: "#cc3399"}}
          ]
        });
        currLine += 1;
      });
    });

    const labels: SpanAndArrowLabel[] = [];
    let labelId = 0;

    // Label the first two tokens on the second line as "Example label".
    // Skipped when the second line does not exist or has fewer than two tokens.
    if (cells.length > 1 && cells[1].tokens.length > 1) {
      const secondTokenOnSecondLine = cells[1].tokens[1];
      labels.push({
        id: ++labelId,
        type: "SPAN",
        startCellLine: 1,
        startCellIndex: 0,
        startTokenIndex: 0,
        startCharIndex: 0,
        endCellLine: 1,
        endCellIndex: 0,
        endTokenIndex: 1,
        endCharIndex: secondTokenOnSecondLine.length - 1,
        labelSetIndex: 0,
        labelName: "Example label"
      });
    }

    // Label each occurrence of "Sherlock" as "Person's name".
    const sherlock = "sherlock";
    cells.forEach(cell => {
      cell.tokens.forEach((token, tokenIndex) => {
        if (token.toLowerCase() === sherlock) {
          labels.push({
            id: ++labelId, 
            type: "SPAN",
            startCellLine: cell.line,
            startCellIndex: cell.index,
            startTokenIndex: tokenIndex,
            startCharIndex: 0,
            endCellLine: cell.line,
            endCellIndex: cell.index,
            endTokenIndex: tokenIndex,
            endCharIndex: token.length - 1,
            labelSetIndex: 0,
            labelName: "Person's name",
          })
        }
      })
    })

    return {
      cells,
      labels,
    };
};
```
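
To make the destructuring in the script concrete, the standalone sketch below applies the same split to a made-up two-entry subtitle snippet with CRLF line endings. The snippet and the `srtSnippet` and `rows` names are illustrative assumptions, not the sample file. Note how a two-line subtitle entry produces two rows that share one timestamp.

```typescript
// Made-up input for illustration: two SRT entries separated by a blank line.
const srtSnippet = [
  "1",
  "00:00:01,000 --> 00:00:03,000",
  "My name is Sherlock Holmes.",
  "",
  "2",
  "00:00:04,000 --> 00:00:06,500",
  "It is my business",
  "to know what other people do not know.",
].join("\r\n");

// Each entry is: sequence number, timestamp, then one or more subtitle lines.
const rows: { line: number; timestamp: string; content: string }[] = [];
srtSnippet.split("\r\n\r\n").forEach((entry) => {
  // Skip the sequence number; default the timestamp in case an entry is malformed.
  const [, timestamp = "", ...subtitles] = entry.split("\r\n");
  subtitles.forEach((subtitle) => {
    rows.push({ line: rows.length, timestamp, content: subtitle });
  });
});

rows.forEach((row) => console.log(row.line, row.timestamp, "|", row.content));
// 0 00:00:01,000 --> 00:00:03,000 | My name is Sherlock Holmes.
// 1 00:00:04,000 --> 00:00:06,500 | It is my business
// 2 00:00:04,000 --> 00:00:06,500 | to know what other people do not know.
```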

To get started:

1. Rename your file by adding the `.txt` extension. You can use the sample file below.
2. Click **Create file transformer**.
3. Enter a name, select **Import**, then click **Create**.
4. Paste the file transformer script into the editor. You can also upload it.

   <figure><img src="/files/S4l2AytQ1mX5BXgiuD4j" alt=""><figcaption></figcaption></figure>
5. Go to the **Projects** page and click **Create a project**.
6. In step 1, select the file transformer you just created in the **File transformer** dropdown. Finish the project creation and launch the project.

<figure><img src="/files/Yrm1A1sUzpYLZRoTQryV" alt=""><figcaption></figcaption></figure>

Your project is ready!

<figure><img src="/files/qywMuZ3mmtYCLFjn5nxk" alt=""><figcaption></figcaption></figure>

**Notes:**

* You need to add the **Metadata** extension to the project.
* To show metadata directly in the text editor, at the top of each cell, set `pinned: true`.
* Use HTML color codes or names for the text, border, and background colors.
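
The notes above can be sketched as follows: one pinned plain-text entry with custom colors and one unpinned HTML entry. The `MetadataSketch` interface is a shape inferred from the field list on this page (treating `pinned` and `config` as optional is an assumption), and `pinnedText` is a hypothetical helper.

```typescript
// Assumed shape of one metadata entry (not official typings).
interface MetadataSketch {
  key: string;
  value: string;
  type?: string;      // MIME type; text/plain when omitted
  pinned?: boolean;   // optionality is an assumption
  config?: { color?: string; backgroundColor?: string; borderColor?: string };
}

// Hypothetical helper: a pinned plain-text entry whose text and border share a color.
const pinnedText = (key: string, value: string, color: string): MetadataSketch => ({
  key,
  value,
  type: "text/plain",
  pinned: true,                          // shown at the top of the cell
  config: { color, borderColor: color }, // any HTML color code or name
});

const cellMetadata: MetadataSketch[] = [
  pinnedText("timestamp", "00:00:01,000 --> 00:00:03,000", "#3399cc"),
  // Not pinned: remains available in the Metadata extension.
  { key: "speaker", value: "<b>Narrator</b>", type: "text/html", pinned: false },
];

// List the keys of the entries that will be pinned.
console.log(cellMetadata.filter((m) => m.pinned).map((m) => m.key));
```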

If you have any questions, please reach out to <support@datasaur.ai>.

