# Custom OCR

## OCR Custom Text Extraction API

This custom text extraction API is a Datasaur feature that lets you create a custom [OCR](/data-studio-projects/nlp-task-types/project-templates.md#optical-character-recognition) project using your own text extraction API.

### Request from Datasaur

> **POST** <https://custom-text-extractor.com/text-extraction/example>

| **Request headers** |                              |
| ------------------- | ---------------------------- |
| Accept              | application/json, text/plain |

| **Form Data Parameters** |                                                                                                                                                |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| upload                   | Your document file (e.g.: [receipt.jpg](https://user-images.githubusercontent.com/1897341/108043465-b844d200-7073-11eb-9beb-a69305024e15.jpg)) |
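To see what your endpoint will receive, here is a minimal sketch of a compatible extractor written with Python's standard-library `http.server`. It is illustrative only: the multipart parsing is a bare-bones regex over the `upload` field's `Content-Disposition` header, and the "extracted text" is a placeholder where a real service would run OCR on the decoded file bytes.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

class ExtractorHandler(BaseHTTPRequestHandler):
    """Accepts a multipart POST like Datasaur's and returns text/plain."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Minimal multipart handling: pull the uploaded filename out of the
        # Content-Disposition header. A real service would decode the file
        # bytes from the "upload" part and run OCR/ASR on them.
        match = re.search(rb'name="upload"; filename="([^"]*)"', body)
        filename = match.group(1).decode() if match else "unknown"
        payload = f"extracted text for {filename}\n".encode()  # placeholder
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# To run locally:
# HTTPServer(("", 8000), ExtractorHandler).serve_forever()
```

Because the handler returns `Content-Type: text/plain`, Datasaur treats the body as a plain transcription (see below); return `application/json` in the Importable format instead for richer output.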

### **Expected API Response**

Datasaur processes the response differently depending on the **`Content-Type`** header returned by your API.

#### Text response (`Content-Type: text/plain`)

```
SHIHLIN TAIWAN
STREET SNACKS
Grand Galaxy Park
DATE 26/02/20 15:53
CASHIER: Reny
No. Customer: 1
```

#### JSON response (`Content-Type: application/json`)

Datasaur uses [Importable format](/api/custom-ocr/importable-format.md) to process the API response.

```json
{
  "cells": [
    {
      "content": "SHIHLIN TAIWAN",
      "index": 0,
      "line": 0,
      "metadata": [],
      "tokens": [
        "SHIHLIN",
        "TAIWAN"
      ]
    },
    {
      "content": "STREET SNACKS",
      "index": 0,
      "line": 1,
      "metadata": [],
      "tokens": [
        "STREET",
        "SNACKS"
      ]
    }
  ],
  "labelSets": [],
  "labels": [
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 0,
      "endCharIndex": 6,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 130,
      "y0": 154,
      "x1": 255,
      "y1": 154,
      "x2": 255,
      "y2": 186,
      "x3": 130,
      "y3": 186
    },
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 1,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 261,
      "y0": 154,
      "x1": 375,
      "y1": 154,
      "x2": 375,
      "y2": 186,
      "x3": 261,
      "y3": 186
    }
  ],
  "name": "receipt.jpg",
  "pages": [
    {
      "pageIndex": 0,
      "pageHeight": 619,
      "pageWidth": 551
    }
  ],
  "type": "BOUNDING_BOX"
}
```
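In practice your service will assemble this payload from word-level OCR output. The sketch below shows one way to do that in Python, assuming your OCR engine yields words grouped by line with axis-aligned bounding boxes; the `words` input shape (`text`/`line`/`box` keys) is an illustrative convention, not part of Datasaur's API. Character indices are inclusive, matching the sample above.

```python
def build_importable(name, words, page_width, page_height):
    """Assemble an Importable-format OCR payload from word-level output.

    `words` is a list of dicts: {"text": str, "line": int,
    "box": (x0, y0, x1, y1)} with axis-aligned boxes and 0-based,
    contiguous line numbers (an assumed input shape, for illustration).
    """
    lines = {}
    for w in words:
        lines.setdefault(w["line"], []).append(w)

    cells, labels = [], []
    for line_no in sorted(lines):
        tokens = [w["text"] for w in lines[line_no]]
        cells.append({
            "content": " ".join(tokens),
            "index": 0,
            "line": line_no,
            "metadata": [],
            "tokens": tokens,
        })
        for tok_idx, w in enumerate(lines[line_no]):
            x0, y0, x1, y1 = w["box"]
            labels.append({
                "startCellLine": line_no, "startCellIndex": 0,
                "startTokenIndex": tok_idx, "startCharIndex": 0,
                "endCellLine": line_no, "endCellIndex": 0,
                "endTokenIndex": tok_idx,
                "endCharIndex": len(w["text"]) - 1,  # inclusive index
                "layer": 0, "counter": 0, "pageIndex": 0,
                "type": "BOUNDING_BOX", "nodeCount": 4,
                # Four corners, clockwise from the top-left.
                "x0": x0, "y0": y0, "x1": x1, "y1": y0,
                "x2": x1, "y2": y1, "x3": x0, "y3": y1,
            })

    return {
        "cells": cells,
        "labelSets": [],
        "labels": labels,
        "name": name,
        "pages": [{"pageIndex": 0,
                   "pageHeight": page_height,
                   "pageWidth": page_width}],
        "type": "BOUNDING_BOX",
    }
```

Feeding this function the two words from the sample receipt reproduces the JSON response shown above.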

### Apply custom API

* Upload the PDF or images through [Project Creation Wizard](/data-studio-projects/creating-a-project.md#project-creation-wizard)
* In Step 2,
  * Select **+Add new API...** as the OCR method

    <figure><img src="/files/kel6YylVm8xJZ962l5Pf" alt=""><figcaption></figcaption></figure>
* Enter your API name, API URL, and secret

    <figure><img src="/files/UV5rVPQuPFq3oP204uZV" alt=""><figcaption></figcaption></figure>
  * After clicking Save, the custom API will be saved to the list, and you can choose it as the OCR method
* [Create a label set, assign labelers as appropriate, then launch the project](/workspace-management/workspace.md#creating-a-project)
* The interface will appear side-by-side with the PDF on the left and the transcription on the right

  <figure><img src="/files/ykpHXqp7jYrOccVFEPUs" alt=""><figcaption></figcaption></figure>

## ASR Custom Text Extraction API

This custom text extraction API is a Datasaur feature that lets you create a custom [Audio](https://datasaurai.gitbook.io/datasaur/nlp-projects/nlp-task-types/audio-project) project using your own text extraction API.

### Request from Datasaur

> **POST** <https://custom-text-extractor.com/text-extraction/example>

| **Request headers** |                              |
| ------------------- | ---------------------------- |
| Accept              | application/json, text/plain |

| **Form Data Parameters** |                                        |
| ------------------------ | -------------------------------------- |
| upload                   | Your document file (e.g.: audio2.flac) |

### **Expected API Response**

Datasaur processes the response differently depending on the **`Content-Type`** header returned by your API.

#### Text response (`Content-Type: text/plain`)

```
A quick brown fox jumps over a lazy dog
Speaker 2: A quick brown fox jumps over a lazy dog
Speaker 1: A quick brown fox jumps over a lazy dog
Speaker 2: A quick brown fox jumps over a lazy dog
Speaker 1: A quick brown fox jumps over a lazy dog
```

#### JSON response (`Content-Type: application/json`)

Datasaur uses [Importable format](/api/custom-ocr/importable-format.md) to process the API response.

```json
{
  "cells": [
    {
      "content": "A quick brown fox jumps over a lazy dog",
      "index": 0,
      "line": 0,
      "metadata": [],
      "tokens": ["A", "quick", "brown", "fox", "jumps", "over", "a", "lazy", "dog"]
    },
    {
      "content": "A quick brown fox jumps over a lazy dog",
      "index": 0,
      "line": 1,
      "metadata": [],
      "tokens": ["A", "quick", "brown", "fox", "jumps", "over", "a", "lazy", "dog"]
    }
  ],
  "labelSets": [],
  "labels": [
    {
      "id": 1,
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 8,
      "endCharIndex": 2,
      "layer": 0,
      "counter": 0,
      "startTimestampMillis": 1375,
      "endTimestampMillis": 4250,
      "type": "TIMESTAMP"
    },
    {
      "id": 2,
      "startCellLine": 1,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 1,
      "endCellIndex": 0,
      "endTokenIndex": 8,
      "endCharIndex": 2,
      "layer": 0,
      "counter": 0,
      "startTimestampMillis": 4437,
      "endTimestampMillis": 8218,
      "type": "TIMESTAMP"
    }
  ],
  "name": "ASR API Response Sample",
  "type": "TIMESTAMP"
}
```
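As with OCR, your service will typically assemble this payload from the output of an ASR engine. The sketch below assumes the engine yields one transcript segment per line with start/end times in milliseconds; the `segments` input shape (`text`/`start_ms`/`end_ms` keys) is an illustrative convention, not part of Datasaur's API. Each segment becomes one cell plus one `TIMESTAMP` label spanning all of its tokens, with inclusive character indices as in the samples above.

```python
def build_asr_importable(name, segments):
    """Assemble an Importable-format ASR payload from timed segments.

    `segments` is a list of dicts: {"text": str, "start_ms": int,
    "end_ms": int}, one per transcript line (an assumed input shape,
    for illustration).
    """
    cells, labels = [], []
    for line_no, seg in enumerate(segments):
        tokens = seg["text"].split()
        cells.append({
            "content": seg["text"],
            "index": 0,
            "line": line_no,
            "metadata": [],
            "tokens": tokens,
        })
        labels.append({
            "id": line_no + 1,
            "startCellLine": line_no, "startCellIndex": 0,
            "startTokenIndex": 0, "startCharIndex": 0,
            "endCellLine": line_no, "endCellIndex": 0,
            "endTokenIndex": len(tokens) - 1,
            "endCharIndex": len(tokens[-1]) - 1,  # inclusive index
            "layer": 0, "counter": 0,
            "startTimestampMillis": seg["start_ms"],
            "endTimestampMillis": seg["end_ms"],
            "type": "TIMESTAMP",
        })
    return {"cells": cells, "labelSets": [], "labels": labels,
            "name": name, "type": "TIMESTAMP"}
```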

### Apply custom API

* Upload the audio files through [Project Creation Wizard](/data-studio-projects/creating-a-project.md#project-creation-wizard)
* In Step 2,
  * Select **+Add new API...** as the ASR method

    <figure><img src="/files/4zugpQLIkps6kNyBeb8C" alt=""><figcaption></figcaption></figure>
* Enter your API name, API URL, and secret

    <figure><img src="/files/0DICv6xvSRQjmSzyQvif" alt=""><figcaption></figcaption></figure>
  * After clicking Save, the custom API will be saved to the list, and you can choose it as the ASR method
* [Create a label set, assign labelers as appropriate, then launch the project](/workspace-management/workspace.md#creating-a-project)
* The interface will appear like the screenshot below, with the audio on the top and the transcription on the bottom

  <figure><img src="/files/E30ex9h0C9o8GgQ6ea5r" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
Custom API capabilities are **only supported in team workspaces**. If you would like access, please email us at <support@datasaur.ai>.
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.datasaur.ai/api/custom-ocr.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
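A question with spaces or punctuation must be URL-encoded before it goes into the `ask` parameter. A small stdlib helper, for illustration:

```python
from urllib.parse import urlencode
# from urllib.request import urlopen  # uncomment to actually send the request

def ask_docs(question, page="https://docs.datasaur.ai/api/custom-ocr.md"):
    """Build the documentation-query URL for a natural-language question."""
    return f"{page}?{urlencode({'ask': question})}"

# The answer can then be fetched with, e.g.:
# urlopen(ask_docs("How are bounding boxes encoded?")).read()
```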
