Custom OCR
OCR Custom Text Extraction API
The custom text extraction API is a Datasaur feature that lets you create a custom OCR project backed by your own text extraction API.
Request from Datasaur
POST https://custom-text-extractor.com/text-extraction/example
Request headers
Accept
application/json, text/plain
Form Data Parameters
upload
Your document file (e.g.: receipt.jpg)
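On the receiving side, your API has to pull the file out of the upload form field. The following is a minimal sketch using only the Python standard library; the function name extract_upload is hypothetical, and it assumes the request body is standard multipart/form-data. A production service would more likely rely on its web framework's own multipart parser.

```python
from email import message_from_bytes


def extract_upload(content_type: str, body: bytes, field: str = "upload"):
    """Return (filename, file_bytes) for the named multipart form field.

    `content_type` is the request's Content-Type header value, including
    its boundary parameter; `body` is the raw request body.
    """
    # Prepend the header so the email parser can see the multipart boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("expected a multipart/form-data body")
    for part in message.get_payload():
        # Each part names its form field in the Content-Disposition header.
        name = part.get_param("name", header="content-disposition")
        if name == field:
            return part.get_filename(), part.get_payload(decode=True)
    raise KeyError(f"form field {field!r} not found")
```

The returned bytes can then be handed to whatever OCR engine your service wraps.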
Expected API Response
Datasaur can process the response differently based on the Content-Type
header returned from the API response.
Text response (Content-Type: text/plain)
SHIHLIN TAIWAN
STREET SNACKS
Grand Galaxy Park
DATE 26/02/20 15:53
CASHIER: Reny
No. Customer: 1
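A plain-text response like the one above could be assembled as follows. This is only a sketch: build_text_response is a hypothetical helper, and it assumes that one recognized line per row, joined with newlines and encoded as UTF-8, is what your service wants to return.

```python
def build_text_response(lines):
    """Return (headers, body) for a plain-text extraction result."""
    # One recognized line per row, in reading order.
    body = "\n".join(lines).encode("utf-8")
    headers = {
        "Content-Type": "text/plain",
        "Content-Length": str(len(body)),
    }
    return headers, body
```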
JSON response (Content-Type: application/json)
Datasaur uses the Importable format to process the API response.
{
  "cells": [
    {
      "content": "SHIHLIN TAIWAN",
      "index": 0,
      "line": 0,
      "metadata": [],
      "tokens": [
        "SHIHLIN",
        "TAIWAN"
      ]
    },
    {
      "content": "STREET SNACKS",
      "index": 0,
      "line": 1,
      "metadata": [],
      "tokens": [
        "STREET",
        "SNACKS"
      ]
    }
  ],
  "labelSets": [],
  "labels": [
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 0,
      "endCharIndex": 6,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 130,
      "y0": 154,
      "x1": 255,
      "y1": 154,
      "x2": 255,
      "y2": 186,
      "x3": 130,
      "y3": 186
    },
    {
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 1,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "pageIndex": 0,
      "type": "BOUNDING_BOX",
      "nodeCount": 4,
      "x0": 261,
      "y0": 154,
      "x1": 375,
      "y1": 154,
      "x2": 375,
      "y2": 186,
      "x3": 261,
      "y3": 186
    }
  ],
  "name": "receipt.jpg",
  "pages": [
    {
      "pageIndex": 0,
      "pageHeight": 619,
      "pageWidth": 551
    }
  ],
  "type": "BOUNDING_BOX"
}
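To produce a response of this shape from your own OCR output, something like the sketch below may help. The helper name build_ocr_importable and its input format (words with left, top, right, bottom pixel coordinates) are assumptions. The field semantics are inferred from the sample above rather than from a formal schema: endCharIndex is treated as inclusive, and the four corners run clockwise from the top-left. It handles a single page only.

```python
def build_ocr_importable(name, page_width, page_height, lines):
    """Build an Importable-style dict from OCR output for a single page.

    `lines` is a list of lines; each line is a list of
    (word, left, top, right, bottom) tuples in page pixel coordinates.
    """
    cells, labels = [], []
    for line_no, words in enumerate(lines):
        tokens = [word for word, *_box in words]
        cells.append({
            "content": " ".join(tokens),
            "index": 0,
            "line": line_no,
            "metadata": [],
            "tokens": tokens,
        })
        # One bounding-box label per token, as in the sample response.
        for token_no, (word, left, top, right, bottom) in enumerate(words):
            labels.append({
                "startCellLine": line_no,
                "startCellIndex": 0,
                "startTokenIndex": token_no,
                "startCharIndex": 0,
                "endCellLine": line_no,
                "endCellIndex": 0,
                "endTokenIndex": token_no,
                "endCharIndex": len(word) - 1,  # inclusive, inferred from sample
                "layer": 0,
                "counter": 0,
                "pageIndex": 0,
                "type": "BOUNDING_BOX",
                "nodeCount": 4,
                "x0": left, "y0": top,       # top-left
                "x1": right, "y1": top,      # top-right
                "x2": right, "y2": bottom,   # bottom-right
                "x3": left, "y3": bottom,    # bottom-left
            })
    return {
        "cells": cells,
        "labelSets": [],
        "labels": labels,
        "name": name,
        "pages": [{"pageIndex": 0, "pageHeight": page_height, "pageWidth": page_width}],
        "type": "BOUNDING_BOX",
    }
```

Serializing the result with json.dumps and returning it with Content-Type: application/json would give a response in the same layout as the sample.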
Apply custom API
Upload the PDF or images through the Project Creation Wizard.
In Step 2:
Select +Add new API... as the OCR method.
Enter your API name, API URL, and secret.
After you click Save, the custom API is added to the list, and you can choose it as the OCR method.
The interface displays the PDF on the left and the transcription on the right.
ASR Custom Text Extraction API
The custom text extraction API is a Datasaur feature that lets you create a custom Audio project backed by your own text extraction API.
Request from Datasaur
POST https://custom-text-extractor.com/text-extraction/example
Request headers
Accept
application/json, text/plain
Form Data Parameters
upload
Your audio file (e.g., audio2.flac)
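Before registering your endpoint, it can be useful to send it a request shaped like the one described here. The sketch below builds such a request with the Python standard library. The helper name build_upload_request and the audio/flac MIME type are assumptions, and how Datasaur transmits the configured secret is not covered here, so no authentication header is included.

```python
import urllib.request
import uuid


def build_upload_request(url, filename, data, mime_type="application/octet-stream"):
    """Build a POST request carrying `data` in the `upload` form field."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="upload"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {mime_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json, text/plain",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends the request; the Content-Type header of the reply then shows which of the two response formats your endpoint produced.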
Expected API Response
Datasaur can process the response differently based on the Content-Type
header returned from the API response.
Text response (Content-Type: text/plain)
A quick brown fox jumps over a lazy dog
Speaker 2: A quick brown fox jumps over a lazy dog
Speaker 1: A quick brown fox jumps over a lazy dog
Speaker 2: A quick brown fox jumps over a lazy dog
Speaker 1: A quick brown fox jumps over a lazy dog
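A transcript in this layout could be generated from diarized ASR output along these lines. The helper format_transcript is hypothetical, and the sketch simply mirrors the sample above; whether Datasaur gives the "Speaker N:" prefix any special meaning is not stated here.

```python
def format_transcript(turns):
    """Join (speaker, utterance) pairs into one line per turn.

    A turn whose speaker is None or empty is emitted without a prefix.
    """
    lines = []
    for speaker, utterance in turns:
        lines.append(f"{speaker}: {utterance}" if speaker else utterance)
    return "\n".join(lines)
```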
JSON response (Content-Type: application/json)
Datasaur uses the Importable format to process the API response.
{
  "cells": [
    {
      "content": "SHIHLIN TAIWAN",
      "index": 0,
      "line": 0,
      "metadata": [],
      "tokens": ["SHIHLIN", "TAIWAN"]
    },
    {
      "content": "STREET SNACKS",
      "index": 0,
      "line": 1,
      "metadata": [],
      "tokens": ["STREET", "SNACKS"]
    }
  ],
  "labelSets": [],
  "labels": [
    {
      "id": 1,
      "startCellLine": 0,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 0,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "startTimestampMillis": 1375,
      "endTimestampMillis": 4250,
      "type": "TIMESTAMP"
    },
    {
      "id": 2,
      "startCellLine": 1,
      "startCellIndex": 0,
      "startTokenIndex": 0,
      "startCharIndex": 0,
      "endCellLine": 1,
      "endCellIndex": 0,
      "endTokenIndex": 1,
      "endCharIndex": 5,
      "layer": 0,
      "counter": 0,
      "startTimestampMillis": 4437,
      "endTimestampMillis": 8218,
      "type": "TIMESTAMP"
    }
  ],
  "name": "ASR API Response Sample",
  "type": "TIMESTAMP"
}
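A response of this shape could be built from timed ASR segments roughly as follows. The helper name build_asr_importable and its input format are assumptions. Field semantics are inferred from the sample above: each label spans one whole cell, ids count up from 1, and endCharIndex is the inclusive index of the last character of the last token.

```python
def build_asr_importable(name, segments):
    """Build an Importable-style dict from timed transcript segments.

    `segments` is a list of (text, start_ms, end_ms) tuples, one per line
    of the transcript. Segments with no words are skipped.
    """
    cells, labels = [], []
    for text, start_ms, end_ms in segments:
        tokens = text.split()
        if not tokens:
            continue
        line_no = len(cells)  # keeps line numbers contiguous after skips
        cells.append({
            "content": " ".join(tokens),
            "index": 0,
            "line": line_no,
            "metadata": [],
            "tokens": tokens,
        })
        # One timestamp label covering the whole cell, as in the sample.
        labels.append({
            "id": len(labels) + 1,
            "startCellLine": line_no,
            "startCellIndex": 0,
            "startTokenIndex": 0,
            "startCharIndex": 0,
            "endCellLine": line_no,
            "endCellIndex": 0,
            "endTokenIndex": len(tokens) - 1,
            "endCharIndex": len(tokens[-1]) - 1,  # inclusive, inferred from sample
            "layer": 0,
            "counter": 0,
            "startTimestampMillis": start_ms,
            "endTimestampMillis": end_ms,
            "type": "TIMESTAMP",
        })
    return {
        "cells": cells,
        "labelSets": [],
        "labels": labels,
        "name": name,
        "type": "TIMESTAMP",
    }
```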
Apply custom API
Upload the audio files through the Project Creation Wizard.
In Step 2:
Select +Add new API... as the ASR method.
Enter your API name, API URL, and secret.
After you click Save, the custom API is added to the list, and you can choose it as the ASR method.
The interface displays the audio on the top and the transcription on the bottom.