Custom ASR

Custom ASR enables users to integrate and test their own Automatic Speech Recognition (ASR) APIs with Datasaur projects. To enable this feature in your workspace, contact us at [email protected].

Configuring Custom ASR

Step 1: Set Up the Sample API

  1. Open the provided CodeSandbox example API here.

    • To test this code, you can also create a live endpoint using codesandbox.io.

  2. Save any changes to the codebase to automatically create a fork for further testing and modifications.

  3. Copy the base URL of the API from the CodeSandbox preview panel (on the right-hand side). If the panel is collapsed, expand it first.

  4. The sample API exposes two endpoints. Use whichever fits your needs:

    • {baseUrl}/text-extraction/example-text (returns plain text transcription).

    • {baseUrl}/text-extraction/example-json (returns Importable JSON transcription).
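
As a minimal sketch, the two endpoint URLs can be built from the copied base URL as shown below. The function name `sample_endpoints` and the placeholder base URL are assumptions for illustration only; substitute the base URL from your own CodeSandbox preview panel.

```python
def sample_endpoints(base_url: str) -> dict[str, str]:
    """Build the two sample endpoint URLs from a CodeSandbox base URL."""
    # Drop any trailing slash so the joined URL has exactly one separator.
    root = base_url.rstrip("/")
    return {
        # Returns a plain text transcription.
        "text": f"{root}/text-extraction/example-text",
        # Returns an Importable JSON transcription.
        "json": f"{root}/text-extraction/example-json",
    }


# Placeholder base URL (hypothetical); replace it with your own.
urls = sample_endpoints("https://example-sandbox.invalid/")
print(urls["text"])
print(urls["json"])
```

Either resulting URL can be pasted into the Custom API URL field in Step 2.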

Step 2: Add Custom API in Project Creation Wizard

  1. Go to the Active Projects page and click “Create Project”.

  2. In Step 1: Upload, upload your image/PDF (for OCR) or audio file (for ASR) and click Next.

  3. In Step 2: Preview, click the dropdown labeled “Apply ASR Method” (or “Apply OCR Method” for images/PDFs).

  4. Select “+ Add New API…” to open a dialog box.

    • Fill in the fields:

      • Name: Desired name for the API.

      • Custom API URL: Paste one of the two endpoints from our sample API.

      • Secret: Add any placeholder value (this can be ignored for now).

  5. Click Save to add and select the API.

  6. Review the sample transcription on the right panel.

Step 3: Create the Project

Once the API is added, complete the remaining project configuration steps to create the project and use the transcription results.

ASR Sample File

Whisper ASR Limitations

| Factor | Description |
| --- | --- |
| Accepted file formats | .wav, .mp3, .m4a |
| Maximum individual file size | 25 MB |
| Request limit (Azure) | 100 RPM |
| Request timeout | 10 minutes |
| Maximum API call retries | 2 retries on request errors (e.g., rate limit reached, timeout). |
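
To illustrate what "2 retries" means in practice, here is a minimal sketch of a retry loop: one initial attempt plus up to two further attempts, for three calls in total. This is not Datasaur's implementation. `RequestError`, `call_with_retries`, and the `flaky` function are hypothetical names used only for the example, and the delay between attempts is an assumption because the table does not specify one.

```python
import time

MAX_RETRIES = 2  # from the table: 2 retries on request errors


class RequestError(Exception):
    """Hypothetical stand-in for a rate-limit or timeout error."""


def call_with_retries(request, max_retries=MAX_RETRIES, delay_seconds=0.0):
    """Call request() once, then retry up to max_retries times on RequestError."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return request()
        except RequestError:
            # The initial attempt and all retries have failed: give up.
            if attempts > max_retries:
                raise
            time.sleep(delay_seconds)


# Usage: a fake request that fails twice, then succeeds on the third call.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RequestError("rate limit reached")
    return "transcript"


result = call_with_retries(flaky)
print(result, len(calls))
```

A request that fails on all three attempts raises the last error to the caller.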

Amazon Transcribe ASR Limitations

| Factor | Description |
| --- | --- |
| Accepted file formats | .flac, .m4a, .mp3, .wav |
| Maximum individual file size | 130 MB |
| Maximum audio duration | 14,400 seconds (4 hours) |
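
The limits listed for both providers can be checked before upload with a small helper like the sketch below. It reads the Amazon Transcribe duration limit as 14,400 seconds (four hours). The names `AsrLimits`, `LIMITS`, and `check_file` are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the tables do not define the unit.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional

MB = 1024 * 1024  # assumption: the tables do not say whether MB is 10^6 or 2^20 bytes


@dataclass(frozen=True)
class AsrLimits:
    formats: frozenset
    max_size_mb: int
    max_duration_s: Optional[int] = None  # None means no duration limit is listed


LIMITS = {
    "whisper": AsrLimits(frozenset({".wav", ".mp3", ".m4a"}), 25),
    "amazon_transcribe": AsrLimits(
        frozenset({".flac", ".m4a", ".mp3", ".wav"}), 130, 14_400
    ),
}


def check_file(provider, filename, size_bytes, duration_s=None):
    """Return a list of limit violations; an empty list means the file passes."""
    limits = LIMITS[provider]
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in limits.formats:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > limits.max_size_mb * MB:
        problems.append(f"file exceeds {limits.max_size_mb} MB")
    # Duration is checked only when a limit is listed and a duration is supplied.
    if (
        limits.max_duration_s is not None
        and duration_s is not None
        and duration_s > limits.max_duration_s
    ):
        problems.append(f"audio exceeds {limits.max_duration_s} seconds")
    return problems


print(check_file("whisper", "interview.flac", 10 * MB))
print(check_file("amazon_transcribe", "interview.flac", 10 * MB, duration_s=3600))
```

The helper returns every violation rather than stopping at the first, so a file that is both too large and in the wrong format is reported once with both problems.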
