Audio Project
Create and label an audio labeling project in Datasaur
Project creation

Creating an audio project in Datasaur follows the same steps as creating a project for any other labeling task type.
From the Projects page, click Create project.
Upload both your audio file and its transcription.
Accepted audio formats: .mp3, .flac, and .wav
Accepted transcription formats: .srt and .txt
You can download example audio and transcription files below.
Make sure the transcription file name matches the audio file name, for example SampleFile.mp3 and SampleFile.srt. When both files have the same name, the system recognizes them as corresponding files.
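Before uploading, it can help to check locally that every audio file has a transcription with the same base name. The following is a minimal sketch, not part of Datasaur: the function name pair_files is hypothetical, and it assumes matching is an exact comparison of the file name without its extension. The source does not say how Datasaur treats letter case in file names.

```python
from pathlib import Path

# Formats listed in this guide.
AUDIO_EXTENSIONS = {".mp3", ".flac", ".wav"}
TRANSCRIPT_EXTENSIONS = {".srt", ".txt"}


def pair_files(filenames):
    """Match audio files to transcriptions that share the same base name.

    Returns (pairs, unmatched_audio). This is a hypothetical local helper;
    it only approximates the name-matching rule described above.
    """
    # Index transcription files by base name (file name without extension).
    transcripts = {}
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() in TRANSCRIPT_EXTENSIONS:
            transcripts[path.stem] = name

    pairs = {}
    unmatched_audio = []
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() in AUDIO_EXTENSIONS:
            if path.stem in transcripts:
                pairs[name] = transcripts[path.stem]
            else:
                unmatched_audio.append(name)
    return pairs, unmatched_audio


pairs, unmatched_audio = pair_files(
    ["SampleFile.mp3", "SampleFile.srt", "Interview.wav"]
)
```

Here SampleFile.mp3 is paired with SampleFile.srt, while Interview.wav is reported as unmatched because no transcription shares its base name.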

Continue with the remaining project setup steps: Preview, Labeler's tasks, Assignment, and Project Settings.

Audio interface legend

Label an audio project

At the top of the interface, you will find an audio player with timestamps. Below it, you will see the transcript, where you can label spans using the label sets you have added.
Between the audio player and the transcript, there is a control panel with the controls listed below. You can also watch this brief video for a visual guide.
Rewind 10 seconds
Play/Pause
Fast forward 10 seconds
Adjust volume
Use the timestamp field to jump to a specific time
Enter Create timestamp mode
Zoom in or out of the audio timestamps
Open audio settings (audio speed and auto-scroll)

The Create timestamp button allows you to create a new timestamp and link it to the corresponding text. Select a portion of the audio timeline, then highlight the matching span of tokens in the transcript. The timestamp is now linked to that span of tokens.


Edit sentences in an audio project
When editing transcriptions, the system adjusts timestamp behavior based on how much the text changes.
If the similarity between the original transcription and the edited version is above 70%, the timestamp labels are kept, and you only need to adjust the corresponding text.
If the similarity falls below 70%, the timestamp labels may be removed, because the edited text differs significantly from the original. The threshold keeps timestamp labels in place for minor edits and removes them for larger changes.
For example, after editing the sentence in the first line, the timestamp labels will disappear because the similarity between the original and edited sentence is below 70%.
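To get a feel for how a 70% threshold behaves, the sketch below compares an original and an edited sentence. It is an illustration only: this guide does not state which similarity metric Datasaur uses, so the code substitutes the character-based ratio from Python's difflib.SequenceMatcher, and the names similarity and keeps_timestamps are hypothetical. Treating exactly 70% as "not kept" is also an assumption, since the text only describes values above and below the threshold.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.70  # the 70% threshold described in this guide


def similarity(original: str, edited: str) -> float:
    """Return a similarity score between 0.0 and 1.0.

    Assumption: difflib's ratio stands in for Datasaur's metric,
    which is not specified in this guide.
    """
    return SequenceMatcher(None, original, edited).ratio()


def keeps_timestamps(original: str, edited: str,
                     threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Return True when the edit is small enough that labels would be kept."""
    # "Above 70%" is modeled as a strict comparison (an assumption).
    return similarity(original, edited) > threshold


minor_edit = keeps_timestamps("Hello world", "Hello world!")
major_edit = keeps_timestamps("abcd", "wxyz")
```

With this stand-in metric, adding a single character counts as a minor edit (minor_edit is True), while replacing every character counts as a major one (major_edit is False). Results in Datasaur may differ if it measures similarity differently, for example by tokens rather than characters.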
If you upload an empty transcription, placeholder lines containing a hyphen (-) are created and automatically associated with timestamps. Editing a placeholder's content removes its timestamp label, and you will need to create the label again manually.