Generate transcript
Textual can generate a transcript from an audio file. To do this, use the get_audio_transcript method.
To generate a transcript:
from tonic_textual.audio_api import TextualAudio

# Create the audio client.
textual = TextualAudio()

# Transcribe a local audio file. Replace the path with your own file.
transcription = textual.get_audio_transcript('path_to_file.mp3')
This generates a transcription_result.
It contains:
The full text of the transcription.
The detected language.
A list of audio segments. Each segment is a portion of the transcription, with start and end times in milliseconds and a list of its words, each with its own start and end times.
It looks something like this:
{
"text": "Thank you for calling First National Bank. My name is Steve. How may I assist you today? Hello, Steve. I have a problem with my credit card statement.",
"segments": [
{
"start": 0,
"end": 4300.0001,
"id": 0,
"text": " Thank you for calling First National Bank. My name is Steve. How may I assist you today?",
"words": [
{
"start": 0,
"end": 839.9999,
"word": "Thank"
},
{
"start": 839.9999,
"end": 899.9999,
"word": "you"
},
{
"start": 899.9999,
"end": 1120,
"word": "for"
},
{
"start": 1120,
"end": 1259.9999,
"word": "calling"
},
{
"start": 1259.9999,
"end": 1580,
"word": "First"
},
{
"start": 1580,
"end": 1879.9999,
"word": "National"
},
{
"start": 1879.9999,
"end": 2220,
"word": "Bank"
},
{
"start": 2440,
"end": 2559.9999,
"word": "My"
},
{
"start": 2559.9999,
"end": 2720,
"word": "name"
},
{
"start": 2720,
"end": 3259.9999,
"word": "is"
},
{
"start": 3259.9999,
"end": 3259.9999,
"word": "Steve"
},
{
"start": 3339.9999,
"end": 3460,
"word": "How"
},
{
"start": 3460,
"end": 3559.9999,
"word": "may"
},
{
"start": 3559.9999,
"end": 3859.9998,
"word": "I"
},
{
"start": 3859.9998,
"end": 3859.9998,
"word": "assist"
},
{
"start": 3859.9998,
"end": 4000,
"word": "you"
},
{
"start": 4000,
"end": 4300.0001,
"word": "today"
}
]
},
{
"start": 5280.0002,
"end": 7780.0002,
"id": 1,
"text": " Hello, Steve. I have a problem with my credit card statement.",
"words": [
{
"start": 5280.0002,
"end": 5659.9998,
"word": "Hello"
},
{
"start": 5659.9998,
"end": 5900,
"word": "Steve"
},
{
"start": 5960,
"end": 6179.9998,
"word": "I"
},
{
"start": 6179.9998,
"end": 6300.0001,
"word": "have"
},
{
"start": 6300.0001,
"end": 6619.9998,
"word": "a"
},
{
"start": 6619.9998,
"end": 6619.9998,
"word": "problem"
},
{
"start": 6619.9998,
"end": 6820.0001,
"word": "with"
},
{
"start": 6820.0001,
"end": 7119.9998,
"word": "my"
},
{
"start": 7119.9998,
"end": 7199.9998,
"word": "credit"
},
{
"start": 7199.9998,
"end": 7480,
"word": "card"
},
{
"start": 7480,
"end": 7780.0002,
"word": "statement"
}
]
}
],
"language": "english"
}
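As a minimal sketch of how such a result could be consumed, the following assumes that the transcript is available as a plain Python dict with the same keys as the JSON above. The helper names format_timestamp and format_segments are hypothetical and are not part of the Textual SDK. The object that get_audio_transcript returns may expose these fields differently.

```python
from typing import Any, Dict, List


def format_timestamp(ms: float) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    total_ms = int(round(ms))
    minutes, remainder = divmod(total_ms, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def format_segments(transcript: Dict[str, Any]) -> List[str]:
    """Return one timestamped line per segment of the transcript."""
    lines = []
    for segment in transcript.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # Segment text can carry a leading space, so strip it.
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines


# Trimmed copy of the example above (word lists omitted), assumed to be a dict.
sample = {
    "segments": [
        {
            "start": 0,
            "end": 4300.0001,
            "id": 0,
            "text": " Thank you for calling First National Bank."
                    " My name is Steve. How may I assist you today?",
        },
        {
            "start": 5280.0002,
            "end": 7780.0002,
            "id": 1,
            "text": " Hello, Steve. I have a problem with my credit card statement.",
        },
    ],
    "language": "english",
}

for line in format_segments(sample):
    print(line)
```

Rounding to whole milliseconds hides the small floating-point offsets (such as 4300.0001) that appear in the raw values.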
Additional remarks
When you use the Textual Cloud (https://textual.tonic.ai), file uploads are limited to 25 MB.
Textual supports the following file types: m4a, mp3, webm, mpga, wav.
For file types such as m4a, make sure that your build of ffmpeg has the necessary libraries.
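The limits above can be checked locally before you upload a file. The following is a sketch only: the function names are hypothetical and not part of the Textual SDK, and it assumes that 25 MB means 25,000,000 bytes, which is the more conservative reading. The service may measure the limit differently.

```python
import os
from typing import List

# File types listed in the remarks above.
SUPPORTED_EXTENSIONS = {".m4a", ".mp3", ".webm", ".mpga", ".wav"}

# Assumption: decimal megabytes. The exact byte limit is not stated.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def find_upload_problems(filename: str, size_bytes: int) -> List[str]:
    """Return a list of reasons why the file might be rejected (empty if none)."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the assumed 25 MB limit")
    return problems


def check_file(path: str) -> List[str]:
    """Run the same checks against a file on disk."""
    return find_upload_problems(path, os.path.getsize(path))
```

For example, check_file('path_to_file.mp3') returns an empty list when the file passes both checks. The size check applies only to the Textual Cloud limit described above.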