Redacting a transcript
Before you can redact a transcript, you must first generate a transcription result. To do this, use the get_audio_transcript method. For an example, see here.
Once you have a transcript, call redact_audio_transcript.
For example:
from tonic_textual.audio_api import TextualAudio
from tonic_textual.enums.pii_type import PiiType
textual = TextualAudio()
# Entity types to redact; all other types are left as-is (generator_default='Off')
sensitive_entities = ['NAME_GIVEN', 'NAME_FAMILY']
gc = {k: 'Redaction' for k in sensitive_entities}
# Generate the transcript first, then redact it
transcript = textual.get_audio_transcript('<path to audio file>')
redacted_transcript = textual.redact_audio_transcript(transcript, generator_config=gc, generator_default='Off')
The redact_audio_transcript() method returns a redacted_transcript_result, which includes:
The original transcription.
The redacted or synthesized text of the transcription.
A list of redacted_segments.
The usage.
Additional remarks
When you use Textual Cloud (https://textual.tonic.ai), file uploads are limited to 25 MB or smaller.
Textual supports the following audio file types: m4a, mp3, webm, mpga, wav.
For file types such as m4a, make sure that your build of ffmpeg has the necessary libraries.
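The remarks above can be checked locally before you send a file. The following is a minimal sketch that uses only the Python standard library and is not part of the Textual SDK. The helper names check_audio_file and ffmpeg_available are hypothetical, and the sketch assumes that "25 MB" means 25 * 1024 * 1024 bytes.

```python
import os
import shutil

# File types listed in the remarks above.
SUPPORTED_EXTENSIONS = {"m4a", "mp3", "webm", "mpga", "wav"}
# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_CLOUD_UPLOAD_BYTES = 25 * 1024 * 1024


def check_audio_file(path, size_bytes=None):
    """Hypothetical helper: return a list of problems; empty means none found."""
    problems = []
    # Compare the extension case-insensitively, without the leading dot.
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    # Read the size from disk unless the caller supplied it.
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_CLOUD_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the Textual Cloud limit")
    return problems


def ffmpeg_available():
    """Hypothetical helper: report whether an ffmpeg executable is on PATH."""
    return shutil.which("ffmpeg") is not None
```

An empty list from check_audio_file means that the file passed both checks. Note that ffmpeg_available only confirms that an ffmpeg executable is on the PATH. It does not confirm that the build includes the libraries needed for a format such as m4a.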