Redacting a transcript
To redact a transcript, you first need a transcription result, which you can generate with the get_audio_transcript method (see here for an example).
Once you have a transcript, you can call redact_audio_transcript. Here is an example:
from tonic_textual.audio_api import TextualAudio
from tonic_textual.enums.pii_type import PiiType
textual = TextualAudio()
# Provide a list of entities to 'beep' out. If you don't provide a generator_config, all entities are 'beep'-ed out unless generator_default is set to 'Off'.
sensitive_entities = ['NAME_GIVEN', 'NAME_FAMILY']
# Turn off every entity type except the sensitive ones.
gc = {k: 'Off' for k in PiiType if k not in sensitive_entities}
transcript = textual.get_audio_transcript('<path to audio file>')
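# The sensitive entities are not listed in gc, so generator_default is left at its default here; setting it to 'Off' would turn them off as well.
redacted_transcript = textual.redact_audio_transcript(transcript, generator_config=gc)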
redact_audio_transcript() returns a redacted_transcript_result, which includes the original transcription, the redacted or synthesized text of the transcription, a list of redacted_segments, and the usage.
Additional Remarks
When using the Textual Cloud (https://textual.tonic.ai), file uploads are limited to 25MB. Supported file types are m4a, mp3, webm, mpga, and wav. For file types such as m4a, make sure that your build of ffmpeg includes the necessary libraries.
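The limits above can be checked locally before you upload a file. The following is a minimal sketch that uses only the Python standard library and is not part of the Textual SDK. The helper names check_audio_file and ffmpeg_on_path are hypothetical, and reading "25MB" as 25,000,000 bytes is a conservative assumption.

```python
import os
import shutil

# Limits taken from the remarks above. Interpreting "25MB" as
# 25 * 1000 * 1000 bytes is an assumption (the conservative reading).
MAX_UPLOAD_BYTES = 25 * 1000 * 1000
SUPPORTED_EXTENSIONS = {".m4a", ".mp3", ".webm", ".mpga", ".wav"}


def check_audio_file(path):
    """Return a list of problems found; an empty list means the file looks OK."""
    problems = []
    # Compare the extension case-insensitively.
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 25MB")
    return problems


def ffmpeg_on_path():
    """Return True if an ffmpeg executable is found on PATH.

    This only confirms that ffmpeg is installed; it does not verify that
    the build includes the libraries needed for a given file type.
    """
    return shutil.which("ffmpeg") is not None
```

You might call check_audio_file('<path to audio file>') before get_audio_transcript and proceed only if it returns an empty list.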