Redacting a transcript

To redact a transcript, you first need a transcription result, which you can generate with the get_audio_transcript method (see here for an example).

Once you have a transcript, you can call redact_audio_transcript. Here is an example:

from tonic_textual.audio_api import TextualAudio
from tonic_textual.enums.pii_type import PiiType

textual = TextualAudio()

# List the entity types to 'beep' out. If you don't provide a generator_config,
# all entity types are 'beep'-ed out unless generator_default is set to 'Off'.
sensitive_entities = ['NAME_GIVEN', 'NAME_FAMILY']
# Turn off every entity type except the sensitive ones.
gc = {k: 'Off' for k in PiiType if k not in sensitive_entities}

transcript = textual.get_audio_transcript('<path to audio file>')

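# generator_default is left unset so that the entity types not listed in gc
# (here, the name types) are still 'beep'-ed out. Setting it to 'Off' would
# turn those off as well.
redacted_transcript = textual.redact_audio_transcript(transcript, generator_config=gc)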

redact_audio_transcript() returns a redacted_transcript_result, which includes the original transcription, the redacted or synthesized text of the transcription, a list of redacted_segments, and the usage.
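
The generator_config in the example above is an "everything off except these" mapping. A minimal sketch of a helper that builds such a mapping from plain strings is shown below. The helper name build_generator_config and its parameters are hypothetical and are not part of tonic_textual. It rejects unknown entity names, because a misspelled name would otherwise silently leave the real entity type switched off.

```python
from typing import Dict, Iterable


def build_generator_config(
    all_entity_types: Iterable[str], sensitive: Iterable[str]
) -> Dict[str, str]:
    """Map every entity type that is NOT sensitive to 'Off'.

    Hypothetical helper for illustration; not part of tonic_textual.
    """
    known = list(all_entity_types)
    wanted = set(sensitive)

    # Fail early on typos instead of quietly redacting nothing for that type.
    unknown = wanted - set(known)
    if unknown:
        raise ValueError(f"Unknown entity types: {sorted(unknown)}")

    # Sensitive types are left out of the mapping, so they fall back to the
    # default handling chosen by the caller.
    return {name: "Off" for name in known if name not in wanted}


# Stand-in list of entity type names, used only for this sketch.
example_types = ["NAME_GIVEN", "NAME_FAMILY", "EMAIL_ADDRESS", "PHONE_NUMBER"]
gc = build_generator_config(example_types, ["NAME_GIVEN", "NAME_FAMILY"])
```

With the SDK, the first argument would be the names of the supported entity types, for instance taken from PiiType, assuming its member names match the entity labels used above.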

Additional Remarks

When using the Textual Cloud (https://textual.tonic.ai), file uploads are limited to 25 MB. Supported file types are m4a, mp3, webm, mpga, and wav. For file types like m4a, you’ll need to make sure your build of ffmpeg has the necessary libraries.
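
The limits above can be checked locally before a file is sent. The following is a minimal sketch, not part of tonic_textual: the function names are hypothetical, the file type is judged by its extension only, and the 25 MB limit is assumed to mean 25 * 1024 * 1024 bytes.

```python
from pathlib import Path
from typing import List

# Limits taken from the remarks above. The byte interpretation of "25 MB"
# is an assumption made for this sketch.
SUPPORTED_EXTENSIONS = {"m4a", "mp3", "webm", "mpga", "wav"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def upload_problems(filename: str, size_bytes: int) -> List[str]:
    """Return a list of reasons the file would be rejected (empty if none)."""
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: '{extension}'")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_UPLOAD_BYTES}")
    return problems


def check_audio_file(path: str) -> List[str]:
    """Read the size from disk and run the same checks."""
    return upload_problems(path, Path(path).stat().st_size)
```

Calling check_audio_file('<path to audio file>') before get_audio_transcript gives a readable message up front. It does not verify the audio encoding itself or the ffmpeg build.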