📖 Audio API documentation

TextualAudio class

class tonic_textual.audio_api.TextualAudio( base_url: str = 'https://textual.tonic.ai', api_key: str | None = None, verify: bool = True, )

Wrapper class that invokes the Tonic Textual API to process audio files.

Parameters:

base_url (str) – The URL of your Tonic Textual instance. Do not include a trailing slash. The default value is https://textual.tonic.ai.
api_key (str) – Optional. Your API token. Instead of providing the API token here, we recommend that you set the API key in your environment as the value of TONIC_TEXTUAL_API_KEY.
verify (bool) – Whether to verify SSL certificates. By default, this is enabled.

Examples

>>> from tonic_textual.audio_api import TextualAudio
>>> textual = TextualAudio()
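The constructor parameters above can be prepared before the client is created. The following is a minimal sketch, not part of the library: textual_audio_kwargs is a hypothetical helper that strips trailing slash characters from the URL and checks that TONIC_TEXTUAL_API_KEY is set, so that the key does not need to be passed in code.

```python
import os


def textual_audio_kwargs(base_url="https://textual.tonic.ai", env=None):
    """Build keyword arguments for TextualAudio (hypothetical helper).

    The API key is deliberately left out of the result: it is expected
    to be read from the TONIC_TEXTUAL_API_KEY environment variable.
    """
    env = os.environ if env is None else env
    if not env.get("TONIC_TEXTUAL_API_KEY"):
        raise RuntimeError("TONIC_TEXTUAL_API_KEY is not set")
    # Remove any trailing slash characters from the instance URL.
    return {"base_url": base_url.rstrip("/"), "verify": True}


# Usage, assuming tonic_textual is installed:
#   from tonic_textual.audio_api import TextualAudio
#   textual = TextualAudio(**textual_audio_kwargs("https://textual.example.com/"))
```

Passing env explicitly makes the helper easy to exercise without touching the real environment.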

get_audio_transcript( file_path: str, num_retries: int | None = 30, wait_between_retries: int | None = 10, ) → TranscriptionResult

Transcribes the provided audio file. Supports m4a, mp3, webm, mpga, and wav files. Each API call is limited to a file of 25MB or less.

Parameters:

file_path (str) – The path to the audio file.
num_retries (Optional[int] = 30) – Optional. The number of times to attempt to fetch the result. The default value is 30. If the result is not yet ready, Textual waits wait_between_retries seconds before each retry.
wait_between_retries (Optional[int] = 10) – Optional. The number of seconds to wait between retry attempts. The default value is 10.

Returns:

TranscriptionResult – The transcription of the audio file

Return type:

TranscriptionResult
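The file constraints and retry settings described for get_audio_transcript can be checked locally before a call is made. The following is a rough sketch: audio_file_problems and max_wait_seconds are hypothetical helpers, and reading "25MB" as 25 * 1024 * 1024 bytes is an assumption.

```python
import os

# Extensions listed for get_audio_transcript.
SUPPORTED_EXTENSIONS = {".m4a", ".mp3", ".webm", ".mpga", ".wav"}
# Assumption: "25MB" is interpreted as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024


def audio_file_problems(file_path, size_bytes=None):
    """Return a list of reasons the file may be rejected (empty if none)."""
    problems = []
    extension = os.path.splitext(file_path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes is None:
        size_bytes = os.path.getsize(file_path)
    if size_bytes > MAX_BYTES:
        problems.append(f"file is larger than {MAX_BYTES} bytes")
    return problems


def max_wait_seconds(num_retries=30, wait_between_retries=10):
    """Upper bound on time spent waiting between retry attempts."""
    return num_retries * wait_between_retries
```

With the default values, the waits between retries add up to at most 30 * 10 = 300 seconds.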

redact_audio_file( audio_file_path: str, output_file_path: str, generator_default: PiiState = PiiState.Redaction, generator_config: Dict[str, PiiState] = {}, label_block_lists: Dict[str, List[str]] | None = None, label_allow_lists: Dict[str, List[str]] | None = None, custom_entities: List[str] | None = None, before_beep_buffer: float = 250.0, after_beep_buffer: float = 250.0, )

Generates a redacted audio file by identifying and removing sensitive audio segments. Note that calling this method requires that pydub be installed in addition to the tonic_textual library. Additionally, you’ll need to ensure that your install of ffmpeg has the necessary codec support for your file type.

Parameters:

audio_file_path (str) – The path to the input audio file. Supported file types are wav, mp3, ogg, flv, wma, aac, and others. See https://github.com/jiaaro/pydub for complete information on file types supported.
output_file_path (str) – The path to save the redacted output file to. The extension of the output file path determines the audio file type that the output is written as. Supported file types are wav, mp3, ogg, flv, wma, and aac. See https://github.com/jiaaro/pydub for complete information on file types supported.
generator_default (PiiState = PiiState.Redaction) – The default redaction used for types that are not specified in generator_config. Value must be one of “Redaction”, “Synthesis”, or “Off”.
generator_config (Dict[str, PiiState]) – A dictionary of sensitive data entities. For each entity, indicates whether to redact, synthesize, or ignore it. Values must be one of “Redaction”, “Synthesis”, or “Off”.
label_block_lists (Optional[Dict[str, List[str]]]) – A dictionary of (entity type, ignored values). When a value for an entity type matches a listed regular expression, the value is ignored and is not redacted or synthesized.
label_allow_lists (Optional[Dict[str, List[str]]]) – A dictionary of (entity type, additional values). When a piece of text matches a listed regular expression, the text is marked as the entity type and is included in the redaction or synthesis.
custom_entities (Optional[List[str]]) – A list of custom entity type identifiers to include. Each custom entity type included here may also be included in the generator config. Custom entity types will respect generator defaults if they are not specified in the generator config.
before_beep_buffer (float, optional) – The buffer time, in milliseconds, to include before each redaction interval. The default value is 250.0.
after_beep_buffer (float, optional) – The buffer time, in milliseconds, to include after each redaction interval. The default value is 250.0.

Returns:

The path to the redacted output audio file.

Return type:

str
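To illustrate what before_beep_buffer and after_beep_buffer mean, the sketch below widens each sensitive span by the two buffers and merges spans that end up overlapping. This is only an illustration of the arithmetic, not the library's implementation: padded_intervals_ms is a hypothetical helper, and the input format (start and end in seconds) is an assumption based on the transcription timestamps.

```python
def padded_intervals_ms(spans_seconds, before_ms=250.0, after_ms=250.0, total_ms=None):
    """Widen (start, end) spans given in seconds; return merged spans in ms."""
    padded = []
    for start_s, end_s in sorted(spans_seconds):
        # Convert to milliseconds and apply the buffers, clamping at 0.
        start = max(0.0, start_s * 1000.0 - before_ms)
        end = end_s * 1000.0 + after_ms
        if total_ms is not None:
            # Do not extend past the end of the audio.
            end = min(end, total_ms)
        if padded and start <= padded[-1][1]:
            # Overlaps the previous interval: merge the two.
            padded[-1] = (padded[-1][0], max(padded[-1][1], end))
        else:
            padded.append((start, end))
    return padded
```

For example, a span from 1.0 to 1.5 seconds with the default buffers covers 750 to 1750 milliseconds.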

redact_audio_transcript( transcription: TranscriptionResult, generator_default: PiiState = PiiState.Redaction, generator_config: Dict[str, PiiState] = {}, generator_metadata: Dict[str, BaseMetadata] = {}, random_seed: int | None = None, label_block_lists: Dict[str, List[str]] | None = None, custom_entities: List[str] | None = None, ) → RedactedTranscriptionResult

Redacts a transcription that was obtained from an audio file. The transcription step supports m4a, mp3, webm, mpga, and wav files of 25MB or less per API call.

Parameters:

transcription (TranscriptionResult) – A transcription result, typically obtained by calling get_audio_transcript first.
generator_default (PiiState = PiiState.Redaction) – The default redaction used for types that are not specified in generator_config. Value must be one of “Redaction”, “Synthesis”, or “Off”.
generator_config (Dict[str, PiiState]) – A dictionary of sensitive data entities. For each entity, indicates whether to redact, synthesize, or ignore it. Values must be one of “Redaction”, “Synthesis”, or “Off”.
generator_metadata (Dict[str, BaseMetadata]) – A dictionary of sensitive data entities. For each entity, indicates generator configuration in case synthesis is selected. Values must be of types appropriate to the PII type.
random_seed (Optional[int] = None) – An optional value to use to override Textual’s default random number seeding. Can be used to ensure that different API calls use the same or different random seeds.
label_block_lists (Optional[Dict[str, List[str]]]) – A dictionary of (entity type, ignored values). When a value for an entity type matches a listed regular expression, the value is ignored and is not redacted or synthesized.
custom_entities (Optional[List[str]]) – A list of custom entity type identifiers to include. Each custom entity type included here may also be included in the generator config. Custom entity types will respect generator defaults if they are not specified in the generator config.

Returns:

The redacted transcription

Return type:

RedactedTranscriptionResult

Examples

>>> transcription = textual.get_audio_transcript(<path to file>)
>>> textual.redact_audio_transcript(
...     transcription,
...     # only redacts NAME_GIVEN
...     generator_default="Off",
...     generator_config={"NAME_GIVEN": "Redaction"},
...     random_seed=123,
...     # Text matching the regex ` ([a-z]{2}) ` is not treated as an occurrence of NAME_FAMILY
...     label_block_lists={"NAME_FAMILY": [" ([a-z]{2}) "]},
...     # The custom entities passed here will be included in the redaction and may be included in generator_config
...     custom_entities=["CUSTOM_COGNITIVE_ACCESS_KEY", "CUSTOM_PERSONAL_GRAVITY_INDEX"],
... )
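The effect of a label_block_lists entry can be pictured as a regular expression filter over detected values. The following is a minimal sketch, not the service's implementation: is_blocked is a hypothetical helper, and the use of re.search (rather than a full match) is an assumption, because the matching semantics are not stated here.

```python
import re


def is_blocked(entity_type, value, label_block_lists):
    """Return True if value matches any block-list regex for entity_type."""
    patterns = label_block_lists.get(entity_type, [])
    # Assumption: a pattern matching anywhere in the value blocks it.
    return any(re.search(pattern, value) for pattern in patterns)


block_lists = {"NAME_FAMILY": [" ([a-z]{2}) "]}
print(is_blocked("NAME_FAMILY", " ab ", block_lists))
print(is_blocked("NAME_FAMILY", "Smith", block_lists))
```

Under these assumptions, a blocked value would be left as-is instead of being redacted or synthesized.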

class tonic_textual.classes.audio.redacted_transcription_result.RedactedTranscriptionResult( original_transcript: TranscriptionResult, redacted_text: str, redacted_segments: List[List[Replacement]], usage: int, )

Redaction response object

Variables:

original_transcript (TranscriptionResult) – The original transcription result
redacted_text (str) – The redacted and synthesized text of the original transcript. Speaking segments are separated by newlines.
redacted_segments (List[RedactedSegment]) – A list of segments from the original transcript. Each segment includes the segment text and a list of named entities.
usage (int) – The number of words used
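Because redacted_text separates speaking segments with newlines, each redacted line can be lined up with the timestamps of the original segments. The sketch below assumes that the lines correspond one-to-one, in order, with original_transcript.segments, which is not stated above. It uses stand-in objects with the documented attribute names instead of the real classes, and the "[NAME_GIVEN]" marker in the sample text is made up.

```python
from types import SimpleNamespace


def pair_segments_with_redacted_text(result):
    """Pair each original segment's timestamps with its redacted line."""
    lines = result.redacted_text.split("\n")
    segments = result.original_transcript.segments
    if len(lines) != len(segments):
        # The one-to-one assumption does not hold for this result.
        raise ValueError("line count does not match segment count")
    return [(seg.start, seg.end, line) for seg, line in zip(segments, lines)]


# Stand-in for a RedactedTranscriptionResult (not the real class).
demo = SimpleNamespace(
    redacted_text="Hi, [NAME_GIVEN].\nHow are you?",
    original_transcript=SimpleNamespace(
        segments=[
            SimpleNamespace(start=0.0, end=1.5),
            SimpleNamespace(start=1.5, end=3.0),
        ]
    ),
)
pairs = pair_segments_with_redacted_text(demo)
```

The helper relies only on attribute names, so an object returned by redact_audio_transcript could be passed in the same way if the assumption holds.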

class tonic_textual.classes.audio.redact_audio_responses.TranscriptionResult( text: str, segments: TranscriptionSegment, language: str = '', )

Represents the result of a full transcription, including text, segments, and language.

Variables:

text (str) – The full transcription text.
segments (List[TranscriptionSegment]) – The list of transcription segments.
language (str, optional) – The detected language of the transcription (default is empty string).

class tonic_textual.classes.audio.redact_audio_responses.TranscriptionWord( start: float, end: float, word: str, )

Represents a single word in a transcription, including start and end timestamps.

Variables:

start (float) – The start time of the word in seconds.
end (float) – The end time of the word in seconds.
word (str) – The spoken word.

class tonic_textual.classes.audio.redact_audio_responses.TranscriptionSegment( start: float, end: float, id: int, text: str, words: List[TranscriptionWord], )

Represents a segment of the transcription containing text and words with timestamps.

Variables:

start (float) – The start time of the segment in seconds.
end (float) – The end time of the segment in seconds.
id (int) – The segment identifier.
text (str) – The full text of the segment.
words (List[TranscriptionWord]) – A list of words included in the segment.
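The segment and word timestamps make it possible to look up what was said in a given time range. The following is a minimal sketch that uses stand-in dataclasses with the same field names as TranscriptionWord and TranscriptionSegment. Word, Segment, and words_between are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    # Mirrors the documented TranscriptionWord fields.
    start: float
    end: float
    word: str


@dataclass
class Segment:
    # Mirrors the documented TranscriptionSegment fields.
    start: float
    end: float
    id: int
    text: str
    words: List[Word] = field(default_factory=list)


def words_between(segments, start, end):
    """Return the words that fall entirely within [start, end] seconds."""
    return [
        w.word
        for seg in segments
        for w in seg.words
        if w.start >= start and w.end <= end
    ]


segments = [
    Segment(0.0, 1.0, 0, "hello there", [Word(0.0, 0.5, "hello"), Word(0.5, 1.0, "there")]),
    Segment(1.0, 2.0, 1, "good morning", [Word(1.0, 1.5, "good"), Word(1.5, 2.0, "morning")]),
]
selected = words_between(segments, 0.5, 1.5)
```

Because words_between reads only the start, end, and word attributes, it should also work on the segments of a real TranscriptionResult.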