Redact XML data

To redact sensitive information from XML, pass the XML document string to the redact_xml method.

As with other SDK functions that modify data, redact_xml lets you configure how each entity type is handled.

To learn more about the common parameters:

  • generator_default

  • generator_config

  • label_allow_lists

  • label_block_lists

go to Choosing tokenization or synthesis.

from tonic_textual.redact_api import TextualNer
import json

textual = TextualNer()

xml_string = '''<?xml version="1.0" encoding="UTF-8"?>
<!-- This XML document contains sample PII with namespaces and attributes -->
<PersonInfo xmlns="http://www.example.com/default" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:contact="http://www.example.com/contact">
    <!-- Personal Information with an attribute containing PII -->
    <Name preferred="true" contact:userID="john.doe123">
        <FirstName>John</FirstName>
        <LastName>Doe</LastName>He was born in 1980.</Name>

    <contact:Details>
        <!-- Email stored in an attribute for demonstration -->
        <contact:Email address="john.doe@example.com"/>
        <contact:Phone type="mobile" number="555-6789"/>
    </contact:Details>

    <!-- SSN stored as an attribute -->
    <SSN value="987-65-4321" xsi:nil="false"/>
    <data>his name was John Doe</data>
</PersonInfo>'''

xml_redaction = textual.redact_xml(xml_string)

The response includes entity-level information, including the XPath location where each sensitive entity is found.

The start and end positions are relative to the beginning of the XPath location where the entity is found, not to the beginning of the whole XML document.
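To illustrate what location-relative positions mean, the following minimal sketch uses only the Python standard library. It does not call the Textual API. The entity record is hypothetical: its key names (xpath, start, end) and the assumption that end is exclusive are for illustration only and are not the confirmed shape of the redact_xml response.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the sample document (namespaces omitted for brevity).
sample_xml = "<PersonInfo><data>his name was John Doe</data></PersonInfo>"

# Hypothetical entity record. The key names and the exclusive end offset
# are assumptions for illustration, not the documented response format.
entity = {"xpath": "./data", "start": 13, "end": 21}

root = ET.fromstring(sample_xml)

# Resolve the location first, then apply the offsets to that node's text only.
node = root.find(entity["xpath"])
node_text = node.text

# The offsets index into the node's text, not into the full XML string.
entity_text = node_text[entity["start"]:entity["end"]]
print(entity_text)  # prints "John Doe"
```

Applying the same offsets to the full XML string would select different characters, which is why the location must be resolved before slicing.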