Viewing detected entities for a dataset¶
You can retrieve a list of entities that were detected in the dataset files.
Retrieving all entities for a dataset¶
To retrieve the complete list of entities for a dataset:
ds = ner.get_dataset('<dataset name>')
#retrieve all processed files in the dataset
files = ds.get_processed_files(refetch=True)
for file in files:
entities = file.get_entities()
It returns a response in the form of a dictionary where:
The key is the entity type.
The value is the list of detected entities of that type.
For each entity, the response includes:
The original text value of the entity.
To provide context, a few words that precede and follow the entity.
For example:
{
"NAME_GIVEN": [
{
"head": "Last Name: First Name: Smith ",
"tail": " Date of Birth: Phone Number: 07/16/1988 (929) 555",
"entity": "John"
},
{
"head": "The value for key 'Beneficiary First Name:' is: ",
"tail": "",
"entity": "John"
}
],
"NAME_FAMILY": [
{
"head": "Last Name: First Name: ",
"tail": " John Date of Birth: Phone Number: 07/16/1988 (929",
"entity": "Smith"
},
{
"head": "le cell with no column or row header: First Name: ",
"tail": "",
"entity": "Richardson"
},
{
"head": "ber Information Last Name: First Name: Richardson ",
"tail": " Prescriber NPI Number: Prescriber Specialty: 2982",
"entity": "Scott"
},
{
"head": "Beneficiary First Name: ",
"tail": "",
"entity": "Smith"
}
]
}
Retrieving specific types of entities for a dataset¶
The call to get_entities() can take an optional list of entity types.
For example, you could pass in a hard-coded list of entity types:
file.get_entities(['NAME_GIVEN','NAME_FAMILY'])
Or you could use the PiiType enum:
from tonic_textual.enums.pii_type import PiiType
file.get_entities([PiiType.NAME_GIVEN, PiiType.NAME_FAMILY])
Retrieving the entities for the enabled entity types for a dataset¶
To pass in the current set of entities that are enabled by the dataset configuration:
from tonic_textual.enums.pii_state import PiiState
#Get list of all enabled entities for the dataset
entities = [k for k in ds.generator_config.keys() if ds.generator_config[k]!=PiiState.Off]
entities = file.get_entities(entities)
file.get_entities(entities)
Viewing entity mappings for a dataset¶
You can retrieve mappings for each detected entity in a dataset.
ds = ner.get_dataset('<dataset name>')
mappings = ds.get_entity_mappings()
for file in mappings.files:
for entity in file.entities:
print(file.file_name, entity.text, entity.output_text)
The response is grouped by file.
Each entity mapping includes:
The original entity value.
The redacted version of the entity value.
The synthesized version of the entity value.
The final output value based on the current dataset configuration.