Custom PII Types
PIIDetector accepts a custom_types dict so you can define domain-specific PII patterns such as employee IDs, internal codes, API tokens, or anything else a regex can match.
Setup
| from llmfy import PIIDetector, PIIMaskStyle, PIIStrategy
|
Adding Custom Types
Pass a dict mapping a type name (str) to a regex pattern (str or compiled re.Pattern). The type name becomes the placeholder label when mask_style=TYPE_NAME. With PARTIAL, the first two characters of the matched text are kept and the rest is masked, as the outputs below show (EMP-001234 becomes EM*).
String Pattern
| detector = PIIDetector(
    custom_types={
        "EMPLOYEE_ID": r"EMP-\d{6}",
        "PROJECT_CODE": r"PRJ-[A-Z]{3}",
    }
)
result = detector.detect(
    "Employee EMP-001234 is on project PRJ-ABC and emailed john@corp.com"
)
print(result.processed_text)
# "Employee EM* is on project PR* and emailed jo*"
|
Compiled Pattern
| import re
detector = PIIDetector(
    custom_types={
        "API_TOKEN": re.compile(r"tok_[a-z0-9]+"),
        "ORDER_ID": re.compile(r"ORD-\d{8}"),
    }
)
result = detector.detect("Token tok_abc123xyz for order ORD-20240315")
print(result.processed_text)
# "Token to* for order OR*"
|
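Because custom_types accepts both strings and compiled patterns, it can help to sanity-check the regexes against known samples before handing them to the detector. The sketch below uses only the standard library; `check_patterns` and `samples` are hypothetical names for this illustration and are not part of llmfy.

```python
import re


def check_patterns(custom_types, samples):
    """Report which sample strings each pattern fully matches.

    Hypothetical helper for this sketch; not part of llmfy.
    """
    report = {}
    for name, pattern in custom_types.items():
        # re.compile accepts a str or an already compiled pattern;
        # an invalid regex string raises re.error here.
        compiled = re.compile(pattern)
        report[name] = [s for s in samples if compiled.fullmatch(s)]
    return report


custom_types = {
    "API_TOKEN": re.compile(r"tok_[a-z0-9]+"),  # compiled pattern
    "ORDER_ID": r"ORD-\d{8}",                   # plain string
}
samples = ["tok_abc123xyz", "ORD-20240315", "ORD-2024"]

print(check_patterns(custom_types, samples))
# {'API_TOKEN': ['tok_abc123xyz'], 'ORDER_ID': ['ORD-20240315']}
```

"ORD-2024" appears under neither type because ORDER_ID requires exactly eight digits. A check like this catches overly strict or overly loose patterns early.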
Custom Types with TYPE_NAME Style
| detector = PIIDetector(
    mask_style=PIIMaskStyle.TYPE_NAME,
    custom_types={
        "EMPLOYEE_ID": r"EMP-\d{6}",
        "PROJECT_CODE": r"PRJ-[A-Z]{3}",
    }
)
result = detector.detect(
    "Employee EMP-001234 is on project PRJ-ABC and emailed john@corp.com"
)
print(result.processed_text)
# "Employee [EMPLOYEE_ID] is on project [PROJECT_CODE] and emailed [EMAIL]"
|
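To make the difference between the two mask styles concrete, here is a toy masker built on the standard library. It is not llmfy's implementation; it only mirrors the outputs shown above for the custom types, and the `mask` function with its `style` argument is an assumption made for this sketch.

```python
import re


def mask(text, custom_types, style="PARTIAL"):
    """Toy masker that mirrors the documented outputs; not llmfy's code."""
    for name, pattern in custom_types.items():
        if style == "TYPE_NAME":
            # Replace the whole match with the bracketed type name.
            text = re.sub(pattern, lambda m, n=name: f"[{n}]", text)
        else:
            # Keep the first two characters of the match, then append "*".
            text = re.sub(pattern, lambda m: m.group(0)[:2] + "*", text)
    return text


custom_types = {
    "EMPLOYEE_ID": r"EMP-\d{6}",
    "PROJECT_CODE": r"PRJ-[A-Z]{3}",
}
text = "Employee EMP-001234 is on project PRJ-ABC"

print(mask(text, custom_types))
# Employee EM* is on project PR*
print(mask(text, custom_types, "TYPE_NAME"))
# Employee [EMPLOYEE_ID] is on project [PROJECT_CODE]
```

With PARTIAL the visible prefix comes from the matched text ("EM" from "EMP-001234"), while TYPE_NAME uses the dictionary key, so descriptive key names matter most for the second style.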
Custom Types with REDACT
mask_style is ignored when strategy=REDACT: every match is replaced with [REDACTED].
| detector = PIIDetector(
    strategy=PIIStrategy.REDACT,
    types=[],
    custom_types={"EMPLOYEE_ID": r"EMP-\d{6}"},
)
result = detector.detect(
    "ID: EMP-001234, Email: bob@example.com"
)
print(result.processed_text)
# "ID: [REDACTED], Email: bob@example.com"
|
Note
types=[] disables all built-in PII types. Only the custom patterns run.
Combining Custom and Built-in Types
Custom types run alongside the built-in types selected with types.
| from llmfy import PIIDetector, PIIType
detector = PIIDetector(
    types=[PIIType.EMAIL, PIIType.PHONE_NUMBER],
    custom_types={"EMPLOYEE_ID": r"EMP-\d{6}"},
)
result = detector.detect(
    "Staff EMP-007890 reached at carol@example.com or +628987654321"
)
print(result.processed_text)
# "Staff EM* reached at ca* or +6*"
print([str(d.pii_type) for d in result.detections])
# ['EMPLOYEE_ID', 'EMAIL', 'PHONE_NUMBER']
|
Overriding a Built-in Type
If a custom type name matches a built-in PIIType value (e.g. "EMAIL"), the custom pattern replaces the built-in one entirely, and the built-in pattern no longer runs.
| # Only matches custom-*@*.com — standard email addresses are left alone
detector = PIIDetector(
    custom_types={"EMAIL": r"custom-\w+@\w+\.com"}
)
result = detector.detect(
    "Regular john@example.com and custom-user@corp.com"
)
)
print(result.processed_text)
# "Regular john@example.com and cu*"
|
Partial Override
Override one built-in type while leaving the rest active.
| # PHONE_NUMBER now only matches +62 (Indonesian) numbers
detector = PIIDetector(
    custom_types={"PHONE_NUMBER": r"\+62\d{9,12}"}
)
result = detector.detect(
    "Call +628987654321 or (555) 123-4567, email bob@test.com"
)
)
print(result.processed_text)
# "Call +6* or (555) 123-4567, email bo*"
|
(555) 123-4567 is left untouched because the built-in PHONE_NUMBER pattern is suppressed, and the custom pattern does not match it. EMAIL continues to work normally since it was not overridden.
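One way to picture the override rule is as a dictionary merge in which a custom key replaces the built-in key of the same name. The sketch below is a mental model only: `BUILTIN` holds simplified stand-in patterns invented for this example (llmfy's real built-in patterns are not shown in this document), and `active_patterns` is a hypothetical helper.

```python
import re

# Simplified stand-ins for built-in patterns; assumptions for this sketch,
# not the patterns llmfy actually ships.
BUILTIN = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "PHONE_NUMBER": r"\+\d{9,14}|\(\d{3}\) \d{3}-\d{4}",
}


def active_patterns(custom_types):
    # A custom entry with the same key replaces the built-in entry entirely;
    # keys that are not overridden keep their built-in pattern.
    return {**BUILTIN, **custom_types}


text = "Call +628987654321 or (555) 123-4567, email bob@test.com"

patterns = active_patterns({"PHONE_NUMBER": r"\+62\d{9,12}"})
found = {name: re.findall(p, text) for name, p in patterns.items()}
print(found)
# {'EMAIL': ['bob@test.com'], 'PHONE_NUMBER': ['+628987654321']}
```

Without the override, the stand-in PHONE_NUMBER pattern would also match (555) 123-4567. After the merge only the custom pattern exists under that key, which is why the US-style number is no longer detected while EMAIL is unaffected.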