Policies¶
A policy defines which entity types are anonymized for a given class of document. You can create multiple policies per workspace (for example, one for legal documents, one for healthcare, and one for financial data). The right policy is applied automatically based on the API key's scope or the request's document class.
Document Classes¶
Every policy targets a document class:
| Class | Use Case |
|---|---|
| general | Default. Mixed or unclassified content. |
| legal | Contracts, NDAs, court filings, legal correspondence. |
| healthcare | Patient records, clinical notes, insurance documents. Enables k-anonymity guard. |
| finance | Financial statements, trading records, investment documents. |
When your API key is scoped to a doc class (e.g. finance), only requests tagged with that class are allowed through. Requests with a different doc class are blocked with a 403.
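The scoping rule above can be sketched as a small check. This is only an illustration: the function name `check_doc_class` is hypothetical, and the behaviour for an unscoped key (accept every class) is an assumption, not something this page states.

```python
from typing import Optional, Set


def check_doc_class(allowed_doc_classes: Optional[Set[str]], request_doc_class: str) -> int:
    """Return the HTTP status a request would receive (hypothetical helper)."""
    # Assumption: a key with no scope (None or empty set) accepts any class.
    if not allowed_doc_classes:
        return 200
    # A scoped key only lets through requests tagged with one of its classes.
    return 200 if request_doc_class in allowed_doc_classes else 403


print(check_doc_class({"finance"}, "finance"))  # allowed
print(check_doc_class({"finance"}, "legal"))    # blocked
```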
Policy Toggles¶
Each policy has the following toggles:
Strip Person Names¶
Detects and replaces individual names using two methods:
- Title-prefix detection (high confidence): Mr. John Smith, Dr. Sarah Williams, CEO Michael Chen. The title anchors the detection.
- Known first-name detection (medium confidence): Michael Chen, Sarah Williams. Matched against a curated list of common first names.
Replacement: consistent workspace-scoped pseudonym (e.g. Finley Warren).
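As a rough illustration of title-prefix detection combined with a consistent, workspace-scoped pseudonym, the sketch below uses a regex and a hash-based lookup. The title list, the name pools, the hashing scheme, and the choice to keep the title in the output are all assumptions made for the example; the page does not describe how Redact implements any of them.

```python
import hashlib
import re

# Hypothetical, deliberately short title list and pseudonym pools.
TITLES = r"(?:Mr|Mrs|Ms|Dr|CEO)\.?"
TITLE_NAME = re.compile(rf"\b({TITLES})\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)")
FIRST_NAMES = ["Finley", "Riley", "Reese", "Morgan"]
LAST_NAMES = ["Warren", "Carter", "Hayes", "Quinn"]


def pseudonym(workspace_id: str, real_name: str) -> str:
    """Map a real name to the same pseudonym every time within one workspace."""
    digest = hashlib.sha256(f"{workspace_id}:{real_name}".encode()).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


def strip_titled_names(text: str, workspace_id: str) -> str:
    """Replace the name after a title, keeping the title itself (an assumption)."""
    return TITLE_NAME.sub(
        lambda m: f"{m.group(1)} {pseudonym(workspace_id, m.group(2))}", text
    )


print(strip_titled_names("Mr. John Smith met Dr. Sarah Williams.", "ws-1"))
```

Because the pseudonym is derived from the workspace ID and the real name, repeated mentions of the same person map to the same replacement. A pool this small will collide quickly, so a real system would need a much larger pool or a stored mapping.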
Strip Organisation Names¶
Detects company names with legal suffixes:
Goldman Sachs Group Inc., Apex Holdings LLC, St. Mary's Hospital
Replacement: consistent pseudonym (e.g. Cobalt Group).
Strip Emails & Phones¶
- Emails: michael@goldman.com → morgan@corp-anon.org
- Phones: +1 (212) 555-0100 → (555) 141-0033
SSNs and credit card numbers are always stripped regardless of this toggle.
Shift Dates¶
All detected dates are offset by a fixed number of days (configurable, default 30):
March 12, 2024 → April 11, 2024
The same offset applies consistently across the document, preserving relative time relationships.
Supports formats: Month DD, YYYY, DD Month YYYY, YYYY-MM-DD, MM/DD/YYYY.
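The fixed-offset behaviour can be reproduced with the standard library. The sketch below handles only the Month DD, YYYY format and assumes English month names; the function name `shift_dates` is hypothetical and the other supported formats are left out for brevity.

```python
import re
from datetime import datetime, timedelta

# Matches candidates such as "March 12, 2024"; strptime decides if they are real dates.
MONTH_DATE = re.compile(r"\b([A-Z][a-z]+ \d{1,2}, \d{4})\b")


def shift_dates(text: str, offset_days: int = 30) -> str:
    """Shift every 'Month DD, YYYY' date in the text by the same number of days."""

    def _shift(match: re.Match) -> str:
        try:
            parsed = datetime.strptime(match.group(1), "%B %d, %Y")
        except ValueError:
            return match.group(0)  # not a valid date, leave it untouched
        shifted = parsed + timedelta(days=offset_days)
        return f"{shifted:%B} {shifted.day}, {shifted.year}"

    return MONTH_DATE.sub(_shift, text)


print(shift_dates("Signed on March 12, 2024."))  # Signed on April 11, 2024.
```

Since every date moves by the same offset, the number of days between any two dates in a document is unchanged.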
Scale Financial Figures¶
Monetary amounts are multiplied by a configurable scale factor:
$2.4 million → $1.2 million (with scale factor 0.5)
Supported currencies: USD, GBP, EUR, and plain numeric amounts with million, billion, thousand suffixes.
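A minimal sketch of scaling, assuming amounts are written as a currency symbol followed by a number and an optional magnitude word. The regex and the function name `scale_amounts` are illustrative only; plain numeric amounts without a currency symbol and comma-grouped amounts such as $1,000 are deliberately skipped here.

```python
import re
from decimal import Decimal

# Currency symbol, number, optional magnitude word. The lookahead skips
# comma-grouped figures so they are not partially rewritten.
AMOUNT = re.compile(
    r"([$£€])(\d+(?:\.\d+)?)(?!\d|,\d)(\s+(?:thousand|million|billion))?"
)


def scale_amounts(text: str, factor: str = "0.5") -> str:
    """Multiply each matched amount by the scale factor, keeping symbol and suffix."""

    def _scale(match: re.Match) -> str:
        scaled = (Decimal(match.group(2)) * Decimal(factor)).normalize()
        return f"{match.group(1)}{scaled:f}{match.group(3) or ''}"

    return AMOUNT.sub(_scale, text)


print(scale_amounts("$2.4 million"))  # $1.2 million
```

`Decimal` is used instead of `float` so that 2.4 × 0.5 prints as 1.2 rather than a binary floating-point approximation.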
Anonymize Signatures¶
Detects and redacts signature indicators in legal and formal documents:
| Pattern | Input | Output |
|---|---|---|
| Electronic signature | /s/ Michael Chen | /s/ Finley Warren |
| Sign-off block | Signed by: Sarah Williams, General Counsel | Signed by: Riley Carter, General Counsel |
| Blank signature line | Signature: ________________________ | [SIGNATURE LINE REDACTED] |
| Digital certificate | CN=Michael Chen, OU=Finance | CN=Finley Warren, OU=Finance |
| Notary block | Notarized by: Robert Johnson, Notary Public | Notarized by: Reese Chen, Notary Public |
| Placeholder | [SIGNATURE BLOCK], [SIG] | [SIGNATURE REDACTED] |
Enable for legal documents
Signature anonymization is most important for contracts, court filings, and notarised documents where the signatory identity is itself PII. Enable it on your legal doc class policy.
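Two of the patterns in the table, the blank signature line and the placeholder, need no pseudonym and can be approximated with plain regex substitution. The expressions below are guesses based on the example inputs, not the detector's actual rules, and `redact_signature_markers` is a hypothetical name.

```python
import re

# Assumed patterns, derived only from the example rows in the table.
BLANK_LINE = re.compile(r"Signature:\s*_{3,}")
PLACEHOLDER = re.compile(r"\[(?:SIGNATURE BLOCK|SIG)\]")


def redact_signature_markers(text: str) -> str:
    """Redact blank signature lines first, then bracketed placeholders."""
    text = BLANK_LINE.sub("[SIGNATURE LINE REDACTED]", text)
    return PLACEHOLDER.sub("[SIGNATURE REDACTED]", text)


print(redact_signature_markers("Signature: ________________________"))
print(redact_signature_markers("See [SIG] on page 3."))
```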
Permanent Redaction Mode¶
When enabled, the de-anonymization step is skipped entirely. Pseudonyms remain in the LLM response permanently; real values are never restored before the response reaches your application.
Use this when:
- Storing LLM summaries in a database where real PII must never be written
- Generating training data or synthetic documents
- Producing outputs that will be reviewed by third parties who must not see real PII
{
  "name": "Training Data Policy",
  "permanent_redact": true,
  "strip_persons": true,
  "strip_orgs": true,
  "strip_emails": true
}
With permanent redaction, choices[0].message.content will contain pseudonyms (Finley Warren, Cobalt Group) rather than the original names. The entity map is still stored in the vault, so you can de-anonymize manually using the API if needed.
Set as Default Policy¶
When checked, this policy is applied to all requests that don't match a more specific policy. Only one policy per workspace can be the default.
Custom Entity Types¶
Define your own regex-based entity patterns that are applied in addition to the built-in types. Useful for domain-specific identifiers that the built-in detector does not cover:
- Employee IDs: EMP-\d{6}
- Internal project codenames: PROJ-(ALPHA|BETA|GAMMA)-\d{3}
- Custom account numbers: ACC-[A-Z]{2}\d{8}
- Medical record numbers: MRN-\d{7}
Each custom entity has three fields:
| Field | Description |
|---|---|
| name | Label shown in vault stats (e.g. EMPLOYEE_ID) |
| pattern | Python-compatible regex applied to the message text |
| replacement | Literal string to substitute on match. Default: [CUSTOM-REDACTED] |
Custom entities are applied after all built-in entity detection, in the order they are defined.
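Applying custom entities in definition order can be sketched with Python's `re` module. The function name `apply_custom_entities` is hypothetical; the three fields and the default replacement come from the table above.

```python
import re
from typing import Dict, List

DEFAULT_REPLACEMENT = "[CUSTOM-REDACTED]"


def apply_custom_entities(text: str, custom_entities: List[Dict[str, str]]) -> str:
    """Apply each custom pattern in the order it was defined."""
    for entity in custom_entities:
        replacement = entity.get("replacement", DEFAULT_REPLACEMENT)
        # A callable keeps the replacement literal: backslashes and group
        # references inside it are not interpreted by re.sub.
        text = re.sub(entity["pattern"], lambda _m, r=replacement: r, text)
    return text


entities = [
    {"name": "employee_id", "pattern": r"EMP-\d{6}", "replacement": "[EMP-ID-REDACTED]"},
    {"name": "mrn", "pattern": r"MRN-\d{7}"},
]
print(apply_custom_entities("EMP-123456 has MRN-7654321", entities))
# [EMP-ID-REDACTED] has [CUSTOM-REDACTED]
```

Order matters when one pattern's replacement could be matched by a later pattern, so it is worth defining the most specific patterns first.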
Via API¶
curl -X POST https://www.xybern.com/api/redact/{workspace_id}/policies \
  -H "Content-Type: application/json" \
  -d '{
    "name": "HR Document Policy",
    "doc_class": "general",
    "strip_persons": true,
    "custom_entities": [
      {
        "name": "employee_id",
        "pattern": "EMP-\\d{6}",
        "replacement": "[EMP-ID-REDACTED]"
      },
      {
        "name": "project_code",
        "pattern": "PROJ-(ALPHA|BETA|GAMMA)-\\d{3}",
        "replacement": "[PROJECT-CODE]"
      }
    ]
  }'
Via Dashboard¶
In the Policies tab, open or create a policy and scroll to Custom Entity Types. Add rows with the entity name, regex pattern, and replacement string. Patterns are validated server-side on save.
Regex validation
Invalid regex patterns are silently skipped at runtime. Test your pattern against sample input before deploying to production.
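Since patterns are Python-compatible, one way to test them before deploying is to compile them locally and run them over sample input. `check_pattern` below is a hypothetical helper, not part of the product.

```python
import re
from typing import List, Tuple, Union


def check_pattern(pattern: str, sample: str) -> Tuple[bool, Union[str, List[str]]]:
    """Return (True, matches) for a valid pattern, or (False, error message)."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return False, f"invalid regex: {exc}"
    # Collect full matches rather than capture groups.
    return True, [m.group(0) for m in compiled.finditer(sample)]


print(check_pattern(r"EMP-\d{6}", "EMP-123456 and EMP-12"))
print(check_pattern("PROJ-(ALPHA", "PROJ-ALPHA-001"))
```

The first call reports one match and shows that the too-short ID is ignored; the second reports a compile error for the unbalanced parenthesis, which is the kind of pattern that would otherwise be skipped silently.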
Creating a Policy¶
Via the Redact dashboard:
- Go to the Policies tab
- Click New Policy
- Set a name, document class, and toggle the entity types you want anonymized
- Optionally set the date offset (days) and financial scale factor
- Check Set as default if this should be the fallback for all requests
- Click Save Policy
Via API (using your session cookie or admin access):
curl -X POST https://www.xybern.com/api/redact/{workspace_id}/policies \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Legal Privilege Policy",
    "doc_class": "legal",
    "strip_persons": true,
    "strip_orgs": true,
    "strip_emails": true,
    "strip_phones": true,
    "strip_dates": true,
    "strip_signatures": true,
    "strip_financials": false,
    "date_offset_days": 45,
    "permanent_redact": false,
    "custom_entities": [],
    "is_default": true
  }'
Policy Resolution Order¶
When a request arrives:
- If the API key has allowed_doc_classes set, only policies matching those classes are considered
- The policy with is_default: true is used if no more specific match exists
- If no policy exists at all, a built-in default is applied (persons, orgs, emails, and phones are stripped; dates, financials, and signatures are left untouched)
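The resolution order can be sketched as a small function. Two points are assumptions rather than documented behaviour: a "more specific match" is taken to mean a policy whose doc_class equals the request's class, and the built-in default is also used when policies exist but none matches and none is marked default. `resolve_policy` and `BUILTIN_DEFAULT` are hypothetical names.

```python
from typing import Dict, List, Optional, Set

# Built-in fallback as described above: persons, orgs, emails, phones only.
BUILTIN_DEFAULT = {
    "name": "built-in default",
    "strip_persons": True, "strip_orgs": True,
    "strip_emails": True, "strip_phones": True,
    "strip_dates": False, "strip_financials": False, "strip_signatures": False,
}


def resolve_policy(policies: List[Dict], request_doc_class: str,
                   allowed_doc_classes: Optional[Set[str]] = None) -> Dict:
    """Pick the policy for a request following the three steps above."""
    candidates = policies
    if allowed_doc_classes:
        # Step 1: a scoped key narrows the candidates to its classes.
        candidates = [p for p in policies if p.get("doc_class") in allowed_doc_classes]
    for policy in candidates:
        if policy.get("doc_class") == request_doc_class:
            return policy  # assumed meaning of "more specific match"
    for policy in candidates:
        if policy.get("is_default"):
            return policy  # Step 2: workspace default
    return BUILTIN_DEFAULT  # Step 3: nothing configured (or nothing matched)


legal = {"name": "Legal", "doc_class": "legal"}
fallback = {"name": "Fallback", "doc_class": "general", "is_default": True}
print(resolve_policy([legal, fallback], "finance")["name"])  # Fallback
```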
Healthcare K-Anonymity Guard¶
For policies with doc_class: healthcare, Redact automatically runs a quasi-identifier check after anonymization. The vault record is flagged with quasi_id_risk: true if 2 or more of the following are present in the same message:
- An age reference (35-year-old, aged 40)
- A location (US city name or ZIP code)
- A medical condition (diabetes, cancer, hypertension, …)
The request still goes through, but the flag appears in the dashboard security metrics so you can review the policy or add stricter controls.
This follows the reasoning behind k-anonymity: even without a name, a combination of quasi-identifiers can re-identify an individual.
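A minimal sketch of the "2 or more quasi-identifiers" rule, assuming simple regex detectors. The city list and condition list are tiny hypothetical stand-ins; the page does not say how Redact recognises locations or conditions, and `quasi_id_risk` here is an illustrative function that merely shares its name with the flag.

```python
import re

# Illustrative detectors only; real coverage would need far larger lists.
AGE = re.compile(r"\b\d{1,3}-year-old\b|\baged \d{1,3}\b", re.IGNORECASE)
ZIP_CODE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
CITY = re.compile(r"\b(?:Boston|Chicago|Seattle)\b")
CONDITION = re.compile(r"\b(?:diabetes|cancer|hypertension)\b", re.IGNORECASE)


def quasi_id_risk(message: str) -> bool:
    """Flag a message when at least two quasi-identifier categories are present."""
    signals = [
        bool(AGE.search(message)),
        bool(ZIP_CODE.search(message) or CITY.search(message)),
        bool(CONDITION.search(message)),
    ]
    return sum(signals) >= 2


print(quasi_id_risk("A 35-year-old patient from Boston with diabetes"))  # True
print(quasi_id_risk("The patient has hypertension."))                    # False
```

Each category counts once no matter how many times it appears, so a message listing three conditions but no age or location is not flagged.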