
Policies

A policy defines which entity types are anonymized for a given class of document. You can create multiple policies per workspace (for example, one for legal documents, one for healthcare, and one for financial data). The matching policy is applied automatically based on the API key's scope or the request's document class.


Document Classes

Every policy targets a document class:

| Class | Use Case |
|---|---|
| general | Default. Mixed or unclassified content. |
| legal | Contracts, NDAs, court filings, legal correspondence. |
| healthcare | Patient records, clinical notes, insurance documents. Enables k-anonymity guard. |
| finance | Financial statements, trading records, investment documents. |

When your API key is scoped to a doc class (e.g. finance), only requests tagged with that class are allowed through. Requests with a different doc class are blocked with a 403.
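
The scoping rule above can be sketched as a small check. This is an illustration only: the function name check_doc_class is hypothetical, and treating an unscoped key as "all classes allowed" is an assumption, not documented behaviour.

```python
from http import HTTPStatus


def check_doc_class(allowed_doc_classes, request_doc_class):
    """Return the HTTP status a request would get for a given key scope."""
    # Assumption: a key with no scope configured accepts every doc class.
    if not allowed_doc_classes:
        return HTTPStatus.OK
    # A scoped key only lets through requests tagged with one of its classes.
    if request_doc_class in allowed_doc_classes:
        return HTTPStatus.OK
    # Any other doc class is blocked with a 403.
    return HTTPStatus.FORBIDDEN
```

For example, a key scoped to finance would accept a finance request and reject a legal one.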


Policy Toggles

Each policy has the following toggles:

Strip Person Names

Detects and replaces individual names using two methods:

  • Title-prefix detection (high confidence): Mr. John Smith, Dr. Sarah Williams, CEO Michael Chen. The title anchors the detection.
  • Known first-name detection (medium confidence): Michael Chen, Sarah Williams. Names are matched against a curated list of common first names.

Replacement: consistent workspace-scoped pseudonym (e.g. Finley Warren).
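
To make the two detection methods and the consistent replacement concrete, here is a minimal sketch. The title set, the first-name list, the pseudonym pool, and the hash-based mapping are all assumptions made for illustration; Redact's actual lists and pseudonym assignment are not documented here.

```python
import hashlib
import re

# Hypothetical lists: the real title set, first-name list, and pseudonym pool are not documented.
TITLES = r"(?:(?:Mr|Mrs|Ms|Dr|Prof)\.|CEO|CFO)"
KNOWN_FIRST_NAMES = ["Michael", "Sarah", "John"]
PSEUDO_FIRST = ["Finley", "Riley", "Reese", "Morgan"]
PSEUDO_LAST = ["Warren", "Carter", "Hayes", "Quinn"]

WORD = r"[A-Z][a-z]+"
FIRST_ALT = "|".join(KNOWN_FIRST_NAMES)
TITLE_RE = re.compile(rf"{TITLES}\s+({WORD}\s+{WORD})")
FIRST_NAME_RE = re.compile(rf"\b((?:{FIRST_ALT})\s+{WORD})\b")


def detect_persons(text):
    """Map each detected full name to a confidence label."""
    found = {}
    for match in TITLE_RE.finditer(text):
        found[match.group(1)] = "high"  # a title anchors the detection
    for match in FIRST_NAME_RE.finditer(text):
        found.setdefault(match.group(1), "medium")  # never downgrade a "high" hit
    return found


def pseudonym(workspace_id, real_name):
    """Derive a stable pseudonym for a name within one workspace."""
    digest = hashlib.sha256(f"{workspace_id}:{real_name}".encode("utf-8")).digest()
    first = PSEUDO_FIRST[digest[0] % len(PSEUDO_FIRST)]
    last = PSEUDO_LAST[digest[1] % len(PSEUDO_LAST)]
    return f"{first} {last}"


def strip_persons(workspace_id, text):
    """Replace every detected name with its workspace-scoped pseudonym."""
    for name in detect_persons(text):
        text = text.replace(name, pseudonym(workspace_id, name))
    return text
```

Because the pseudonym is derived from the workspace ID and the name, the same person maps to the same pseudonym on every request in that workspace. A small pool like this one will produce collisions, so a production system would more likely store assignments in the vault.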

Strip Organisation Names

Detects company names with legal suffixes:

Goldman Sachs Group Inc., Apex Holdings LLC, St. Mary's Hospital

Replacement: consistent pseudonym (e.g. Cobalt Group).

Strip Emails & Phones

  • Emails: michael@goldman.com → morgan@corp-anon.org
  • Phones: +1 (212) 555-0100 → (555) 141-0033

SSNs and credit card numbers are always stripped regardless of this toggle.

Shift Dates

All detected dates are offset by a fixed number of days (configurable, default 30):

March 12, 2024 → April 11, 2024

The same offset applies consistently across the document, preserving relative time relationships.

Supported formats: Month DD, YYYY; DD Month YYYY; YYYY-MM-DD; MM/DD/YYYY.
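
A fixed-offset shift over the four listed formats could look like the sketch below. The function name shift_dates and the regexes are illustrative assumptions, not Redact's detector, and month names are parsed with the default (English) locale.

```python
import re
from datetime import datetime, timedelta

# One (regex, strptime format) pair per documented date format.
FORMATS = [
    (re.compile(r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b"), "%B %d, %Y"),  # Month DD, YYYY
    (re.compile(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b"), "%d %B %Y"),    # DD Month YYYY
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),            # YYYY-MM-DD
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "%m/%d/%Y"),            # MM/DD/YYYY
]


def shift_dates(text, offset_days=30):
    """Shift every recognised date in text by the same number of days."""
    delta = timedelta(days=offset_days)

    def make_repl(fmt):
        def repl(match):
            try:
                parsed = datetime.strptime(match.group(0), fmt)
            except ValueError:
                return match.group(0)  # looks like a date but is not one; leave it
            # Note: strftime zero-pads days, so "March 5, 2024" becomes "April 04, 2024".
            return (parsed + delta).strftime(fmt)
        return repl

    for pattern, fmt in FORMATS:
        text = pattern.sub(make_repl(fmt), text)
    return text
```

Since every date moves by the same delta, the gap between any two dates in the document is unchanged.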

Scale Financial Figures

Monetary amounts are multiplied by a configurable scale factor:

$2.4 million → $1.2 million (with scale factor 0.5)

Supported currencies: USD, GBP, EUR, and plain numeric amounts with million, billion, thousand suffixes.
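
As a rough sketch of the scaling step, the following multiplies each amount that carries a currency symbol or a magnitude suffix. The regex and the function name scale_financials are assumptions for illustration; the real detector may recognise more forms.

```python
import re
from decimal import Decimal

# Optional currency symbol, a number (commas allowed), optional magnitude suffix.
AMOUNT_RE = re.compile(
    r"(?P<cur>[$£€])?"
    r"(?P<num>\d+(?:,\d{3})*(?:\.\d+)?)"
    r"(?P<suffix>\s(?:thousand|million|billion)\b)?"
)


def scale_financials(text, factor):
    """Multiply monetary amounts in text by factor."""
    factor = Decimal(str(factor))

    def repl(match):
        # Assumption: a bare number with neither symbol nor suffix is not money.
        if not (match.group("cur") or match.group("suffix")):
            return match.group(0)
        raw = match.group("num")
        scaled = (Decimal(raw.replace(",", "")) * factor).normalize()
        spec = ",f" if "," in raw else "f"  # keep thousands separators if present
        return f"{match.group('cur') or ''}{format(scaled, spec)}{match.group('suffix') or ''}"

    return AMOUNT_RE.sub(repl, text)
```

Decimal arithmetic is used instead of floats so that scaled figures do not pick up binary rounding noise.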

Anonymize Signatures

Detects and redacts signature indicators in legal and formal documents:

| Pattern | Input | Output |
|---|---|---|
| Electronic signature | /s/ Michael Chen | /s/ Finley Warren |
| Sign-off block | Signed by: Sarah Williams, General Counsel | Signed by: Riley Carter, General Counsel |
| Blank signature line | Signature: ________________________ | [SIGNATURE LINE REDACTED] |
| Digital certificate | CN=Michael Chen, OU=Finance | CN=Finley Warren, OU=Finance |
| Notary block | Notarized by: Robert Johnson, Notary Public | Notarized by: Reese Chen, Notary Public |
| Placeholder | [SIGNATURE BLOCK], [SIG] | [SIGNATURE REDACTED] |
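
The two patterns that do not involve a name (blank signature lines and placeholders) are simple enough to sketch with regexes. The expressions below are guesses at what such rules might look like, not the ones Redact ships; the name-bearing patterns would additionally need the pseudonym mapping.

```python
import re

# Assumed patterns for the two name-free rows of the table.
BLANK_LINE_RE = re.compile(r"Signature:\s*_{3,}")
PLACEHOLDER_RE = re.compile(r"\[(?:SIGNATURE BLOCK|SIG)\]")


def redact_signature_markers(text):
    """Redact blank signature lines and signature placeholders."""
    text = BLANK_LINE_RE.sub("[SIGNATURE LINE REDACTED]", text)
    return PLACEHOLDER_RE.sub("[SIGNATURE REDACTED]", text)
```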

Enable for legal documents

Signature anonymization is most important for contracts, court filings, and notarised documents where the signatory identity is itself PII. Enable it on your legal doc class policy.

Permanent Redaction Mode

When enabled, the de-anonymization step is skipped entirely. Pseudonyms remain in the LLM response permanently; real values are never restored before the response reaches your application.

Use this when:

  • Storing LLM summaries in a database where real PII must never be written
  • Generating training data or synthetic documents
  • Producing outputs that will be reviewed by third parties who must not see real PII
{
  "name": "Training Data Policy",
  "permanent_redact": true,
  "strip_persons": true,
  "strip_orgs": true,
  "strip_emails": true
}

With permanent redaction, choices[0].message.content will contain pseudonyms (Finley Warren, Cobalt Group) rather than the original names. The entity map is still stored in the vault, so you can de-anonymize manually using the API if needed.

Set as Default Policy

When checked, this policy is applied to all requests that don't match a more specific policy. Only one policy per workspace can be the default.


Custom Entity Types

Define your own regex-based entity patterns that are applied in addition to the built-in types. Useful for domain-specific identifiers that the built-in detector does not cover:

  • Employee IDs: EMP-\d{6}
  • Internal project codenames: PROJ-(ALPHA|BETA|GAMMA)-\d{3}
  • Custom account numbers: ACC-[A-Z]{2}\d{8}
  • Medical record numbers: MRN-\d{7}

Each custom entity has three fields:

| Field | Description |
|---|---|
| name | Label shown in vault stats (e.g. EMPLOYEE_ID) |
| pattern | Python-compatible regex applied to the message text |
| replacement | Literal string to substitute on match. Default: [CUSTOM-REDACTED] |

Custom entities are applied after all built-in entity detection, in the order they are defined.
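
The ordered, literal substitution described above can be approximated with Python's re module. The function apply_custom_entities is a hypothetical helper written for this example, not part of any Redact SDK.

```python
import re


def apply_custom_entities(text, custom_entities):
    """Apply custom entity patterns in definition order."""
    for entity in custom_entities:
        replacement = entity.get("replacement", "[CUSTOM-REDACTED]")
        # A callable keeps the replacement literal (no backreference expansion).
        text = re.sub(entity["pattern"], lambda _m, r=replacement: r, text)
    return text


entities = [
    {"name": "employee_id", "pattern": r"EMP-\d{6}", "replacement": "[EMP-ID-REDACTED]"},
    {"name": "project_code", "pattern": r"PROJ-(ALPHA|BETA|GAMMA)-\d{3}", "replacement": "[PROJECT-CODE]"},
    {"name": "mrn", "pattern": r"MRN-\d{7}"},  # no replacement: falls back to the default
]
result = apply_custom_entities("EMP-123456 works on PROJ-ALPHA-001 (MRN-7654321)", entities)
```

Because patterns run in order, an earlier entity can consume text that a later pattern would otherwise have matched, so place the most specific patterns first.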

Via API

curl -X POST https://www.xybern.com/api/redact/{workspace_id}/policies \
  -H "Content-Type: application/json" \
  -d '{
    "name": "HR Document Policy",
    "doc_class": "general",
    "strip_persons": true,
    "custom_entities": [
      {
        "name": "employee_id",
        "pattern": "EMP-\\d{6}",
        "replacement": "[EMP-ID-REDACTED]"
      },
      {
        "name": "project_code",
        "pattern": "PROJ-(ALPHA|BETA|GAMMA)-\\d{3}",
        "replacement": "[PROJECT-CODE]"
      }
    ]
  }'

Via Dashboard

In the Policies tab, open or create a policy and scroll to Custom Entity Types. Add rows with the entity name, regex pattern, and replacement string. Patterns are validated server-side on save.

Regex validation

Invalid regex patterns are silently skipped at runtime. Test your pattern against sample input before deploying to production.
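
Because a broken pattern fails silently, it is worth checking patterns locally before saving them. The sketch below compiles a pattern with Python's re module and runs it against sample strings; validate_pattern is a hypothetical helper, and it assumes the server's regex dialect matches Python's.

```python
import re


def validate_pattern(pattern, should_match=(), should_not_match=()):
    """Return (ok, message) for a candidate custom entity pattern."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return False, f"invalid regex: {exc}"
    missed = [s for s in should_match if not compiled.search(s)]
    unexpected = [s for s in should_not_match if compiled.search(s)]
    if missed or unexpected:
        return False, f"missed={missed} unexpected={unexpected}"
    return True, "ok"
```

For instance, validate_pattern(r"EMP-\d{6}", ["EMP-123456"], ["EMP-12"]) passes, while a pattern with an unbalanced parenthesis is reported as invalid instead of being skipped.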


Creating a Policy

Via the Redact dashboard:

  1. Go to the Policies tab
  2. Click New Policy
  3. Set a name, document class, and toggle the entity types you want anonymized
  4. Optionally set the date offset (days) and financial scale factor
  5. Check Set as default if this should be the fallback for all requests
  6. Click Save Policy

Via API (using your session cookie or admin access):

curl -X POST https://www.xybern.com/api/redact/{workspace_id}/policies \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Legal Privilege Policy",
    "doc_class": "legal",
    "strip_persons": true,
    "strip_orgs": true,
    "strip_emails": true,
    "strip_phones": true,
    "strip_dates": true,
    "strip_signatures": true,
    "strip_financials": false,
    "date_offset_days": 45,
    "permanent_redact": false,
    "custom_entities": [],
    "is_default": true
  }'

Policy Resolution Order

When a request arrives:

  1. If the API key has allowed_doc_classes set, only policies matching those classes are considered
  2. The policy with is_default: true is used if no more specific match exists
  3. If no policy exists at all, a built-in default is applied: persons, orgs, emails, and phones are stripped; dates, financials, and signatures are left unchanged
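
The resolution order can be read as the following sketch. Two points are assumptions: a "more specific match" is taken to mean a policy whose doc_class equals the request's doc class, and the built-in default is also used when no candidate policy matches. The function name resolve_policy is hypothetical.

```python
# Built-in fallback as described in step 3 (field names follow the policy JSON).
BUILT_IN_DEFAULT = {
    "name": "built-in default",
    "strip_persons": True,
    "strip_orgs": True,
    "strip_emails": True,
    "strip_phones": True,
    "strip_dates": False,
    "strip_financials": False,
    "strip_signatures": False,
}


def resolve_policy(policies, request_doc_class, allowed_doc_classes=None):
    """Pick the policy to apply to a request."""
    candidates = policies
    # Step 1: a scoped key narrows the candidates to its allowed classes.
    if allowed_doc_classes:
        candidates = [p for p in policies if p["doc_class"] in allowed_doc_classes]
    # Assumed meaning of "more specific match": same doc class as the request.
    for policy in candidates:
        if policy["doc_class"] == request_doc_class:
            return policy
    # Step 2: otherwise fall back to the workspace default policy.
    for policy in candidates:
        if policy.get("is_default"):
            return policy
    # Step 3: nothing applicable, so use the built-in default.
    return BUILT_IN_DEFAULT
```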

Healthcare K-Anonymity Guard

For policies with doc_class: healthcare, Redact automatically runs a quasi-identifier check after anonymization. The vault record is flagged with quasi_id_risk: true if two or more of the following are present in the same message:

  • An age reference (35-year-old, aged 40)
  • A location (US city name or ZIP code)
  • A medical condition (diabetes, cancer, hypertension, …)

The request still goes through, but the flag appears in the dashboard security metrics so you can review the policy or add stricter controls.
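
A simplified version of this check counts how many quasi-identifier categories appear in a message. The city and condition lists and the regexes below are small placeholders chosen for the example; the actual detector's vocabulary is not documented here.

```python
import re

# Placeholder vocabularies; the real lists are assumed to be much larger.
AGE_RE = re.compile(r"\b\d{1,3}-year-old\b|\baged \d{1,3}\b", re.IGNORECASE)
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
CITIES = {"boston", "chicago", "seattle"}
CONDITIONS = {"diabetes", "cancer", "hypertension"}


def quasi_id_risk(message):
    """Return True when at least two quasi-identifier categories are present."""
    lowered = message.lower()
    has_age = bool(AGE_RE.search(message))
    has_location = bool(ZIP_RE.search(message)) or any(c in lowered for c in CITIES)
    has_condition = any(c in lowered for c in CONDITIONS)
    # Each category counts once, no matter how many times it appears.
    return sum([has_age, has_location, has_condition]) >= 2
```

A message such as "A 35-year-old patient from Boston with diabetes" trips all three categories, whereas a message mentioning only a condition does not set the flag.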

This check follows the reasoning behind k-anonymity: even without a name, a combination of quasi-identifiers can re-identify an individual.