Structured Data Anonymization¶

Unstructured text anonymization works line by line. Structured data is different: a CSV with a first_name column, an email column, and a notes column needs each column treated according to what it actually contains, not a blanket scan of the whole row.

Structured Data Anonymization parses your file first, shows you the columns with sample values, lets you configure each one individually, then produces a clean anonymized file in the same format.

Supported formats¶

CSV - any delimiter, UTF-8 encoded, with a header row
JSON - an array of objects at the top level, where each object is one record

Maximum file size is 10 MB.
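The format rules above can be checked locally before uploading. The following is a minimal sketch, not part of Xybern: the function name preflight, the extension-based format detection, and the reading of "10 MB" as 10 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
import csv
import io
import json

# Assumption: "10 MB" is read as 10 MiB; the page does not say which unit applies.
MAX_BYTES = 10 * 1024 * 1024


def preflight(filename: str, data: bytes) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 10 MB")

    lowered = filename.lower()
    if lowered.endswith(".json"):
        try:
            records = json.loads(data)
        except ValueError:  # covers both invalid JSON and undecodable bytes
            return problems + ["file is not valid JSON"]
        # The top level must be an array in which every element is an object.
        if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
            problems.append("top level must be an array of objects")
    elif lowered.endswith(".csv"):
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            return problems + ["file is not valid UTF-8"]
        # This cannot tell a header from a data row; it only checks that
        # the first row exists and is not blank.
        first_row = next(csv.reader(io.StringIO(text)), None)
        if not first_row or not any(cell.strip() for cell in first_row):
            problems.append("CSV has no header row")
    else:
        problems.append("unsupported extension (expected .csv or .json)")
    return problems
```

For example, preflight("contacts.csv", open("contacts.csv", "rb").read()) returns an empty list when the file passes all of these checks.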

How to use it (dashboard)¶

Step 1: Upload your file¶

Open the Structured Data tab in your Redact workspace. Drop a CSV or JSON file onto the upload zone or click to browse. The file stays in memory and is not submitted until you click Anonymize.

Step 2: Configure columns¶

After upload, click Detect Columns. Xybern parses the file and returns each column name alongside up to three sample values so you can see what is in it.

For each column, choose what to do:

Option	Meaning
Skip	Leave this column unchanged
Auto	Detect entity type from the column name automatically
Person	Treat values as person names
Email	Treat values as email addresses
Phone	Treat values as phone numbers
Organisation	Treat values as company or organisation names
SSN	Treat values as social security numbers
Credit Card	Treat values as credit card numbers
IBAN	Treat values as IBAN bank account numbers

Xybern pre-fills a suggestion for each column based on its name: a column called email starts on Email; first_name or last_name starts on Person; phone or mobile starts on Phone; iban or bank_account starts on IBAN. You can override any suggestion before running.
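Name-based suggestion of this kind can be pictured as a simple lookup. The sketch below is illustrative only: the actual matching rules are not documented on this page, the table covers only the column names mentioned here (plus company, taken from the example detect response), and the normalisation of case, spaces, and hyphens is an assumption.

```python
# Hypothetical lookup table built from the examples on this page;
# the real suggestion logic may match more names or work differently.
NAME_HINTS = {
    "email": "EMAIL",
    "first_name": "PERSON",
    "last_name": "PERSON",
    "phone": "PHONE",
    "mobile": "PHONE",
    "iban": "IBAN",
    "bank_account": "IBAN",
    "company": "ORG",
}


def suggest_entity_type(column_name: str) -> str:
    """Map a column name to a suggested entity type, or "skip" if it is not recognised."""
    # Assumption: names are compared case-insensitively, with spaces and
    # hyphens treated like underscores.
    key = column_name.strip().lower().replace(" ", "_").replace("-", "_")
    return NAME_HINTS.get(key, "skip")
```

Unrecognised names fall back to "skip" here, which mirrors the notes column in the example detect response.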

Also pick a Policy and optionally a TTL (days before entity map entries expire). These apply to every column that is not set to Skip.

Step 3: Download the result¶

Click Anonymize. Xybern processes each row, replaces PII values with pseudonyms using your workspace entity map, and returns the file. The download preserves the original column structure and file format: CSV in, CSV out; JSON in, JSON out.

A count of entities anonymized per column is shown alongside the download button.

API¶

Both endpoints accept either a session cookie (dashboard) or a Bearer API key (external callers). To authenticate with an API key, include the Authorization header on every request:

Authorization: Bearer <your-api-key>

API keys are provisioned from the API Keys tab in your workspace.

Detect columns¶

POST /api/redact/{workspace_id}/structured/detect
Authorization: Bearer <your-api-key>
Content-Type: multipart/form-data

file: <binary>

Returns column names with sample values and a suggested entity type for each.

Response¶

{
  "ok": true,
  "filename": "contacts.csv",
  "format": "csv",
  "row_count": 250,
  "columns": [
    {
      "name": "first_name",
      "samples": ["Sarah", "James", "Mei"],
      "suggested_entity_type": "PERSON"
    },
    {
      "name": "email",
      "samples": ["sarah@acme.com", "j.carter@healthbridge.org"],
      "suggested_entity_type": "EMAIL"
    },
    {
      "name": "company",
      "samples": ["Acme Corp", "HealthBridge Ltd"],
      "suggested_entity_type": "ORG"
    },
    {
      "name": "notes",
      "samples": ["Reviewed contract on 2026-05-10"],
      "suggested_entity_type": "skip"
    }
  ]
}
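A detect response like the one above can be turned into a starting column_config for the anonymize call. This is a minimal sketch under one assumption: that each suggested_entity_type value can be passed back unchanged as entity_type, which the examples on this page suggest but do not state. The function name and the overrides parameter are hypothetical.

```python
import json
from typing import Optional


def column_config_from_detect(detect_response: dict, overrides: Optional[dict] = None) -> str:
    """Build the column_config JSON string from a detect response.

    overrides maps a column name to the entity type to use instead of the suggestion.
    """
    overrides = overrides or {}
    config = []
    for column in detect_response["columns"]:
        name = column["name"]
        # Prefer the caller's choice; otherwise fall back to the suggestion.
        entity_type = overrides.get(name, column["suggested_entity_type"])
        config.append({"name": name, "entity_type": entity_type})
    return json.dumps(config)
```

Calling column_config_from_detect(response, {"company": "skip"}) keeps every suggestion except the one for company, which is passed through unchanged.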

Anonymize¶

POST /api/redact/{workspace_id}/structured/anonymize
Authorization: Bearer <your-api-key>
Content-Type: multipart/form-data

file:          <binary>
column_config: <JSON string>
policy_id:     <string>   (optional)
ttl_days:      <integer>  (optional, default 0)

column_config is a JSON array of column objects:

[
  { "name": "first_name",   "entity_type": "PERSON" },
  { "name": "last_name",    "entity_type": "PERSON" },
  { "name": "email",        "entity_type": "EMAIL" },
  { "name": "phone",        "entity_type": "PHONE" },
  { "name": "ssn",          "entity_type": "SSN" },
  { "name": "credit_card",  "entity_type": "CREDITCARD" },
  { "name": "iban",         "entity_type": "IBAN" },
  { "name": "company",      "entity_type": "ORG" },
  { "name": "notes",        "entity_type": "skip" }
]
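The request described above could be assembled with the Python standard library roughly as follows. This is a sketch, not an official client: BASE_URL is a placeholder because the host is not given on this page, build_anonymize_request is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://xybern.example"  # placeholder host; substitute your own


def build_anonymize_request(workspace_id, api_key, filename, file_bytes,
                            column_config, policy_id=None, ttl_days=None):
    """Build (but do not send) a multipart POST to the anonymize endpoint."""
    boundary = uuid.uuid4().hex

    # column_config travels as a JSON string, not as nested form fields.
    fields = {"column_config": json.dumps(column_config)}
    if policy_id is not None:
        fields["policy_id"] = policy_id
    if ttl_days is not None:
        fields["ttl_days"] = str(ttl_days)

    parts = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(head.encode("utf-8") + value.encode("utf-8") + b"\r\n")

    file_head = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_head.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary

    return urllib.request.Request(
        f"{BASE_URL}/api/redact/{workspace_id}/structured/anonymize",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it; the optional fields are left out of the body entirely when they are not supplied, so the server-side defaults apply.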

Response¶

Returns the anonymized file as a download with Content-Disposition: attachment. The filename is prefixed with anon_ (for example anon_contacts.csv).

Response headers also include:

X-Anonymized-Count: 412
X-Anonymized-Columns: first_name,last_name,email,phone,company
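These headers can be read into a small summary after the download completes. The sketch below assumes the headers arrive as a mapping with a get method and that Content-Disposition carries a filename parameter; the exact shape of that header is not shown on this page, and summarize_anonymize_headers is a hypothetical helper.

```python
import email.message


def summarize_anonymize_headers(headers) -> dict:
    """Extract the entity count, column list, and download filename from response headers."""
    columns = headers.get("X-Anonymized-Columns", "")
    disposition = headers.get("Content-Disposition", "")

    filename = None
    if disposition:
        # Reuse the standard library's header parameter parsing.
        msg = email.message.Message()
        msg["Content-Disposition"] = disposition
        filename = msg.get_filename()

    return {
        "count": int(headers.get("X-Anonymized-Count", 0)),
        # The column list is comma-separated; drop empty entries.
        "columns": [c.strip() for c in columns.split(",") if c.strip()],
        "filename": filename,
    }
```

A plain dict is case-sensitive, so the keys must match the header names exactly; the headers object on a urllib response looks names up case-insensitively.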

How pseudonyms are assigned¶

Structured Data Anonymization uses the same entity map as the proxy endpoint. If "Sarah Williams" was previously anonymized as "Alex Turner" in this workspace, it will be replaced with "Alex Turner" in the structured file too. Pseudonyms are consistent across all input methods within a workspace.

Columns set to Skip are passed through exactly as they appear in the source file. Column order is preserved.
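The behaviour described in this section (same value, same pseudonym; Skip columns untouched; column order preserved) can be illustrated with a toy model. This is not how Xybern implements its entity map: ToyEntityMap and anonymize_rows are hypothetical, the pseudonyms are numbered placeholders rather than realistic names, and treating unconfigured columns as skip is an assumption.

```python
import itertools


class ToyEntityMap:
    """Stand-in for a workspace entity map: remembers the pseudonym given to each value."""

    def __init__(self):
        self._map = {}
        self._counter = itertools.count(1)

    def pseudonym(self, entity_type: str, value: str) -> str:
        key = (entity_type, value)
        if key not in self._map:
            # First sighting: mint a new placeholder such as PERSON_1.
            self._map[key] = f"{entity_type}_{next(self._counter)}"
        return self._map[key]


def anonymize_rows(rows, column_config, entity_map):
    """Replace values in configured columns; pass skip columns through unchanged."""
    types = {c["name"]: c["entity_type"] for c in column_config}
    result = []
    for row in rows:
        new_row = {}
        # Dicts keep insertion order, so the output columns stay in source order.
        for name, value in row.items():
            entity_type = types.get(name, "skip")
            if entity_type == "skip":
                new_row[name] = value
            else:
                new_row[name] = entity_map.pseudonym(entity_type, value)
        result.append(new_row)
    return result
```

Because the map is kept outside anonymize_rows, reusing one ToyEntityMap across several files gives the same value the same placeholder each time, which is the property the workspace entity map provides.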

Difference from the Documents tab¶

The Documents tab (PDF, DOCX, TXT, MD) treats the file as plain text and scans for PII across the full content. It is designed for unstructured documents where you do not control the layout.

The Structured Data tab is designed for files where you know the schema. You get per-column control, consistent pseudonym assignment across rows, and a clean downloadable file rather than an anonymized text blob.

Upload CSV and JSON files to Structured Data, not to Documents.

Related¶

Proxy Endpoint - anonymize text inline before it reaches the LLM
Policies - configure which entity types are anonymized
Vault and Audit Trail - how anonymization events are logged
Anonymization Preview - test a policy against sample text without writing to the vault
