Structured Data Anonymization

Unstructured text anonymization works line by line. Structured data is different: a CSV with a first_name column, an email column, and a notes column needs each column treated according to what it actually contains, not a blanket scan of the whole row.

Structured Data Anonymization parses your file first, shows you the columns with sample values, lets you configure each one individually, then produces a clean anonymized file in the same format.


Supported formats

  • CSV - any delimiter, UTF-8 encoded, with a header row
  • JSON - an array of objects at the top level, where each object is one record

Maximum file size is 10 MB.
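
The constraints above can be checked locally before uploading. The following is a minimal sketch, not part of Xybern: the function name `precheck` is hypothetical, the 10 MB limit is assumed to mean 10 * 1024 * 1024 bytes, and the CSV check only looks at the first row using the default comma delimiter.

```python
import csv
import io
import json

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024


def precheck(data: bytes, fmt: str) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file exceeds 10 MB")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]

    if fmt == "csv":
        # Only the first row is inspected: it must exist and name every column.
        header = next(csv.reader(io.StringIO(text)), None)
        if not header or any(not name.strip() for name in header):
            problems.append("CSV needs a header row with non-empty column names")
    elif fmt == "json":
        try:
            records = json.loads(text)
        except json.JSONDecodeError:
            return problems + ["file is not valid JSON"]
        # The top level must be an array, and every element must be an object.
        if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
            problems.append("JSON must be a top-level array of objects")
    else:
        problems.append(f"unsupported format: {fmt}")
    return problems
```

A file that passes this check can still be rejected by the service; the sketch only mirrors the rules stated on this page.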


How to use it (dashboard)

Step 1: Upload your file

Open the Structured Data tab in your Redact workspace. Drop a CSV or JSON file onto the upload zone or click to browse. The file stays in memory and is not submitted until you click Anonymize.

Step 2: Configure columns

After upload, click Detect Columns. Xybern parses the file and returns each column name alongside up to three sample values so you can see what is in it.

For each column, choose what to do:

Option         Meaning
Skip           Leave this column unchanged
Auto           Detect entity type from the column name automatically
Person         Treat values as person names
Email          Treat values as email addresses
Phone          Treat values as phone numbers
Organisation   Treat values as company or organisation names
SSN            Treat values as social security numbers
Credit Card    Treat values as credit card numbers
IBAN           Treat values as IBAN bank account numbers

Xybern pre-fills a suggestion for each column based on its name. A column called email starts on Email, a column called first_name or last_name starts on Person, a column called phone or mobile starts on Phone, a column called iban or bank_account starts on IBAN. You can override any suggestion before running.
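
To illustrate the idea of name-based suggestions, here is a rough sketch that covers only the column names mentioned above. It is not Xybern's actual matcher: the lookup table, the name normalization, and the fallback to "skip" for unknown names are all assumptions made for the example.

```python
# Hints taken from the examples in the text; the real matcher may recognise more names.
NAME_HINTS = {
    "email": "EMAIL",
    "first_name": "PERSON",
    "last_name": "PERSON",
    "phone": "PHONE",
    "mobile": "PHONE",
    "iban": "IBAN",
    "bank_account": "IBAN",
}


def suggest_entity_type(column_name: str) -> str:
    """Suggest an entity type from a column name, or "skip" when nothing matches."""
    # Assumption: names are compared case-insensitively, with spaces and
    # hyphens treated like underscores.
    key = column_name.strip().lower().replace(" ", "_").replace("-", "_")
    return NAME_HINTS.get(key, "skip")
```

With this table, `suggest_entity_type("First Name")` yields `"PERSON"` and an unrecognised name such as `"notes"` falls back to `"skip"`.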

Also pick a Policy and optionally a TTL (days before entity map entries expire). These apply to every column that is not set to Skip.

Step 3: Download the result

Click Anonymize. Xybern processes each row, replaces PII values with pseudonyms using your workspace entity map, and returns the file. The download preserves the original column structure and file format: CSV in, CSV out; JSON in, JSON out.

A count of entities anonymized per column is shown alongside the download button.


API

Both endpoints described below (Detect columns and Anonymize) accept either a session cookie (dashboard) or a Bearer API key (external callers). To authenticate with an API key, include the Authorization header on every request:

Authorization: Bearer <your-api-key>

API keys are provisioned from the API Keys tab in your workspace.


Detect columns

POST /api/redact/{workspace_id}/structured/detect
Authorization: Bearer <your-api-key>
Content-Type: multipart/form-data

file: <binary>

Returns column names with sample values and a suggested entity type for each.

Response

{
  "ok": true,
  "filename": "contacts.csv",
  "format": "csv",
  "row_count": 250,
  "columns": [
    {
      "name": "first_name",
      "samples": ["Sarah", "James", "Mei"],
      "suggested_entity_type": "PERSON"
    },
    {
      "name": "email",
      "samples": ["sarah@acme.com", "j.carter@healthbridge.org"],
      "suggested_entity_type": "EMAIL"
    },
    {
      "name": "company",
      "samples": ["Acme Corp", "HealthBridge Ltd"],
      "suggested_entity_type": "ORG"
    },
    {
      "name": "notes",
      "samples": ["Reviewed contract on 2026-05-10"],
      "suggested_entity_type": "skip"
    }
  ]
}
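
As a sketch of how an external caller might issue the detect request using only the Python standard library: the host in `BASE_URL` is a placeholder (this page does not state the real one), and `build_detect_request` is a hypothetical helper name. The function only constructs the request; sending it is left to the caller.

```python
import urllib.request
import uuid

# Assumption: placeholder host, replace with your actual Xybern base URL.
BASE_URL = "https://xybern.example"


def build_detect_request(workspace_id: str, api_key: str,
                         filename: str, data: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the file in a part named "file"."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/api/redact/{workspace_id}/structured/detect",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` and decoding the body with `json.load` should give a dictionary shaped like the response above, assuming the call succeeds.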

Anonymize

POST /api/redact/{workspace_id}/structured/anonymize
Authorization: Bearer <your-api-key>
Content-Type: multipart/form-data

file:          <binary>
column_config: <JSON string>
policy_id:     <string>   (optional)
ttl_days:      <integer>  (optional, default 0)

column_config is a JSON array of column objects:

[
  { "name": "first_name",   "entity_type": "PERSON" },
  { "name": "last_name",    "entity_type": "PERSON" },
  { "name": "email",        "entity_type": "EMAIL" },
  { "name": "phone",        "entity_type": "PHONE" },
  { "name": "ssn",          "entity_type": "SSN" },
  { "name": "credit_card",  "entity_type": "CREDITCARD" },
  { "name": "iban",         "entity_type": "IBAN" },
  { "name": "company",      "entity_type": "ORG" },
  { "name": "notes",        "entity_type": "skip" }
]
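
One way to produce this JSON string is to start from the columns array returned by the detect endpoint and apply any overrides on top of the suggestions. This is a minimal sketch; `build_column_config` and its `overrides` parameter are hypothetical names, not part of the API.

```python
import json
from typing import Dict, List, Optional


def build_column_config(columns: List[dict],
                        overrides: Optional[Dict[str, str]] = None) -> str:
    """Turn detect-response columns into the column_config JSON string.

    Each column keeps its suggested entity type unless `overrides`
    maps its name to a different one.
    """
    overrides = overrides or {}
    config = [
        {
            "name": col["name"],
            "entity_type": overrides.get(col["name"], col["suggested_entity_type"]),
        }
        for col in columns
    ]
    return json.dumps(config)
```

For example, passing `overrides={"company": "skip"}` leaves the company column untouched while every other column keeps its suggestion.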

Response

Returns the anonymized file as a download with Content-Disposition: attachment. The filename is prefixed with anon_ (for example anon_contacts.csv).

Response headers also include:

X-Anonymized-Count: 412
X-Anonymized-Columns: first_name,last_name,email,phone,company
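
A caller may want these two headers as typed values. The sketch below assumes the headers arrive as a plain string-to-string mapping and that a missing header means zero entities or no columns; `summarize_anonymize_headers` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def summarize_anonymize_headers(headers: Dict[str, str]) -> Tuple[int, List[str]]:
    """Extract the anonymized entity count and the list of anonymized columns."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    count = int(lowered.get("x-anonymized-count", "0"))
    raw_columns = lowered.get("x-anonymized-columns", "")
    columns = [c.strip() for c in raw_columns.split(",") if c.strip()]
    return count, columns
```

Note that splitting on commas assumes column names do not themselves contain commas, which this page does not specify.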

How pseudonyms are assigned

Structured Data Anonymization uses the same entity map as the proxy endpoint. If "Sarah Williams" was previously anonymized as "Alex Turner" in this workspace, it will be replaced with "Alex Turner" in the structured file too. Pseudonyms are consistent across all input methods within a workspace.

Columns set to Skip are passed through exactly as they appear in the source file. Column order is preserved.
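
The behaviour described here (same input value, same pseudonym; skipped columns and column order untouched) can be modelled with a toy in-memory map. This is only an illustration of the concept, not how Xybern stores or generates pseudonyms: the class `ToyEntityMap`, the function `anonymize_rows`, and the placeholder format `PERSON_1` are all invented for the example.

```python
import itertools


class ToyEntityMap:
    """In-memory stand-in for a workspace entity map (illustrative only)."""

    def __init__(self):
        self._map = {}
        self._counter = itertools.count(1)

    def pseudonym(self, entity_type: str, value: str) -> str:
        # The first sighting of a value allocates a pseudonym; later sightings reuse it.
        key = (entity_type, value)
        if key not in self._map:
            self._map[key] = f"{entity_type}_{next(self._counter)}"
        return self._map[key]


def anonymize_rows(rows, column_config, entity_map):
    """Replace values in configured columns; pass "skip" columns through unchanged."""
    types = {c["name"]: c["entity_type"] for c in column_config}
    result = []
    for row in rows:
        # Dict comprehension keeps the original key order, so column order is preserved.
        result.append({
            name: value if types.get(name, "skip") == "skip"
            else entity_map.pseudonym(types[name], value)
            for name, value in row.items()
        })
    return result
```

Reusing one `ToyEntityMap` instance across several calls mirrors the idea of a single map shared by all input methods in a workspace.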


Difference from the Documents tab

The Documents tab (PDF, DOCX, TXT, MD) treats the file as plain text and scans for PII across the full content. It is designed for unstructured documents where you do not control the layout.

The Structured Data tab is designed for files where you know the schema. You get per-column control, consistent pseudonym assignment across rows, and a clean downloadable file rather than an anonymized text blob.

Upload CSV and JSON files to Structured Data, not to Documents.