Format-Preserving Anonymization

Standard anonymization replaces a value like 123-45-6789 with a token like [SSN-ANON]. That is fine when the output is read by a human or fed back into a prompt. It breaks immediately when the output is parsed by a system that validates field format: an EHR integration, a payment gateway test harness, a staging pipeline that runs schema checks before inserting rows.

Format-Preserving Anonymization solves this. Instead of a token, Xybern generates a structurally valid fake value of the same type:

Original | Standard output | Format-preserving output
123-45-6789 | [SSN-ANON] | 482-31-7056
4111 1111 1111 1111 | [CARD-REDACTED] | 4123 6194 5037 2868
GB29 NWBK 6016 1331 9268 19 | [IBAN-ANON] | GB87 KPRS 3841 7290 5163 48
+1 (555) 234-7890 | (555) 300-0042 | +1 (382) 749-5831

No configuration is required. Format-preserving output is the default for SSN, credit card, IBAN, and phone entity types.


SSN

Fake SSNs are generated according to the Social Security Administration's assignment rules:

  • Area number (first three digits): 001 to 899, excluding 666 (000 and 900 to 999 are never used)
  • Group number (middle two digits): 01 to 99, never 00
  • Serial number (last four digits): 0001 to 9999, never 0000

The separator style of the original is preserved. If the original used dashes (123-45-6789), the fake uses dashes. If it used spaces or no separator, the fake matches.
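The rules above can be sketched in a few lines of Python. This is an illustrative sketch, not Xybern's implementation: the function name `fake_ssn` and the use of a caller-supplied `random.Random` are assumptions made for the example.

```python
import random


def fake_ssn(original: str, rng: random.Random) -> str:
    """Return a fake SSN that follows the rules above, in the layout of `original`."""
    if sum(ch.isdigit() for ch in original) != 9:
        raise ValueError("expected exactly nine digits")

    # Area: 001-899, never 666 (000 and 900-999 fall outside the range).
    area = rng.randint(1, 899)
    while area == 666:
        area = rng.randint(1, 899)
    group = rng.randint(1, 99)      # 01-99, never 00
    serial = rng.randint(1, 9999)   # 0001-9999, never 0000

    digits = iter(f"{area:03d}{group:02d}{serial:04d}")
    # Swap digits position by position; separators pass through unchanged.
    return "".join(next(digits) if ch.isdigit() else ch for ch in original)
```

Calling `fake_ssn("123-45-6789", random.Random())` yields a dashed value, while `fake_ssn("123456789", ...)` yields nine digits with no separator.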


Credit Card Numbers

Fake card numbers are generated to pass Luhn checksum validation, which is the standard check used by payment processors and form validators to detect typos.

The first two digits of the original are preserved, so the card network is recognisable (a Visa starting with 4 stays a 4, a Mastercard in the 51 range stays in 51). The remaining digits are replaced, and the final check digit is recomputed to produce a Luhn-valid number.

The original formatting (spaces or dashes between digit groups) is kept intact.
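As a rough illustration of the steps described in this section, the sketch below keeps a leading prefix, randomises the middle digits, and recomputes the Luhn check digit. The names `luhn_check_digit`, `luhn_valid`, `fake_card` and the `keep` parameter are hypothetical and chosen for this example only; how Xybern selects the replacement digits is not documented here.

```python
import random


def luhn_check_digit(payload: str) -> str:
    """Luhn check digit for a digit string that does not yet include it."""
    total = 0
    # Walk right to left; the digit next to the check digit is doubled first.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the digits in `number` (separators ignored) pass the Luhn check."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return len(digits) > 1 and luhn_check_digit(digits[:-1]) == digits[-1]


def fake_card(original: str, rng: random.Random, keep: int = 2) -> str:
    """Keep the first `keep` digits, randomise the rest, recompute the check digit."""
    digits = [ch for ch in original if ch.isdigit()]
    body = digits[:keep] + [str(rng.randint(0, 9)) for _ in digits[keep:-1]]
    body.append(luhn_check_digit("".join(body)))
    replacement = iter(body)
    # Spaces and dashes stay exactly where they were in the original.
    return "".join(next(replacement) if ch.isdigit() else ch for ch in original)
```

For example, `luhn_valid("4111 1111 1111 1111")` is true, and any value returned by `fake_card("4111 1111 1111 1111", random.Random())` starts with `41`, keeps the four-digit grouping, and also passes `luhn_valid`.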


IBANs

Fake IBANs are generated to pass the ISO 13616 mod-97 check digit validation used by banks and payment systems worldwide.

The country code (first two characters) is preserved, so the fake IBAN remains geographically recognisable. A GB IBAN stays a GB IBAN. The BBAN characters (everything after the two check digits) are replaced, preserving the alphanumeric structure of each position, and the two check digits are recomputed using the mod-97 algorithm to produce a valid IBAN.

Spaced and unspaced formats are both handled. A spaced GB29 NWBK 6016 1331 9268 19 produces a spaced fake in the same four-character grouping. An unspaced GB29NWBK60161331926819 produces an unspaced fake of the same length.
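The check digit calculation can be sketched as follows. This is a minimal example under stated assumptions, not the product's code: the helper names are hypothetical, and the sketch only mirrors the digit-or-letter pattern of each BBAN position without applying any country-specific BBAN rules.

```python
import random
import string


def iban_check_digits(country: str, bban: str) -> str:
    """Compute the two mod-97 check digits for a country code and BBAN."""
    # Move the country code plus placeholder "00" to the end,
    # then map letters to numbers (A=10 ... Z=35).
    rearranged = (bban + country + "00").upper()
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return f"{98 - int(numeric) % 97:02d}"


def iban_valid(iban: str) -> bool:
    """True if the IBAN (spaces ignored) leaves remainder 1 modulo 97."""
    compact = "".join(iban.split()).upper()
    numeric = "".join(str(int(ch, 36)) for ch in compact[4:] + compact[:4])
    return int(numeric) % 97 == 1


def fake_iban(original: str, rng: random.Random) -> str:
    """Keep the country code, replace the BBAN, recompute the check digits."""
    compact = "".join(original.split()).upper()
    country, bban = compact[:2], compact[4:]
    # A digit is replaced by a digit, a letter by a letter.
    new_bban = "".join(
        rng.choice(string.digits) if ch.isdigit() else rng.choice(string.ascii_uppercase)
        for ch in bban
    )
    replacement = iter(country + iban_check_digits(country, new_bban) + new_bban)
    # Whitespace in the original is kept in place, so the grouping is unchanged.
    return "".join(ch if ch.isspace() else next(replacement) for ch in original)
```

With these helpers, `iban_check_digits("GB", "NWBK60161331926819")` returns `"29"`, matching the example IBAN above, and the result of `fake_iban` passes `iban_valid` for both spaced and unspaced input.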


Phone Numbers

The digit count and all non-digit characters (plus sign, parentheses, spaces, dashes) are preserved exactly, and the country code prefix is kept as written. Only the remaining digit positions are replaced.

A UK mobile +44 7911 123456 produces a fake with the same structure and digit count. A North American (212) 555-0198 produces a ten-digit fake in the same format.


Deterministic output

The same real value always produces the same fake within a workspace. If 123-45-6789 maps to 482-31-7056 the first time, it maps to 482-31-7056 every subsequent time in that workspace, even across separate requests. This means de-anonymization works correctly and vault records stay consistent.

A different workspace produces a different fake for the same input. Workspaces are isolated.
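One common way to get this property is to derive the random source from a keyed hash of the input, with a separate secret per workspace. The sketch below shows that idea only; this page does not say how Xybern achieves determinism, and the names `workspace_rng`, `deterministic_fake_digits`, and the per-workspace key are all assumptions made for illustration.

```python
import hashlib
import hmac
import random


def workspace_rng(workspace_key: bytes, entity_type: str, value: str) -> random.Random:
    """Seed a generator from HMAC-SHA256 over the entity type and real value."""
    message = f"{entity_type}\x00{value}".encode("utf-8")
    digest = hmac.new(workspace_key, message, hashlib.sha256).digest()
    return random.Random(int.from_bytes(digest, "big"))


def deterministic_fake_digits(workspace_key: bytes, entity_type: str, value: str) -> str:
    """Replace each digit using the derived generator; keep other characters."""
    rng = workspace_rng(workspace_key, entity_type, value)
    return "".join(str(rng.randint(0, 9)) if ch.isdigit() else ch for ch in value)
```

With the same key and input, `deterministic_fake_digits` returns the same string on every call. A different key gives an unrelated result, which mirrors the workspace isolation described above. The digit replacement here is deliberately simplistic and does not apply the SSN, Luhn, or IBAN rules.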


Existing entity map entries

Format-preserving generation applies to new entity map entries. Values that were anonymized before this feature was enabled retain their existing pseudonyms. To remap a value to a format-preserving fake, clear the entity map entry for that workspace via the Vault tab or the API and anonymize the value again.
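The behaviour described here can be pictured with a small lookup-then-generate sketch. The `EntityMap` class below is a hypothetical in-memory stand-in written for this example; it is not the Vault API, and its method names are assumptions.

```python
from typing import Callable, Dict, Tuple


class EntityMap:
    """Hypothetical in-memory stand-in for a per-workspace entity map."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], str] = {}

    def anonymize(self, workspace: str, value: str,
                  generate: Callable[[str], str]) -> str:
        """Return the stored pseudonym, generating one only for new values."""
        key = (workspace, value)
        if key not in self._entries:
            self._entries[key] = generate(value)
        return self._entries[key]

    def clear(self, workspace: str, value: str) -> None:
        """Drop one entry so the next anonymize call generates a fresh pseudonym."""
        self._entries.pop((workspace, value), None)


emap = EntityMap()
# Entry created before format-preserving generation was available.
legacy = emap.anonymize("ws-1", "123-45-6789", lambda v: "[SSN-ANON]")
# A later call with a format-preserving generator still returns the old entry.
retained = emap.anonymize("ws-1", "123-45-6789", lambda v: "482-31-7056")
# After clearing the entry, the value is remapped.
emap.clear("ws-1", "123-45-6789")
remapped = emap.anonymize("ws-1", "123-45-6789", lambda v: "482-31-7056")
```

In this sketch `legacy` and `retained` are both `[SSN-ANON]`, and `remapped` is `482-31-7056`.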


No policy configuration needed

Format-preserving anonymization is not a policy option that needs to be enabled. It is the default behaviour for SSN, credit card, IBAN, and phone number entity types. All existing policies continue to work without changes.