Proxy Endpoint Reference
The Redact proxy exposes a single OpenAI-compatible endpoint. Any SDK or HTTP client that supports a custom base_url works without modification.
Endpoint
POST https://www.xybern.com/redact/v1/chat/completions
Authentication
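Pass your Redact API key in the Authorization header as a Bearer token, the same scheme OpenAI SDKs use for api_key:
Authorization: Bearer xr_live_YOUR_KEY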
The key identifies your workspace and determines which policy is applied. Keys are created in the Redact dashboard under API Keys.
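As a minimal sketch of the header in practice, the following builds (but does not send) a request with Python's standard library. The base URL is the one used in the streaming example on this page; the /chat/completions path suffix and the Bearer scheme are assumptions based on the OpenAI convention that compatible SDKs follow, and build_request is a hypothetical helper name.

```python
import json
import urllib.request

# Base URL as used in the streaming example on this page.
BASE_URL = "https://www.xybern.com/redact/v1"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a chat completions request carrying the Redact API key.

    Assumption: the proxy follows the OpenAI path convention
    (<base_url>/chat/completions) and the Bearer auth scheme.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "xr_live_YOUR_KEY",
    {"model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "Hello"}]},
)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the headers in a test before any traffic leaves your machine.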
Request Body
The request body is a standard OpenAI chat completions payload:
{
  "model": "claude-sonnet-4-6",
  "messages": [
    {
      "role": "user",
      "content": "Summarise the NDA signed by Michael Chen at Goldman Sachs."
    }
  ],
  "max_tokens": 1024,
  "temperature": 0.7
}
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier. Must be supported by your configured upstream provider. |
| messages | array | Yes | Array of {role, content} objects. role is user, assistant, or system. |
| max_tokens | integer | No | Maximum tokens in the response. Default: 4096. |
| temperature | float | No | Sampling temperature. Passed through to the upstream provider. |
| system | string | No | System prompt (Anthropic-style). Passed through directly, not anonymized. |
| stream | boolean | No | If true, the response is returned as a Server-Sent Events stream. |
System prompts are not anonymized
The system field is forwarded to the upstream provider as-is. Do not include PII in system prompts if you need it anonymized. Put sensitive content in the messages array instead.
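One way to follow this advice is to keep static instructions in system and route anything that may contain PII through messages. The sketch below illustrates that split; build_payload and its parameter names are hypothetical, not part of the Redact API.

```python
def build_payload(model: str, instructions: str, sensitive_context: str, question: str) -> dict:
    """Keep static instructions in `system`; send PII-bearing text via `messages`."""
    return {
        "model": model,
        # Forwarded to the upstream provider as-is: keep it free of names,
        # account numbers, and other sensitive values.
        "system": instructions,
        # Scanned and anonymized by the proxy before the upstream call.
        "messages": [
            {"role": "user", "content": f"{sensitive_context}\n\n{question}"},
        ],
    }


payload = build_payload(
    model="claude-sonnet-4-6",
    instructions="You are a contracts analyst. Answer concisely.",
    sensitive_context="The NDA was signed by Michael Chen at Goldman Sachs.",
    question="Summarise the obligations in the NDA.",
)
```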
Response
The response is an OpenAI-format chat completion object with de-anonymized content:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "object": "chat.completion",
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The NDA signed by Michael Chen (Goldman Sachs) on March 12 2024 covers..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 142,
    "completion_tokens": 89,
    "total_tokens": 231
  }
}
The real names and values are restored in choices[0].message.content before the response reaches your application. The LLM operated on pseudonyms throughout.
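For clients that work with the raw JSON rather than an SDK object, reading the reply can be sketched as follows. The sample dictionary is an abbreviated copy of the response above, and extract_reply is a hypothetical helper name.

```python
def extract_reply(completion: dict) -> tuple:
    """Return (assistant text, total tokens) from a chat completion object."""
    text = completion["choices"][0]["message"]["content"]
    total = completion["usage"]["total_tokens"]
    return text, total


# Abbreviated copy of the example response shown above.
completion = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The NDA signed by Michael Chen (Goldman Sachs) on March 12 2024 covers...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 142, "completion_tokens": 89, "total_tokens": 231},
}

text, total_tokens = extract_reply(completion)
```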
Streaming
Set "stream": true to receive the response as a Server-Sent Events stream, compatible with the OpenAI streaming format:
import openai

# Point the OpenAI SDK at the Redact proxy.
client = openai.OpenAI(
    base_url="https://www.xybern.com/redact/v1",
    api_key="xr_live_YOUR_KEY",
)

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Summarise the NDA signed by Michael Chen."}],
    stream=True,
)

# Print each content delta as it arrives; role-only and final chunks carry no content.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
Each chunk is a chat.completion.chunk object:
data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"The NDA signed by Michael Chen"},"finish_reason":null}]}
data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Anonymization still runs before streaming begins
Redact buffers the full upstream response and runs leakage detection and de-anonymization on it before emitting the first chunk. PII therefore cannot leak mid-stream, but time-to-first-token equals the full response latency rather than the upstream provider's TTFT.
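If you consume the stream without an SDK, the chunk format shown above can be reassembled with a few lines of parsing. This is a minimal sketch that assumes each event arrives as a single data: line, as in the example; collect_stream_text is a hypothetical helper name.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas from a sequence of SSE lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # Role-only and final chunks have no content; treat them as empty.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


sample = [
    'data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The NDA signed by Michael Chen"},"finish_reason":null}]}',
    "",
    'data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

full_text = collect_stream_text(sample)
```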
Error Responses
| HTTP Status | type | Cause |
|---|---|---|
| 401 | auth_error | Missing or invalid Authorization header |
| 401 | auth_error | API key is revoked or inactive |
| 403 | permission_denied | API key is not scoped for the request's doc_class |
| 502 | server_error | Upstream LLM returned an error (provider-side issue) |
{
  "error": {
    "message": "API key not scoped for doc_class 'healthcare'. Allowed: finance, legal",
    "type": "permission_denied"
  }
}
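A client can turn this body into a typed exception and decide whether a retry makes sense. The sketch below is one possible approach: RedactError and parse_error are hypothetical names, and treating 502 as the only retryable status is an assumption of this example rather than documented behaviour.

```python
class RedactError(Exception):
    """Error returned by the proxy, carrying the HTTP status and error type."""

    def __init__(self, status: int, error_type: str, message: str):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: only upstream failures (502) are worth retrying;
        # 401 and 403 need a configuration change, not another attempt.
        return self.status == 502


def parse_error(status: int, body: dict) -> RedactError:
    """Build a RedactError from a status code and the decoded JSON error body."""
    err = body.get("error", {})
    return RedactError(status, err.get("type", "unknown"), err.get("message", ""))


error = parse_error(
    403,
    {
        "error": {
            "message": "API key not scoped for doc_class 'healthcare'. Allowed: finance, legal",
            "type": "permission_denied",
        }
    },
)
```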
How Anonymization Applies
When the request arrives:
- Policy resolution: the default policy for the workspace is applied, or the policy matching the API key's doc_class scope.
- Entity detection: each messages[].content string is scanned for PII based on the active policy toggles.
- Pseudonym assignment: each detected entity is replaced with a consistent workspace-scoped pseudonym. The same real value always maps to the same pseudonym within a workspace.
- Upstream call: the anonymized messages are forwarded to your configured LLM provider.
- Leakage scan: the LLM response is scanned for any real values that leaked back. Found values are re-anonymized.
- De-anonymization: pseudonyms in the response are replaced with the original values before the response is returned to your agent.
- Vault logging: the interaction is recorded with a SHA-256 chain hash and an HMAC signature.
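The steps above can be illustrated with a toy round trip. This is a sketch under stated assumptions, not Redact's implementation: entity detection is reduced to a lookup in a fixed mapping, the signing key and all function names are hypothetical, and the exact fields that go into the chain hash are an assumption.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-signing-key"  # hypothetical; never hard-code a real key


def anonymize(text: str, mapping: dict) -> str:
    """Replace each known real value with its pseudonym."""
    for real, pseudonym in mapping.items():
        text = text.replace(real, pseudonym)
    return text


def deanonymize(text: str, mapping: dict) -> str:
    """Restore real values from pseudonyms."""
    for real, pseudonym in mapping.items():
        text = text.replace(pseudonym, real)
    return text


def vault_record(prev_hash: str, entry: dict) -> dict:
    """Chain this entry to the previous hash, then sign the new hash."""
    body = json.dumps(entry, sort_keys=True).encode("utf-8")
    chain_hash = hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()
    signature = hmac.new(SIGNING_KEY, chain_hash.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"prev_hash": prev_hash, "chain_hash": chain_hash, "signature": signature}


def handle(prompt: str, mapping: dict, call_llm, prev_hash: str):
    outbound = anonymize(prompt, mapping)    # entity detection + pseudonym assignment
    reply = call_llm(outbound)               # upstream call sees pseudonyms only
    reply = anonymize(reply, mapping)        # leakage scan: re-anonymize real values
    restored = deanonymize(reply, mapping)   # de-anonymization for the caller
    record = vault_record(prev_hash, {"prompt": outbound, "reply": reply})
    return restored, record


seen = []  # what the stand-in LLM actually received


def fake_llm(text: str) -> str:
    """Stand-in for the upstream provider; echoes the prompt in a summary."""
    seen.append(text)
    return f"Summary: {text}"


mapping = {"Michael Chen": "Finley Warren"}
restored, record = handle("Michael Chen signed the NDA", mapping, fake_llm, prev_hash="0" * 64)
```

Because each chain hash covers the previous one, altering an earlier record changes every hash after it, and the HMAC lets a verifier holding the key confirm that the hash was produced by the logger.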
Multi-Turn Conversations
Pseudonym mappings persist across requests within the same workspace. If you send multiple requests referencing the same person, they will consistently receive the same pseudonym:
# Request 1
"Michael Chen signed the NDA"
# → "Finley Warren signed the NDA"
# Request 2 (same workspace, different request)
"What did Michael Chen agree to?"
# → "What did Finley Warren agree to?"
# Same pseudonym - the LLM can maintain context coherently
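The persistence described here amounts to a lookup table that outlives individual requests. The sketch below shows the idea with an in-memory table; PseudonymTable is a hypothetical class, the candidate names are illustrative, and the entities to replace are passed in explicitly because detection is out of scope for the example.

```python
class PseudonymTable:
    """Workspace-scoped mapping from real values to stable pseudonyms."""

    def __init__(self, candidates):
        self._candidates = iter(candidates)  # pool of unused pseudonyms
        self._by_real = {}                   # real value -> assigned pseudonym

    def pseudonym_for(self, real: str) -> str:
        # Assign a pseudonym on first sight, then reuse it on every later request.
        if real not in self._by_real:
            self._by_real[real] = next(self._candidates)
        return self._by_real[real]

    def anonymize(self, text: str, entities) -> str:
        for real in entities:
            text = text.replace(real, self.pseudonym_for(real))
        return text


# One table per workspace, shared by every request in that workspace.
table = PseudonymTable(["Finley Warren", "Rowan Ellis"])

request_1 = table.anonymize("Michael Chen signed the NDA", ["Michael Chen"])
request_2 = table.anonymize("What did Michael Chen agree to?", ["Michael Chen"])
```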