Proxy Endpoint Reference

The Redact proxy exposes a single OpenAI-compatible endpoint. Any SDK or HTTP client that supports a custom base_url works without modification.

Endpoint

POST https://www.xybern.com/redact/v1/chat/completions

Authentication

Pass your Redact API key in the Authorization header:

Authorization: Bearer xr_live_YOUR_KEY

The key identifies your workspace and determines which policy is applied. Keys are created in the Redact dashboard under API Keys.
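As a minimal sketch of how the endpoint and the Authorization header fit together, the following builds (but does not send) a request using only the Python standard library. The helper name `build_request` is hypothetical, and the `Content-Type: application/json` header is an assumption based on the JSON request body rather than something this page specifies.

```python
import json
import urllib.request

REDACT_URL = "https://www.xybern.com/redact/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Redact proxy.

    `build_request` is an illustrative helper, not part of any Redact SDK.
    """
    return urllib.request.Request(
        REDACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The Redact API key goes in the standard Bearer scheme.
            "Authorization": f"Bearer {api_key}",
            # Assumption: the proxy expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "xr_live_YOUR_KEY",
    {
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the headers before any data leaves your machine.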

Request Body

The request body is a standard OpenAI chat completions payload:

{
  "model": "claude-sonnet-4-6",
  "messages": [
    {
      "role": "user",
      "content": "Summarise the NDA signed by Michael Chen at Goldman Sachs."
    }
  ],
  "max_tokens": 1024,
  "temperature": 0.7
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | Model identifier. Must be supported by your configured upstream provider. |
| `messages` | array | Yes | Array of `{role, content}` objects. `role` is `user`, `assistant`, or `system`. |
| `max_tokens` | integer | No | Maximum tokens in the response. Default: 4096. |
| `temperature` | float | No | Sampling temperature. Passed through to the upstream provider. |
| `system` | string | No | System prompt (Anthropic-style). Passed through directly, not anonymized. |
| `stream` | boolean | No | If `true`, the response is returned as a Server-Sent Events stream. |

System prompts are not anonymized

The system field is forwarded to the upstream provider as-is. Do not include PII in system prompts if you need it anonymized. Put sensitive content in the messages array instead.
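One way to follow this advice on the client side is to move a top-level `system` string into the `messages` array before sending. The sketch below is illustrative only: `move_system_into_messages` is a hypothetical helper, and it assumes that a `system`-role entry inside `messages` is scanned like any other `messages[].content` string, which this page implies but does not state outright.

```python
import copy


def move_system_into_messages(payload: dict) -> dict:
    """Return a copy of `payload` with the top-level `system` string moved
    into `messages` as a leading system-role message.

    Assumption: system-role messages inside `messages` are scanned for PII
    like any other message content.
    """
    result = copy.deepcopy(payload)  # leave the caller's payload untouched
    system_text = result.pop("system", None)
    if system_text:
        leading = {"role": "system", "content": system_text}
        result["messages"] = [leading] + result.get("messages", [])
    return result


payload = {
    "model": "claude-sonnet-4-6",
    "system": "The client is Michael Chen at Goldman Sachs.",
    "messages": [{"role": "user", "content": "Summarise the NDA."}],
}
safe_payload = move_system_into_messages(payload)
```

If that assumption does not hold for your policy, prepending the sensitive text to a `user` message is the more conservative choice.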

Response

The response is an OpenAI-format chat completion object with de-anonymized content:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "object": "chat.completion",
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The NDA signed by Michael Chen (Goldman Sachs) on March 12 2024 covers..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 142,
    "completion_tokens": 89,
    "total_tokens": 231
  }
}

The real names and values are restored in choices[0].message.content before the response reaches your application. The upstream LLM operates on pseudonyms throughout.
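For clients that work with the raw JSON instead of an SDK, reading the reply is a matter of indexing into `choices` and `usage`. The following is a small sketch; `extract_reply` is a hypothetical helper and the sample body is a trimmed version of the response shown above.

```python
import json


def extract_reply(response_body: str) -> tuple[str, int]:
    """Return (assistant text, total tokens) from a chat completion body."""
    data = json.loads(response_body)
    first_choice = data["choices"][0]
    return first_choice["message"]["content"], data["usage"]["total_tokens"]


# Trimmed sample mirroring the response format documented above.
sample_body = json.dumps({
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The NDA signed by Michael Chen (Goldman Sachs) covers...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 142, "completion_tokens": 89, "total_tokens": 231},
})

reply_text, total_tokens = extract_reply(sample_body)
```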

Streaming

Set "stream": true to receive the response as a Server-Sent Events stream, compatible with the OpenAI streaming format:

import openai

client = openai.OpenAI(
    base_url="https://www.xybern.com/redact/v1",
    api_key="xr_live_YOUR_KEY",
)

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Summarise the NDA signed by Michael Chen."}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Each chunk is a chat.completion.chunk object:

data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"The NDA signed by Michael Chen"},"finish_reason":null}]}

data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
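If you consume the stream without an SDK, the chunks can be reassembled by reading each `data:` line, stopping at the `[DONE]` sentinel, and concatenating `delta.content`. This is a minimal sketch over lines that have already been read from the response; `collect_stream_text` is a hypothetical helper and the sample lines are shortened versions of the chunks above.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from `data:` lines until the [DONE] sentinel."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker, not JSON
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        # The first chunk carries an empty string and the last an empty delta.
        parts.append(delta.get("content") or "")
    return "".join(parts)


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"The NDA signed by Michael Chen"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
full_text = collect_stream_text(sample_lines)
```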

Anonymization still runs before streaming begins

Redact buffers the full upstream response and runs leakage detection and de-anonymization on it before emitting the first chunk. PII therefore cannot leak mid-stream, but time-to-first-token equals the full response latency rather than the upstream provider's TTFT.

Error Responses

| HTTP Status | `type` | Cause |
| --- | --- | --- |
| `401` | `auth_error` | Missing or invalid `Authorization` header |
| `401` | `auth_error` | API key is revoked or inactive |
| `403` | `permission_denied` | API key is not scoped for the request's `doc_class` |
| `502` | `server_error` | Upstream LLM returned an error (provider-side issue) |

{
  "error": {
    "message": "API key not scoped for doc_class 'healthcare'. Allowed: finance, legal",
    "type": "permission_denied"
  }
}
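A client can turn this error envelope into a typed exception and decide whether a retry makes sense. The sketch below is one possible approach: `RedactError` and `parse_error` are hypothetical names, and the retry rule (retry only on `502`) is an assumption, not documented Redact behaviour.

```python
import json


class RedactError(Exception):
    """Illustrative exception wrapping the documented error envelope."""

    def __init__(self, status: int, error_type: str, message: str):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: only upstream failures (502) are worth retrying;
        # 401 and 403 need a different key or scope first.
        return self.status == 502


def parse_error(status: int, body: str) -> RedactError:
    """Build a RedactError from an HTTP status and a JSON error body."""
    error = json.loads(body).get("error", {})
    return RedactError(status, error.get("type", "unknown"), error.get("message", ""))


sample_body = json.dumps({
    "error": {
        "message": "API key not scoped for doc_class 'healthcare'. Allowed: finance, legal",
        "type": "permission_denied",
    }
})
scope_error = parse_error(403, sample_body)
```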

How Anonymization Applies

When the request arrives:

1. Policy resolution: the default policy for the workspace is applied, or the policy matching the API key's doc_class scope.
2. Entity detection: each messages[].content string is scanned for PII based on the active policy toggles.
3. Pseudonym assignment: each detected entity is replaced with a consistent workspace-scoped pseudonym. The same real value always maps to the same pseudonym within a workspace.
4. Upstream call: the anonymized messages are forwarded to your configured LLM provider.
5. Leakage scan: the LLM response is scanned for any real values that leaked back. Any values found are re-anonymized.
6. De-anonymization: pseudonyms in the response are replaced with the original values before the response is returned to your agent.
7. Vault logging: the interaction is recorded with a SHA-256 chain hash and an HMAC signature.
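The round trip from entity detection through de-anonymization can be pictured with a toy model. This is not Redact's implementation: the fixed `KNOWN_ENTITIES` list stands in for real entity detection, the `Entity-N` placeholders stand in for the realistic pseudonyms Redact assigns, and `fake_llm` replaces the upstream provider. Policy resolution and vault logging are left out.

```python
from typing import Callable

# Toy stand-in for entity detection: a fixed list of known real values.
KNOWN_ENTITIES = ["Michael Chen", "Goldman Sachs"]


def anonymize(text: str, mapping: dict[str, str]) -> str:
    """Replace known entities with pseudonyms, reusing existing mappings."""
    for entity in KNOWN_ENTITIES:
        if entity in text:
            if entity not in mapping:
                # Hypothetical placeholder format; real pseudonyms differ.
                mapping[entity] = f"Entity-{len(mapping) + 1}"
            text = text.replace(entity, mapping[entity])
    return text


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Restore the original values for every pseudonym in the text."""
    for real_value, pseudonym in mapping.items():
        text = text.replace(pseudonym, real_value)
    return text


def proxy_round_trip(
    prompt: str, mapping: dict[str, str], call_llm: Callable[[str], str]
) -> str:
    safe_prompt = anonymize(prompt, mapping)  # entity detection + pseudonym assignment
    reply = call_llm(safe_prompt)             # upstream call sees pseudonyms only
    reply = anonymize(reply, mapping)         # leakage scan: re-anonymize real values
    return deanonymize(reply, mapping)        # de-anonymization for the caller


seen_by_llm = []


def fake_llm(prompt: str) -> str:
    """Stand-in for the upstream provider; records what it was sent."""
    seen_by_llm.append(prompt)
    return f"Summary: {prompt}"


workspace_mapping: dict[str, str] = {}
result = proxy_round_trip("Michael Chen signed the NDA.", workspace_mapping, fake_llm)
```

Because `workspace_mapping` outlives a single call, a later request that mentions the same name reuses the same pseudonym, which mirrors the workspace-scoped consistency described in the pseudonym assignment step.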

Multi-Turn Conversations

Pseudonym mappings persist across requests within the same workspace. If you send multiple requests that reference the same person, that person is mapped to the same pseudonym every time:

# Request 1
"Michael Chen signed the NDA"
# → "Finley Warren signed the NDA"

# Request 2 (same workspace, different request)
"What did Michael Chen agree to?"
# → "What did Finley Warren agree to?"
# Same pseudonym - the LLM can maintain context coherently

Health Check

GET https://www.xybern.com/redact/health

{
  "ok": true,
  "service": "xybern-redact",
  "version": "4.0"
}
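A monitoring script might interpret the health payload as follows. This is a sketch that only parses a response body; `is_healthy` is a hypothetical helper, and treating anything other than `"ok": true` as unhealthy is an assumption.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True only when the body is a JSON object with "ok": true."""
    try:
        return json.loads(body).get("ok") is True
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object (e.g. a list or string).
        return False


sample_body = '{"ok": true, "service": "xybern-redact", "version": "4.0"}'
healthy = is_healthy(sample_body)
```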