Proxy Endpoint Reference

The Redact proxy exposes a single OpenAI-compatible endpoint. Any SDK or HTTP client that supports a custom base_url works without modification.


Endpoint

POST https://www.xybern.com/redact/v1/chat/completions

Authentication

Pass your Redact API key in the Authorization header:

Authorization: Bearer xr_live_YOUR_KEY

The key identifies your workspace and determines which policy is applied. Keys are created in the Redact dashboard under API Keys.
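For clients that do not use an OpenAI SDK, the same header can be set on a plain HTTP request. The following is a minimal sketch using only the Python standard library; `build_request` is a hypothetical helper name, and the key shown is the placeholder from above. The request is built but not sent.

```python
import json
import urllib.request

REDACT_URL = "https://www.xybern.com/redact/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Redact API key."""
    return urllib.request.Request(
        REDACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "xr_live_YOUR_KEY",  # placeholder, replace with a key from the dashboard
    {"model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "Hello"}]},
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```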


Request Body

The request body is a standard OpenAI chat completions payload:

{
  "model": "claude-sonnet-4-6",
  "messages": [
    {
      "role": "user",
      "content": "Summarise the NDA signed by Michael Chen at Goldman Sachs."
    }
  ],
  "max_tokens": 1024,
  "temperature": 0.7
}

Field | Type | Required | Description
----- | ---- | -------- | -----------
model | string | Yes | Model identifier. Must be supported by your configured upstream provider.
messages | array | Yes | Array of {role, content} objects. role is user, assistant, or system.
max_tokens | integer | No | Maximum tokens in the response. Default: 4096.
temperature | float | No | Sampling temperature. Passed through to the upstream provider.
system | string | No | System prompt (Anthropic-style). Passed through directly, not anonymized.
stream | boolean | No | If true, the response is returned as a Server-Sent Events stream.

System prompts are not anonymized

The system field is forwarded to the upstream provider as-is. Do not include PII in system prompts if you need it anonymized. Put sensitive content in the messages array instead.
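One way to follow this guidance is to keep the system prompt generic and place anything that may contain PII in a user message. The sketch below is illustrative only: `build_payload` and its parameter names are hypothetical, not part of the Redact API.

```python
def build_payload(model: str, instructions: str, sensitive_context: str, question: str) -> dict:
    """Keep `system` generic and put anything that may contain PII in `messages`."""
    return {
        "model": model,
        # Forwarded to the upstream provider as-is, so keep it free of PII.
        "system": instructions,
        "messages": [
            # messages[].content is the part that is scanned and anonymized.
            {"role": "user", "content": f"{sensitive_context}\n\n{question}"},
        ],
    }


payload = build_payload(
    model="claude-sonnet-4-6",
    instructions="You are a legal assistant. Answer concisely.",
    sensitive_context="The NDA was signed by Michael Chen at Goldman Sachs.",
    question="Summarise the obligations in the NDA.",
)
```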


Response

The response is an OpenAI-format chat completion object with de-anonymized content:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "object": "chat.completion",
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The NDA signed by Michael Chen (Goldman Sachs) on March 12 2024 covers..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 142,
    "completion_tokens": 89,
    "total_tokens": 231
  }
}

The real names and values are restored in choices[0].message.content before the response reaches your application. The LLM operated on pseudonyms throughout.
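Because the response follows the OpenAI format, reading it needs no Redact-specific handling. As a rough sketch, the hypothetical helper `extract_reply` below pulls the restored text and the token total out of a shortened copy of the sample response above.

```python
import json

# Shortened copy of the sample response shown above.
RAW_RESPONSE = """
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "object": "chat.completion",
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The NDA signed by Michael Chen (Goldman Sachs) on March 12 2024 covers..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 142, "completion_tokens": 89, "total_tokens": 231}
}
"""


def extract_reply(completion: dict) -> tuple[str, int]:
    """Return the de-anonymized assistant text and the total token count."""
    content = completion["choices"][0]["message"]["content"]
    total_tokens = completion["usage"]["total_tokens"]
    return content, total_tokens


content, total_tokens = extract_reply(json.loads(RAW_RESPONSE))
```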


Streaming

Set "stream": true to receive the response as a Server-Sent Events stream, compatible with the OpenAI streaming format:

import openai

client = openai.OpenAI(
    base_url="https://www.xybern.com/redact/v1",
    api_key="xr_live_YOUR_KEY",
)

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Summarise the NDA signed by Michael Chen."}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Each chunk is a chat.completion.chunk object:

data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{"content":"The NDA signed by Michael Chen"},"finish_reason":null}]}

data: {"id":"redact-a1b2c3","object":"chat.completion.chunk","model":"claude-sonnet-4-6","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Anonymization still runs before streaming begins

Redact buffers the full upstream response and runs leakage detection and de-anonymization on it before emitting the first chunk. PII therefore cannot leak mid-stream, but time-to-first-token equals the full response latency rather than the upstream provider's TTFT.
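Clients that read the stream without an SDK can parse the `data:` lines directly. The following is a minimal sketch, assuming the chunk shape shown above; `iter_content` is a hypothetical helper, and the sample lines are shortened copies of the chunks in this section.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by `data:` lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(data)["choices"][0]["delta"]
        text = delta.get("content")
        if text:  # the first chunk has empty content, the last has none
            yield text


sample_lines = [
    'data: {"id":"redact-a1b2c3","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"id":"redact-a1b2c3","choices":[{"index":0,"delta":{"content":"The NDA signed by Michael Chen"},"finish_reason":null}]}',
    "",
    'data: {"id":"redact-a1b2c3","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

text = "".join(iter_content(sample_lines))
```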


Error Responses

HTTP Status | type | Cause
----------- | ---- | -----
401 | auth_error | Missing or invalid Authorization header
401 | auth_error | API key is revoked or inactive
403 | permission_denied | API key is not scoped for the request's doc_class
502 | server_error | Upstream LLM returned an error (provider-side issue)

Errors are returned as a JSON body containing an error object:

{
  "error": {
    "message": "API key not scoped for doc_class 'healthcare'. Allowed: finance, legal",
    "type": "permission_denied"
  }
}
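A client can branch on the HTTP status and the `type` field of the error body. The sketch below is one possible approach, not a prescribed one: `describe_error` is a hypothetical helper, and treating only 502 as retryable is an assumption (auth and permission errors will not succeed on retry without changing the key).

```python
import json

# Assumption: only upstream (provider-side) failures are worth retrying.
RETRYABLE_STATUSES = {502}


def describe_error(status: int, body: str) -> dict:
    """Summarise an error response: its type, message, and whether to retry."""
    error = json.loads(body).get("error", {})
    return {
        "status": status,
        "type": error.get("type", "unknown"),
        "message": error.get("message", ""),
        "retryable": status in RETRYABLE_STATUSES,
    }


sample_body = json.dumps({
    "error": {
        "message": "API key not scoped for doc_class 'healthcare'. Allowed: finance, legal",
        "type": "permission_denied",
    }
})
info = describe_error(403, sample_body)
```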

How Anonymization Applies

When the request arrives:

  1. Policy resolution: either the default policy for the workspace or the policy matching the API key's doc_class scope is applied.
  2. Entity detection: each messages[].content string is scanned for PII based on the active policy toggles.
  3. Pseudonym assignment: each detected entity is replaced with a consistent workspace-scoped pseudonym. The same real value always maps to the same pseudonym within a workspace.
  4. Upstream call: the anonymized messages are forwarded to your configured LLM provider.
  5. Leakage scan: the LLM response is scanned for any real values that leaked back. Any values found are re-anonymized.
  6. De-anonymization: pseudonyms in the response are replaced with the original values before the response is returned to your agent.
  7. Vault logging: the interaction is recorded with a SHA-256 chain hash and an HMAC signature.
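Steps 2 to 6 can be pictured as a round trip through a pseudonym mapping. The toy sketch below is not Redact's implementation: entity detection is replaced by a fixed dictionary of known values, `fake_llm` stands in for the upstream provider, and the pseudonym "Harlow Group" is invented for illustration.

```python
def anonymize(text: str, mapping: dict[str, str]) -> str:
    """Replace each known real value with its pseudonym."""
    for real, pseudonym in mapping.items():
        text = text.replace(real, pseudonym)
    return text


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Replace each pseudonym with the original real value."""
    for real, pseudonym in mapping.items():
        text = text.replace(pseudonym, real)
    return text


def fake_llm(prompt: str) -> str:
    """Stand-in for the upstream call: it simply echoes the prompt it received."""
    return f"Summary: {prompt}"


def proxy_round_trip(user_text: str, mapping: dict[str, str]) -> tuple[str, str]:
    outbound = anonymize(user_text, mapping)      # steps 2-3: detect and replace
    reply = fake_llm(outbound)                    # step 4: upstream sees pseudonyms only
    reply = anonymize(reply, mapping)             # step 5: re-anonymize any leaked real values
    return outbound, deanonymize(reply, mapping)  # step 6: restore the originals


mapping = {"Michael Chen": "Finley Warren", "Goldman Sachs": "Harlow Group"}
outbound, final = proxy_round_trip("Michael Chen signed the NDA at Goldman Sachs.", mapping)
```

Here `outbound` is what the provider would see, and `final` is what the calling application would receive.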

Multi-Turn Conversations

Pseudonym mappings persist across requests within the same workspace. If you send multiple requests that reference the same person, that person receives the same pseudonym each time:

# Request 1
"Michael Chen signed the NDA"
# → "Finley Warren signed the NDA"

# Request 2 (same workspace, different request)
"What did Michael Chen agree to?"
# → "What did Finley Warren agree to?"
# Same pseudonym - the LLM can maintain context coherently
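The persistence property amounts to a get-or-create lookup keyed by the real value. The sketch below illustrates that idea only; `PseudonymRegistry` is hypothetical, an in-memory dictionary stands in for whatever storage Redact uses, and the names other than "Finley Warren" are made up.

```python
class PseudonymRegistry:
    """Toy workspace-scoped map: a real value keeps its pseudonym once assigned."""

    def __init__(self, pool: list[str]) -> None:
        self._unused = iter(pool)  # pseudonyms not yet handed out
        self._by_real: dict[str, str] = {}

    def pseudonym_for(self, real_value: str) -> str:
        # Assign a new pseudonym only the first time a value is seen.
        if real_value not in self._by_real:
            self._by_real[real_value] = next(self._unused)
        return self._by_real[real_value]


registry = PseudonymRegistry(["Finley Warren", "Rowan Ellis"])
first_request = registry.pseudonym_for("Michael Chen")
second_request = registry.pseudonym_for("Michael Chen")  # later request, same workspace
other_person = registry.pseudonym_for("Dana Lopez")
```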

Health Check

GET https://www.xybern.com/redact/health

{
  "ok": true,
  "service": "xybern-redact",
  "version": "4.0"
}
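A monitoring script might poll this endpoint and check the `ok` flag. The following is a minimal sketch using the Python standard library; `is_healthy` and `check_health` are hypothetical helper names, and the 5-second timeout is an arbitrary choice.

```python
import json
import urllib.request

HEALTH_URL = "https://www.xybern.com/redact/health"


def is_healthy(body: str) -> bool:
    """Return True only if the body is a JSON object with "ok": true."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def check_health(timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:  # urllib.error.URLError and HTTPError are OSError subclasses
        return False
```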