Integration Guide

This guide covers wiring Xybern Redact into production AI agent stacks. Because the proxy is OpenAI-compatible, any SDK or framework that supports a custom base_url works with no changes beyond the client configuration.

The One-Line Change

Every integration reduces to this: change the endpoint your LLM client points to and set the authorization header.

import os
from openai import OpenAI

# Before
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# After
client = OpenAI(
    base_url="https://www.xybern.com/redact/v1",
    api_key=os.environ["REDACT_API_KEY"],
)

The REDACT_API_KEY is your xr_live_… key from the Redact dashboard. Your OPENAI_API_KEY (or Anthropic, Gemini, etc. key) is configured once in the workspace Settings, so agents never need to hold it.

Environment Variable Pattern

# .env
REDACT_API_KEY=xr_live_YOUR_KEY
REDACT_BASE_URL=https://www.xybern.com/redact/v1

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["REDACT_BASE_URL"],
    api_key=os.environ["REDACT_API_KEY"],
)

This pattern works with any framework: swap in the Redact URL and key, and the rest of your code is unchanged.
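A small loader can make a missing variable fail at startup with a clear message instead of surfacing later as an authentication error. The sketch below is only illustrative: `load_redact_settings` is a hypothetical helper name, not part of Redact or the OpenAI SDK.

```python
import os
from typing import Mapping


def load_redact_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return keyword arguments for an OpenAI-compatible client.

    Hypothetical helper: raises early if a required variable is missing or empty.
    """
    required = ("REDACT_BASE_URL", "REDACT_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "base_url": env["REDACT_BASE_URL"],
        "api_key": env["REDACT_API_KEY"],
    }
```

The returned dictionary can be unpacked straight into the client constructor, for example `OpenAI(**load_redact_settings())`.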

Framework Integrations

LangChain

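from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="claude-sonnet-4-6",
    openai_api_base="https://www.xybern.com/redact/v1",
    openai_api_key="xr_live_YOUR_KEY",
)

# Example prompt; substitute your own
prompt = ChatPromptTemplate.from_template("Summarise this document:\n\n{document}")

# Use exactly as before - redaction is transparent
chain = prompt | llm | StrOutputParser()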

LlamaIndex

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o",
    api_base="https://www.xybern.com/redact/v1",
    api_key="xr_live_YOUR_KEY",
)

CrewAI

from crewai import Agent
from langchain_openai import ChatOpenAI

redact_llm = ChatOpenAI(
    model="claude-sonnet-4-6",
    openai_api_base="https://www.xybern.com/redact/v1",
    openai_api_key="xr_live_YOUR_KEY",
)

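agent = Agent(
    role="Contract Analyst",
    goal="Review and summarise NDAs",
    backstory="Experienced in commercial contract review.",  # example text; CrewAI agents expect a backstory
    llm=redact_llm,
)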

AutoGen

config_list = [{
    "model": "claude-sonnet-4-6",
    "api_key": "xr_live_YOUR_KEY",
    "base_url": "https://www.xybern.com/redact/v1",
    "api_type": "openai",
}]

Direct HTTP (curl)

curl -X POST https://www.xybern.com/redact/v1/chat/completions \
  -H "Authorization: Bearer xr_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Summarise this NDA..."}]
  }'

Per-Agent API Keys

For production deployments with multiple agents, create one API key per agent. This gives you:

Attribution: every vault record shows which agent made the request (api_key_id)
Least-privilege: each key is scoped to the doc classes that agent actually processes
Revocation: if one agent is compromised, revoke only its key

Contract Review Agent   → xr_live_key1 (scoped to: legal)
Medical Summarizer      → xr_live_key2 (scoped to: healthcare)
Financial Analyst       → xr_live_key3 (scoped to: finance)
General Assistant       → xr_live_key4 (unrestricted)
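One way to wire this mapping into code is to keep each agent's key in its own environment variable and resolve it by agent name. This is a minimal sketch: the agent identifiers, the variable names, and the `api_key_for` helper are all hypothetical and chosen only for illustration.

```python
import os
from typing import Mapping

# Hypothetical agent names and environment variable names, for illustration only.
AGENT_KEY_VARS = {
    "contract_review": "REDACT_KEY_CONTRACT_REVIEW",
    "medical_summarizer": "REDACT_KEY_MEDICAL_SUMMARIZER",
    "financial_analyst": "REDACT_KEY_FINANCIAL_ANALYST",
    "general_assistant": "REDACT_KEY_GENERAL_ASSISTANT",
}


def api_key_for(agent: str, env: Mapping[str, str] = os.environ) -> str:
    """Look up the Redact key for one agent, failing loudly if it is not configured."""
    try:
        var = AGENT_KEY_VARS[agent]
    except KeyError:
        raise ValueError(f"Unknown agent: {agent!r}") from None
    key = env.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key
```

Each agent process then only needs its own variable set, so revoking one key does not require redeploying the others.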

Handling the Response

The response from Redact is standard OpenAI format with de-anonymized content. You do not need to do anything special to handle it:

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Who signed the NDA?"}],
)

# Real names are restored - you see "Michael Chen", not "Finley Warren"
print(response.choices[0].message.content)

If the request fails (upstream provider error), you receive a standard OpenAI error response:

{
  "error": {
    "message": "Upstream provider error",
    "type": "server_error"
  }
}

Implement standard retry logic as you would for any LLM provider.
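As one possible shape for that retry logic, the sketch below wraps any callable in exponential backoff with jitter. `call_with_retries` and its parameters are hypothetical names, and the default delays are illustrative rather than recommended values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x ... the base delay, plus up to 25% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
    raise ValueError("attempts must be at least 1")
```

In practice you would pass `lambda: client.chat.completions.create(...)` as `fn` and narrow `retry_on` to transient errors only. The OpenAI Python SDK also accepts a `max_retries` argument on the client, which may be sufficient on its own.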

Streaming

Streaming responses (stream: true) are not currently supported. The proxy collects the full response to run the leakage scan before returning. Streaming support is on the roadmap.
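Some frameworks enable streaming by default, so it can help to normalise request parameters before they reach the proxy. The helper below is a sketch under that assumption; `without_streaming` is a hypothetical name, and how Redact itself responds to a streamed request is not specified here.

```python
def without_streaming(params: dict) -> dict:
    """Return a copy of the request parameters with streaming explicitly disabled."""
    cleaned = dict(params)
    cleaned["stream"] = False
    # stream_options is only meaningful for streamed responses, so drop it too
    cleaned.pop("stream_options", None)
    return cleaned
```

The original dictionary is left untouched, so the same parameters can still be reused against a provider that does support streaming.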

Rate Limits and Timeouts

The Redact proxy has a default upstream timeout of 120 seconds. For long-running requests (large documents, complex reasoning), consider breaking the document into smaller chunks.
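A simple way to chunk a document is to split on paragraph boundaries and pack paragraphs up to a size limit. The sketch below uses a character count as a rough proxy for request size; `chunk_text` and the 8000-character default are assumptions for illustration, not values from Redact.

```python
def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on blank lines where possible."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty paragraphs
        # A single paragraph longer than the limit is cut into fixed-size pieces.
        pieces = [paragraph[i:i + max_chars] for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the partial results combined afterwards. Keep in mind that splitting may separate a name from the context that identifies it, so review the results on representative documents.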

There is no proxy-level rate limit beyond what your upstream provider enforces.

Testing Your Integration

Send a test request with known PII and verify:

The vault shows a new record with entities_stripped_count > 0
The response contains the original names (de-anonymized), not the pseudonyms
The HMAC column shows ✓ (if you have an HMAC key configured)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{
        "role": "user",
        "content": "My name is John Smith and I work at Acme Corp. What is 2+2?"
    }]
)

content = response.choices[0].message.content

# The response should say "4", addressing you naturally
# The vault should show 2 entities stripped (PERSON, ORG)
assert "4" in content