Integration Guide

This guide covers wiring Xybern Redact into production AI agent stacks. The proxy is OpenAI-compatible, so any SDK or framework that supports a custom base_url works with no code changes beyond pointing it at the proxy.


The One-Line Change

Every integration reduces to this: change the endpoint your LLM client points to and set the authorization header.

# Before
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# After
client = OpenAI(
    base_url="https://www.xybern.com/redact/v1",
    api_key=os.environ["REDACT_API_KEY"],
)

The REDACT_API_KEY is your xr_live_… key from the Redact dashboard. Your OPENAI_API_KEY (or Anthropic, Gemini, etc. key) is configured once in the workspace Settings, so agents never need to hold it.


Environment Variable Pattern

# .env
REDACT_API_KEY=xr_live_YOUR_KEY
REDACT_BASE_URL=https://www.xybern.com/redact/v1

# Python: assumes the variables above are already loaded into the process environment
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["REDACT_BASE_URL"],
    api_key=os.environ["REDACT_API_KEY"],
)

This pattern works with any framework: swap in the Redact URL and key, and the rest of your code is unchanged.
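A misconfigured environment is easier to diagnose at startup than on the first request. The following is a minimal sketch of a fail-fast loader, assuming the two variable names from the .env above. The helper name load_redact_config and the xr_live_ prefix check are illustrative assumptions, not part of Redact or any SDK.

```python
import os
from typing import Mapping, NamedTuple, Optional


class RedactConfig(NamedTuple):
    base_url: str
    api_key: str


def load_redact_config(env: Optional[Mapping[str, str]] = None) -> RedactConfig:
    """Read the Redact settings and fail fast if they are unusable.

    Hypothetical helper for illustration; not part of any SDK.
    """
    env = os.environ if env is None else env
    required = ("REDACT_BASE_URL", "REDACT_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    api_key = env["REDACT_API_KEY"]
    # Assumption: keys start with "xr_live_", as described in this guide.
    if not api_key.startswith("xr_live_"):
        raise RuntimeError("REDACT_API_KEY does not look like an xr_live_ key")

    # Strip a trailing slash so the stored URL has one canonical form.
    return RedactConfig(base_url=env["REDACT_BASE_URL"].rstrip("/"), api_key=api_key)
```

The result can be passed straight to the client, for example OpenAI(base_url=cfg.base_url, api_key=cfg.api_key). Relax the prefix check if your workspace issues keys with other prefixes.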


Framework Integrations

LangChain

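from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="claude-sonnet-4-6",
    openai_api_base="https://www.xybern.com/redact/v1",
    openai_api_key="xr_live_YOUR_KEY",
)

# Illustrative prompt; substitute your existing one
prompt = ChatPromptTemplate.from_template("Summarise this NDA:\n\n{document}")

# Use exactly as before - redaction is transparent
chain = prompt | llm | StrOutputParser()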

LlamaIndex

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o",
    api_base="https://www.xybern.com/redact/v1",
    api_key="xr_live_YOUR_KEY",
)

CrewAI

from crewai import Agent
from langchain_openai import ChatOpenAI

redact_llm = ChatOpenAI(
    model="claude-sonnet-4-6",
    openai_api_base="https://www.xybern.com/redact/v1",
    openai_api_key="xr_live_YOUR_KEY",
)

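agent = Agent(
    role="Contract Analyst",
    goal="Review and summarise NDAs",
    # Agent also requires a backstory; this one is illustrative
    backstory="Reviews commercial contracts for a legal team.",
    llm=redact_llm,
)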

AutoGen

config_list = [{
    "model": "claude-sonnet-4-6",
    "api_key": "xr_live_YOUR_KEY",
    "base_url": "https://www.xybern.com/redact/v1",
    "api_type": "openai",
}]

Direct HTTP (curl)

curl -X POST https://www.xybern.com/redact/v1/chat/completions \
  -H "Authorization: Bearer xr_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Summarise this NDA..."}]
  }'

Per-Agent API Keys

For production deployments with multiple agents, create one API key per agent. This gives you:

  • Attribution: every vault record shows which agent made the request (api_key_id)
  • Least-privilege: each key is scoped to the doc classes that agent actually processes
  • Revocation: if one agent is compromised, revoke only its key

Contract Review Agent   → xr_live_key1 (scoped to: legal)
Medical Summarizer      → xr_live_key2 (scoped to: healthcare)
Financial Analyst       → xr_live_key3 (scoped to: finance)
General Assistant       → xr_live_key4 (unrestricted)
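The mapping above can be mirrored in code so that each agent process resolves only its own key. This is a minimal sketch: the agent identifiers and REDACT_KEY_* environment variable names are hypothetical, chosen for illustration. There is deliberately no fallback to a shared key, since that would defeat per-agent attribution and revocation.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from agent name to the env var that holds its key.
AGENT_KEY_VARS = {
    "contract-review": "REDACT_KEY_CONTRACT_REVIEW",
    "medical-summarizer": "REDACT_KEY_MEDICAL_SUMMARIZER",
    "financial-analyst": "REDACT_KEY_FINANCIAL_ANALYST",
    "general-assistant": "REDACT_KEY_GENERAL_ASSISTANT",
}


def key_for_agent(agent: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Redact API key configured for one agent."""
    env = os.environ if env is None else env
    var = AGENT_KEY_VARS.get(agent)
    if var is None:
        raise ValueError(f"Unknown agent: {agent!r}")
    key = env.get(var)
    if not key:
        # No shared fallback: a missing key should stop the agent, not blur attribution.
        raise RuntimeError(f"{var} is not set for agent {agent!r}")
    return key
```

Each agent then builds its own client with api_key=key_for_agent("contract-review") and the shared Redact base URL.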

Handling the Response

The response from Redact uses the standard OpenAI format, with the content already de-anonymized. No special handling is needed:

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Who signed the NDA?"}],
)

# Real names are restored - you see "Michael Chen", not "Finley Warren"
print(response.choices[0].message.content)

If the request fails (upstream provider error), you receive a standard OpenAI error response:

{
  "error": {
    "message": "Upstream provider error",
    "type": "server_error"
  }
}

Implement standard retry logic as you would for any LLM provider.
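As one possible shape for that retry logic, here is a minimal sketch of exponential backoff with jitter. The helper call_with_retries and its defaults are illustrative assumptions rather than anything Redact prescribes. It is written against a plain callable so it does not depend on a particular SDK.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Backoff ceiling doubles each attempt, capped at max_delay;
            # the actual wait is drawn uniformly from [0, ceiling].
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

With the OpenAI Python SDK, fn would typically be a lambda wrapping client.chat.completions.create(...), and retry_on a tuple such as (openai.APIConnectionError, openai.InternalServerError). Whether the proxy's errors surface as exactly those classes is an assumption worth verifying. The SDK's own max_retries client option is an alternative to a hand-written loop.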


Streaming

Streaming responses (stream: true) are not currently supported. The proxy collects the full response to run the leakage scan before returning. Streaming support is on the roadmap.
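If existing agent code sets stream in its request parameters, one way to keep it working through the proxy is to normalize the parameters before each call. This is a minimal sketch; the helper name without_streaming is hypothetical, and how the proxy responds to stream: true (error or silent fallback) is not specified here.

```python
from typing import Any, Dict


def without_streaming(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of chat-completion parameters with streaming disabled."""
    cleaned = dict(params)  # shallow copy; the caller's dict is left untouched
    cleaned["stream"] = False
    # stream_options is only meaningful together with stream=True, so drop it.
    cleaned.pop("stream_options", None)
    return cleaned
```

A call site would then read client.chat.completions.create(**without_streaming(params)) and consume the single complete response.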


Rate Limits and Timeouts

The Redact proxy has a default upstream timeout of 120 seconds. For long-running requests (large documents, complex reasoning), consider breaking the document into smaller chunks.
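The chunking suggestion can be sketched as a splitter that prefers paragraph boundaries. The function name chunk_text and the 8000-character default are illustrative assumptions; character count is only a rough proxy for tokens, so tune the limit for your model and documents.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 8000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A single paragraph longer than the limit is hard-split.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate  # still fits: keep accumulating
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is sent as its own request, so a reference that spans two chunks is lost unless you carry context forward yourself, for example by prepending a short summary of the previous chunk.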

There is no proxy-level rate limit beyond what your upstream provider enforces.


Testing Your Integration

Send a test request with known PII and verify:

  1. The vault shows a new record with entities_stripped_count > 0
  2. The response contains the original names (de-anonymized), not the pseudonyms
  3. The HMAC column shows ✓ (if you have an HMAC key configured)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{
        "role": "user",
        "content": "My name is John Smith and I work at Acme Corp. What is 2+2?"
    }]
)

content = response.choices[0].message.content

# The answer should contain "4". If the model addresses you by name, it should be
# the original "John Smith", not a pseudonym.
# The vault should show 2 entities stripped (PERSON, ORG)
assert "4" in content or "John Smith" in content