Self-Hosted Relay
Run Xybern's enforcement plane inside your own network. The relay authorises AI actions locally, so action content never leaves your infrastructure. It still pulls policies from, and writes a complete audit trail to, the Xybern control plane.
It's a drop-in replacement: point the Python SDK at the relay and it serves intercepts locally while transparently proxying everything else (agent registration, policy CRUD, device login) upstream.
Why run a relay?
| | Cloud (default) | Self-hosted relay |
|---|---|---|
| Data residency | Action content sent to Xybern (or hashed with `redact`) | Content never leaves your network; only metadata audit records are forwarded |
| Latency | One round-trip to the cloud per action | Local, in-process evaluation |
| Availability | Requires the cloud to be reachable | Keeps enforcing from cached policies if the cloud is briefly unreachable |
| Audit | Provenance Vault | The same tamper-evident Provenance Vault, forwarded asynchronously |
Architecture
```text
your agent ──▶ Xybern SDK ──▶ Xybern Relay ──▶ (audit only) ──▶ Xybern Cloud
                                  │  ▲                               │
                 local policy eval│  │ policy cache (refreshed) ◀────┘
                                  ▼  │
                 decision (allow / escalate / block)
```
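The "(audit only)" arrow is asynchronous: the relay returns its decision first and ships the audit record afterwards. A minimal sketch of that pattern follows, assuming a hypothetical `send` callable that performs the upload. `AuditRecord`, `AuditForwarder` and `intercept` are illustrative names, not the relay's actual code, the record fields are assumptions, and the toy block-list rule stands in for real policy evaluation.

```python
import queue
import threading
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class AuditRecord:
    # Metadata only (hypothetical fields): the action content is never stored here.
    agent_id: str
    action_type: str
    decision: str


class AuditForwarder:
    """Forward audit records on a background thread so decisions never wait on the cloud."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send  # hypothetical uploader, e.g. an HTTP POST upstream
        self._queue: "queue.Queue[Optional[AuditRecord]]" = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, record: AuditRecord) -> None:
        self._queue.put(record)  # returns immediately; the caller is not blocked

    def close(self) -> None:
        self._queue.put(None)  # sentinel: stop after everything queued so far
        self._worker.join()

    def _drain(self) -> None:
        while True:
            record = self._queue.get()
            if record is None:
                return
            try:
                self._send(asdict(record))
            except Exception:
                pass  # a production forwarder would retry or persist; this sketch drops it


def intercept(agent_id: str, action_type: str, content: str,
              blocked_types: set, forwarder: AuditForwarder) -> str:
    # Toy local rule standing in for real policy evaluation. `content` is
    # available for local checks but is never copied into the audit record.
    decision = "block" if action_type in blocked_types else "allow"
    forwarder.submit(AuditRecord(agent_id, action_type, decision))
    return decision
```

Because a single worker thread drains the queue, records are uploaded in the order decisions were made.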
- Evaluated locally: `action_type`, `threshold`, `content_pattern`, `temporal`, and `sequence` (velocity + ordered). These deterministic and stateful policy types run entirely on-prem.
- Forwarded to the cloud: policy types that need it, currently `semantic` (an LLM intent judge). With `XYBERN_CLOUD_POLICY_MODE=forward` only the matching intercept is forwarded; set it to `skip` to bypass the cloud entirely.
- Policy cache: pulled from `GET /v1/enforce/policies` every `XYBERN_POLICY_REFRESH` seconds, with last-known-good fallback.
- Audit: every decision is forwarded asynchronously to `POST /v1/enforce/relay/audit`, which writes it into the Provenance Vault hash chain, so relay decisions appear in your dashboard alongside cloud ones.
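The routing and caching rules in the list above can be sketched in a few lines. This assumes a hypothetical `fetch` callable wrapping `GET /v1/enforce/policies` that raises `OSError` (which includes `urllib.error.URLError`) when the control plane is unreachable; `route` and `PolicyCache` are illustrative names, not the relay's API. `get()` returns `None` until a fetch has succeeded; what the relay does in that case (presumably governed by `XYBERN_FAIL_OPEN`) is outside this sketch.

```python
import time
from typing import Callable, List, Optional

# Policy types evaluated on-prem, per the list above.
LOCAL_TYPES = {"action_type", "threshold", "content_pattern", "temporal", "sequence"}
# Policy types that need the control plane (currently the semantic intent judge).
CLOUD_TYPES = {"semantic"}


def route(policy_type: str, cloud_mode: str = "forward") -> str:
    """Return 'local', 'forward' or 'skip' for one policy type."""
    if policy_type in LOCAL_TYPES:
        return "local"
    if policy_type in CLOUD_TYPES:
        return "forward" if cloud_mode == "forward" else "skip"
    raise ValueError(f"unknown policy type: {policy_type!r}")


class PolicyCache:
    """Refresh policies at most every `refresh_seconds`, keeping the last good set."""

    def __init__(self, fetch: Callable[[], List[dict]], refresh_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical wrapper around GET /v1/enforce/policies
        self._refresh = refresh_seconds
        self._clock = clock
        self._policies: Optional[List[dict]] = None  # None until the first success
        self._last_attempt = float("-inf")

    def get(self) -> Optional[List[dict]]:
        now = self._clock()
        if now - self._last_attempt >= self._refresh:
            self._last_attempt = now
            try:
                self._policies = self._fetch()
            except OSError:
                pass  # upstream unreachable: keep serving the last-known-good set
        return self._policies
```

After a failed refresh the sketch waits a full interval before retrying, so an outage does not turn every request into an upstream call; the real relay's retry behaviour may differ.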
Run it (Docker)
```bash
export XYBERN_API_KEY=xb_live_...   # your workspace key
docker compose up -d                # relay on http://localhost:8787
```
Or run it directly, without Docker:
```bash
pip install -r requirements.txt
export XYBERN_API_KEY=xb_live_...
gunicorn --bind 0.0.0.0:8787 --workers 1 --threads 8 xybern_relay.app:app
```
Point the SDK at the relay
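Once the SDK points at the relay, its intercepts arrive at the relay's `POST /v1/enforce/intercept`. To check that endpoint independently of the SDK, here is a minimal standard-library sketch of the same call. The payload fields, the bearer-token `Authorization` header and the JSON response shape are assumptions for illustration, not the documented request schema, and `authorise` is a hypothetical helper.

```python
import json
import urllib.request

RELAY_URL = "http://localhost:8787"  # default XYBERN_RELAY_PORT


def authorise(action: dict, api_key: str, base_url: str = RELAY_URL,
              timeout: float = 5.0) -> dict:
    """POST one action to the relay's intercept endpoint and return the decoded JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/enforce/intercept",
        data=json.dumps(action).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the workspace key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `authorise({"agent_id": "billing-bot", "action_type": "send_email", "content": "..."}, api_key="xb_live_...")` would return the relay's decision payload; the field names here are hypothetical.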
Configuration
| Env var | Default | Purpose |
|---|---|---|
| `XYBERN_API_KEY` | (required) | Workspace key; used to pull policies and forward audit records |
| `XYBERN_UPSTREAM_URL` | `https://www.xybern.com/api/v1` | Control-plane URL |
| `XYBERN_POLICY_REFRESH` | `60` | Policy-cache refresh interval (seconds) |
| `XYBERN_FAIL_OPEN` | `true` | If the upstream is unreachable: allow (`true`) or block (`false`) |
| `XYBERN_FORWARD_AUDIT` | `true` | Forward an audit record of every decision |
| `XYBERN_CLOUD_POLICY_MODE` | `forward` | Cloud-only policies: `forward` the intercept, or `skip` them |
| `XYBERN_RELAY_PORT` | `8787` | Listen port |
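A minimal sketch of how these variables and their defaults might be read at startup. The `RelayConfig` dataclass, the accepted spellings for boolean flags and the validation of `XYBERN_CLOUD_POLICY_MODE` are assumptions, not the relay's actual parser.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(value: str) -> bool:
    # Assumption: "true", "1" and "yes" (any case) mean true; anything else is false.
    return value.strip().lower() in {"true", "1", "yes"}


@dataclass(frozen=True)
class RelayConfig:
    api_key: str
    upstream_url: str
    policy_refresh: int
    fail_open: bool
    forward_audit: bool
    cloud_policy_mode: str
    port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> RelayConfig:
    """Build a RelayConfig from environment variables, applying the table's defaults."""
    env = os.environ if env is None else env
    api_key = env.get("XYBERN_API_KEY", "")
    if not api_key:
        raise ValueError("XYBERN_API_KEY is required")
    mode = env.get("XYBERN_CLOUD_POLICY_MODE", "forward")
    if mode not in ("forward", "skip"):
        raise ValueError(f"XYBERN_CLOUD_POLICY_MODE must be 'forward' or 'skip', got {mode!r}")
    return RelayConfig(
        api_key=api_key,
        upstream_url=env.get("XYBERN_UPSTREAM_URL", "https://www.xybern.com/api/v1"),
        policy_refresh=int(env.get("XYBERN_POLICY_REFRESH", "60")),
        fail_open=_flag(env.get("XYBERN_FAIL_OPEN", "true")),
        forward_audit=_flag(env.get("XYBERN_FORWARD_AUDIT", "true")),
        cloud_policy_mode=mode,
        port=int(env.get("XYBERN_RELAY_PORT", "8787")),
    )
```

Failing fast on a missing key or an unknown mode surfaces misconfiguration at startup rather than on the first intercept.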
Endpoints
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/v1/enforce/intercept` | Authorise one action (locally) |
| `GET` | `/v1/status` | Relay, policy-cache and audit-queue status |
| `GET` | `/healthz` | Liveness / readiness |
| `*` | `/v1/enforce/*`, `/v1/auth/*` | Transparently proxied upstream |
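The split between local and proxied routes comes down to a small dispatch rule. A sketch, assuming exact matches for the three local routes and prefix matches for the proxied families; `dispatch` is a hypothetical helper, and the `not_found` outcome for other paths is an assumption the table does not specify.

```python
from typing import FrozenSet, Tuple

# Routes the relay answers itself: (method, exact path).
LOCAL_ROUTES: FrozenSet[Tuple[str, str]] = frozenset({
    ("POST", "/v1/enforce/intercept"),
    ("GET", "/v1/status"),
    ("GET", "/healthz"),
})
# Path families forwarded to the control plane for any method.
PROXY_PREFIXES: Tuple[str, ...] = ("/v1/enforce/", "/v1/auth/")


def dispatch(method: str, path: str) -> str:
    """Return 'local', 'proxy' or 'not_found' for an incoming request."""
    # Local routes are checked first: /v1/enforce/intercept also matches a proxy prefix.
    if (method.upper(), path) in LOCAL_ROUTES:
        return "local"
    if path.startswith(PROXY_PREFIXES):
        return "proxy"
    return "not_found"
```

The ordering matters: without the local check first, `POST /v1/enforce/intercept` would fall under `/v1/enforce/*` and be sent upstream.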
Scaling note
The per-agent action log used by `sequence` policies lives in process memory,
so run a single relay process (with threads) to give every request the same
stateful view. The Docker image runs one gunicorn worker with 8 threads for
exactly this reason. Scale horizontally only if you don't rely on
`sequence`/velocity policies, or if you keep the action log in a shared store.
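To see why the worker count matters, here is a sketch of an in-memory, per-agent sliding-window velocity check guarded by a `threading.Lock`. Threads in one worker share the same deques; a second gunicorn worker would hold its own copy, so each worker would admit the agent's full quota independently. `VelocityLimiter` is a hypothetical illustration, not the relay's `sequence` implementation.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class VelocityLimiter:
    """Per-agent sliding-window counter held in this process's memory."""

    def __init__(self, max_actions: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_actions = max_actions
        self.window = window_seconds
        self._clock = clock
        self._log: Dict[str, Deque[float]] = defaultdict(deque)
        # Shared by threads in this process only; other processes see nothing.
        self._lock = threading.Lock()

    def allow(self, agent_id: str) -> bool:
        now = self._clock()
        with self._lock:
            log = self._log[agent_id]
            # Drop timestamps that have slid out of the window.
            while log and now - log[0] >= self.window:
                log.popleft()
            if len(log) >= self.max_actions:
                return False  # over the limit; rejected actions are not recorded
            log.append(now)
            return True
```

Only allowed actions are logged in this sketch, so a burst of rejected attempts does not extend the block; a real policy might count every attempt instead.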