Self-Hosted Relay

Run Xybern's enforcement plane inside your own network. The relay authorises AI actions locally, so action content never leaves your infrastructure, while it still pulls policies from, and writes a complete audit trail to, the Xybern control plane.

It's a drop-in: point the Python SDK at the relay and everything else (agent registration, policy CRUD, device login) is transparently proxied upstream. Only intercepts are served locally.

Why run a relay?

| | Cloud (default) | Self-hosted relay |
| --- | --- | --- |
| Data residency | Action content sent to Xybern (or hashed with redact) | Content never leaves your network, only metadata audit records are forwarded |
| Latency | One round-trip per action | Local, in-process evaluation |
| Availability | Needs the cloud reachable | Keeps enforcing from cached policies if the cloud is briefly unreachable |
| Audit | Provenance Vault | The same tamper-evident Provenance Vault, forwarded asynchronously |

Architecture

   your agent ──▶ Xybern SDK ──▶  Xybern Relay  ──▶ (audit only) ──▶ Xybern Cloud
                                  │  ▲                                  │
                  local policy eval│  │  policy cache (refreshed)  ◀────┘
                                  ▼  │
                          decision (allow / escalate / block)
  • Evaluated locally: action_type, threshold, content_pattern, temporal, and sequence (velocity + ordered). The deterministic and stateful policy types run entirely on-prem.
  • Forwarded to the cloud: policy types that cannot be evaluated locally, currently only semantic (an LLM intent judge). With XYBERN_CLOUD_POLICY_MODE=forward, only the intercepts that match such a policy are sent upstream; set the mode to skip to bypass the cloud entirely.
  • Policy cache: pulled from GET /v1/enforce/policies every XYBERN_POLICY_REFRESH seconds, with last-known-good fallback.
  • Audit: every decision is forwarded asynchronously to POST /v1/enforce/relay/audit, which writes it into the Provenance Vault hash chain, so relay decisions appear in your dashboard alongside cloud ones.
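
The policy-cache behaviour described above (periodic refresh with last-known-good fallback) could look roughly like the sketch below. It is an illustration under stated assumptions, not the relay's actual code: `PolicyCache`, its `fetch` callable, and the policy dict shape are hypothetical. In a real relay, `fetch` would wrap the call to GET /v1/enforce/policies.

```python
import time
from typing import Callable, List, Optional


class PolicyCache:
    """Hypothetical cache: refreshes on an interval, keeps the last good copy on failure."""

    def __init__(
        self,
        fetch: Callable[[], List[dict]],
        refresh_seconds: float = 60.0,  # mirrors the XYBERN_POLICY_REFRESH default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # stands in for GET /v1/enforce/policies
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._policies: List[dict] = []  # last-known-good snapshot
        self._fetched_at: Optional[float] = None
        self.last_error: Optional[str] = None

    def get(self) -> List[dict]:
        """Return cached policies, refreshing first if the snapshot is stale."""
        now = self._clock()
        stale = (
            self._fetched_at is None
            or now - self._fetched_at >= self._refresh_seconds
        )
        if stale:
            try:
                self._policies = self._fetch()
                self._fetched_at = now
                self.last_error = None
            except Exception as exc:
                # Upstream unreachable or bad response: keep serving the
                # previous snapshot. _fetched_at is left unchanged, so the
                # next call retries the fetch.
                self.last_error = str(exc)
        return self._policies
```

The design choice worth noting is that a failed refresh never clears the snapshot, which is what lets enforcement continue through a brief upstream outage.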

Run it (Docker)

export XYBERN_API_KEY=xb_live_...        # your workspace key
docker compose up -d                     # relay on http://localhost:8787

Or directly:

pip install -r requirements.txt
export XYBERN_API_KEY=xb_live_...
gunicorn --bind 0.0.0.0:8787 --workers 1 --threads 8 xybern_relay.app:app

Point the SDK at the relay

export XYBERN_BASE_URL=http://localhost:8787/v1

Then, in Python:

import xybern
xybern.auto.connect(mode="enforce")   # intercepts now resolve at the relay

Configuration

| Env var | Default | Purpose |
| --- | --- | --- |
| XYBERN_API_KEY | (required) | Workspace key, pulls policies, forwards audit |
| XYBERN_UPSTREAM_URL | https://www.xybern.com/api/v1 | Control plane URL |
| XYBERN_POLICY_REFRESH | 60 | Policy cache refresh interval (seconds) |
| XYBERN_FAIL_OPEN | true | If upstream unreachable: allow (true) or block (false) |
| XYBERN_FORWARD_AUDIT | true | Forward an audit record of every decision |
| XYBERN_CLOUD_POLICY_MODE | forward | Cloud-only policies: forward the intercept, or skip |
| XYBERN_RELAY_PORT | 8787 | Listen port |
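
To make the defaults above concrete, here is a minimal sketch of how a process might read these variables. `RelayConfig` and `load_config` are hypothetical names, not part of the relay or the SDK. The accepted boolean spellings and the strict `forward`/`skip` validation are assumptions inferred from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as true; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RelayConfig:
    """Hypothetical container mirroring the documented env vars and defaults."""

    api_key: str
    upstream_url: str = "https://www.xybern.com/api/v1"
    policy_refresh: int = 60
    fail_open: bool = True
    forward_audit: bool = True
    cloud_policy_mode: str = "forward"
    relay_port: int = 8787


def load_config(env: Mapping[str, str] = os.environ) -> RelayConfig:
    """Build a RelayConfig from a mapping of environment variables."""
    api_key = env.get("XYBERN_API_KEY", "")
    if not api_key:
        raise ValueError("XYBERN_API_KEY is required")
    mode = env.get("XYBERN_CLOUD_POLICY_MODE", "forward")
    if mode not in {"forward", "skip"}:  # values inferred from the table
        raise ValueError(f"unsupported XYBERN_CLOUD_POLICY_MODE: {mode!r}")
    return RelayConfig(
        api_key=api_key,
        upstream_url=env.get("XYBERN_UPSTREAM_URL", RelayConfig.upstream_url),
        policy_refresh=int(env.get("XYBERN_POLICY_REFRESH", "60")),
        fail_open=_as_bool(env.get("XYBERN_FAIL_OPEN", "true")),
        forward_audit=_as_bool(env.get("XYBERN_FORWARD_AUDIT", "true")),
        cloud_policy_mode=mode,
        relay_port=int(env.get("XYBERN_RELAY_PORT", "8787")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.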

Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /v1/enforce/intercept | Authorise one action (local) |
| GET | /v1/status | Relay + policy-cache + audit-queue status |
| GET | /healthz | Liveness / readiness |
| * | /v1/enforce/*, /v1/auth/* | Transparently proxied upstream |
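
A quick way to check that a relay is up is to probe the two GET endpoints listed above. The helper below is a hedged sketch using only the Python standard library. `probe` is a hypothetical name. The response bodies are returned as raw text because their format is not documented here, and whether /v1/status needs an auth header is also unknown. Bypassing any system HTTP proxy is an assumption that the relay sits on the local network.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Assumption: the relay is on the local network, so system proxies are ignored.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(relay_root: str, path: str, timeout: float = 2.0) -> Tuple[Optional[int], str]:
    """GET relay_root + path; return (HTTP status or None, body or error text)."""
    url = relay_root.rstrip("/") + path
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code, exc.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as exc:
        # Nothing answered: connection refused, DNS failure, timeout, etc.
        return None, str(exc)
```

For example, `probe("http://localhost:8787", "/healthz")` returns a status of `None` when nothing is listening on the port, which distinguishes "relay down" from "relay up but unhealthy".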

Scaling note

The per-agent action log used by sequence policies is held in process memory. Run a single relay (with threads) for a coherent stateful view. The Docker image runs one gunicorn worker with 8 threads for exactly this reason; scale horizontally only if you don't rely on sequence/velocity policies, or if you put a shared store in front of the relays so they all see the same action log.
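
The following sketch illustrates why in-process state matters for velocity policies. It is not the relay's implementation: `VelocityLog`, its parameters, and the "3 actions per 60 seconds" limit are hypothetical. Two independent logs stand in for two workers that each see half of one agent's traffic.

```python
import threading
from collections import defaultdict, deque
from typing import Deque, Dict


class VelocityLog:
    """Hypothetical in-process log of recent action timestamps per agent."""

    def __init__(self, max_actions: int, window_seconds: float) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)
        self._lock = threading.Lock()  # threads in one worker share this log

    def record_and_check(self, agent_id: str, now: float) -> bool:
        """Record one action; return True while the agent is within the limit."""
        with self._lock:
            events = self._events[agent_id]
            # Drop timestamps that have fallen out of the sliding window.
            while events and now - events[0] >= self.window_seconds:
                events.popleft()
            events.append(now)
            return len(events) <= self.max_actions


# Two workers with separate memory, each receiving every other action.
worker_a = VelocityLog(max_actions=3, window_seconds=60)
worker_b = VelocityLog(max_actions=3, window_seconds=60)
split = [
    (worker_a if i % 2 == 0 else worker_b).record_and_check("agent-1", now=float(i))
    for i in range(6)
]

# One worker seeing all six actions.
single = VelocityLog(max_actions=3, window_seconds=60)
combined = [single.record_and_check("agent-1", now=float(i)) for i in range(6)]

print(split)     # [True, True, True, True, True, True]
print(combined)  # [True, True, True, False, False, False]
```

With the traffic split, neither worker ever counts more than three actions, so the limit is never triggered; the single log flags the fourth action onward.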