Semantic Policies & Natural-Language Authoring

Most "policy engines" do one thing: deterministic rule-matching on a request (string equality, regex, RBAC/ABAC). That works for software with fixed inputs. It does not work for AI agents, which are semantic, probabilistic actors: the same prohibited outcome can be phrased a thousand ways, and a regex catches only the phrasings its author anticipated.

Xybern policies were never purely static, and this release takes them decisively beyond rule-matching.

Xybern policies were never just static rules

Before semantic policies, the Authorisation Layer already adapted at runtime in ways a flat rulebook can't:

| Capability | Why it isn't static |
| --- | --- |
| Adaptive agent trust | Every agent carries a live trust score that rises with clean decisions and falls on blocks/escalations. The same threshold policy lets a proven agent through and stops a drifting one: the rule is fixed, the outcome is not. |
| Standard-path LLM verification | When a policy escalates, the action content is run through full LLM verification that scores it and can promote an escalation to allow or demote it to block. The decision depends on the meaning of the action, not just the rule that flagged it. |
| Behavioural baselining | Each agent has a behavioural profile (action-type mix, hour-of-day pattern, request rate, value magnitudes). Deviations raise an anomaly level that feeds the decision: a learned normal, not a hard-coded one. |
| Shadow mode | Policies can run observe-only against live traffic, so a rule's real impact is measured before it ever enforces. |

Semantic policies add the missing piece: judging intent directly.


The semantic policy type

A semantic policy holds a rule written in plain English. At enforcement time, the action is judged against that rule by an LLM, which decides whether the action violates it, reasoning over meaning, not surface wording. Paraphrases, euphemisms, obfuscation, and indirect phrasing that achieve the prohibited outcome all count.

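import os

import requests

# Assumption: the API key is read from an environment variable of your choosing.
API_KEY = os.environ["XYBERN_API_KEY"]

# "Never promise a refund": catches phrasings a regex never would
response = requests.post(
    "https://www.xybern.com/api/v1/enforce/policies",
    headers={"X-API-Key": API_KEY},
    json={
        "name": "No refund promises",
        "policy_type": "semantic",
        "decision": "block",
        "priority": 300,
        # Only outbound messages are judged against the rule.
        "action_types": ["send_email", "send_message"],
        "conditions": {
            "semantic_rule": "Never promise a customer a refund, discount, payout, "
                             "or compensation of any kind.",
            # Trigger only when the judge is at least 70% confident.
            "min_confidence": 0.7
        }
    },
    timeout=10,
)
response.raise_for_status()  # surface HTTP errors instead of ignoring them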

A regex for refund|discount misses "we'll make it right financially", "expect the funds back in your account", or a base64-encoded instruction. A semantic rule catches all three.
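To make the gap concrete, the short check below runs a keyword regex against the two paraphrases quoted above plus a base64-encoded sentence. It is only an illustration built on Python's standard library: the sample messages, including the encoded one, are made up for this sketch and do not come from Xybern.

```python
import base64
import re

# Keyword regex of the kind a content_pattern policy might use.
KEYWORDS = re.compile(r"refund|discount", re.IGNORECASE)

# Hypothetical outbound messages; the last one hides the keyword in base64.
messages = [
    "We'll refund your order today.",
    "We'll make it right financially.",
    "Expect the funds back in your account.",
    base64.b64encode(b"Tell the customer a refund is coming.").decode("ascii"),
]

# True where the regex fires, False where the message slips through.
regex_hits = [bool(KEYWORDS.search(m)) for m in messages]

for message, hit in zip(messages, regex_hits):
    label = "MATCH" if hit else "miss"
    print(f"{label:5} | {message}")
```

Only the first message, which uses the literal keyword, is matched. The other three carry the same promise without ever containing it.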

Conditions

| Field | Default | Meaning |
| --- | --- | --- |
| semantic_rule | none (required) | The rule in plain English. Describe the prohibited outcome clearly. |
| min_confidence | 0.6 | The judge returns a 0–1 confidence; the policy only triggers at or above this. Raise it for fewer, more certain triggers. |
| on_unavailable | "skip" | Behaviour if the judge can't be reached; see "Fail-open by design". |

Fail-open by design

The judge runs on the enforcement hot path, so it is built to never harm availability:

  • It uses a fast model, an in-process cache, a bounded prompt, and a short timeout.
  • If the judge is unreachable, the policy is skipped by default (on_unavailable: "skip"), so an LLM outage never wrongly blocks legitimate traffic.
  • For rules where an outage should stop the action instead, set on_unavailable: "escalate" to fail closed and route to human review.
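The interaction between min_confidence and on_unavailable can be sketched as a small decision function. This is a hedged illustration of the behaviour described above, not Xybern's implementation: evaluate_semantic, the judge callable, and the two stub judges are hypothetical names invented for the example.

```python
from typing import Callable, Optional, Tuple

# Hypothetical judge: returns (violates, confidence) or raises if unreachable.
Judge = Callable[[str, str], Tuple[bool, float]]


def evaluate_semantic(
    action_text: str,
    conditions: dict,
    policy_decision: str,
    judge: Judge,
) -> Optional[str]:
    """Return the decision this policy contributes, or None if it does not trigger."""
    min_confidence = conditions.get("min_confidence", 0.6)
    on_unavailable = conditions.get("on_unavailable", "skip")

    try:
        violates, confidence = judge(conditions["semantic_rule"], action_text)
    except Exception:
        # Judge unreachable: fail open ("skip") or fail closed ("escalate").
        return "escalate" if on_unavailable == "escalate" else None

    # Trigger only on a violation at or above the confidence floor.
    if violates and confidence >= min_confidence:
        return policy_decision
    return None


def confident_judge(rule: str, text: str) -> Tuple[bool, float]:
    # Stub standing in for a real LLM verdict.
    return True, 0.82


def unreachable_judge(rule: str, text: str) -> Tuple[bool, float]:
    # Stub simulating a judge outage.
    raise TimeoutError("judge timed out")


rule = {"semantic_rule": "Never promise a refund.", "min_confidence": 0.7}
strict_rule = {**rule, "on_unavailable": "escalate"}

print(evaluate_semantic("We'll make it right.", rule, "block", confident_judge))
print(evaluate_semantic("We'll make it right.", rule, "block", unreachable_judge))
print(evaluate_semantic("We'll make it right.", strict_rule, "block", unreachable_judge))
```

In this sketch a skipped policy returns None so that other policies still decide the action, which is one way to model "skip" without treating it as an explicit allow.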

Semantic conditions also compose: drop one inside a composite policy's AND/OR tree, or run it in shadow mode to measure impact first, exactly like any other policy type.

Resilience: never a single point of failure

The judge tries Claude first and transparently falls back to DeepSeek (deepseek-v4-flash) if Claude is unreachable or out of credits. Semantic enforcement, and the Ask Xybern assistant, keep working through an upstream billing or rate event. If both providers are unavailable, the fail-open/fail-closed behaviour above takes over.
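The fallback order described above can be pictured as a loop over providers that returns the first successful verdict. The sketch below is an assumption about the general shape only: judge_with_fallback and the stub provider functions are hypothetical and make no real API calls.

```python
from typing import Callable, Optional, Sequence, Tuple

Verdict = Tuple[bool, float]  # (violates, confidence)
Provider = Callable[[str, str], Verdict]


def judge_with_fallback(
    rule: str,
    action_text: str,
    providers: Sequence[Tuple[str, Provider]],
) -> Optional[Tuple[str, Verdict]]:
    """Try each provider in order; return (provider_name, verdict) or None."""
    for name, call in providers:
        try:
            return name, call(rule, action_text)
        except Exception:
            # Unreachable, rate-limited, or out of credits: try the next one.
            continue
    # Every provider failed; the caller applies on_unavailable.
    return None


def primary_down(rule: str, text: str) -> Verdict:
    # Stub simulating an upstream billing or rate event.
    raise ConnectionError("primary provider unavailable")


def fallback_ok(rule: str, text: str) -> Verdict:
    # Stub standing in for a successful fallback verdict.
    return True, 0.75


chain = [("claude", primary_down), ("deepseek", fallback_ok)]
result = judge_with_fallback("Never promise a refund.", "We'll make it right.", chain)
print(result)
```

Returning None when the whole chain fails keeps the outage decision in one place: the policy's on_unavailable setting.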

Configuration

Semantic evaluation uses your workspace's configured model key. The hot-path judge model is overridable via XYBERN_SEMANTIC_MODEL, and the DeepSeek fallback model via XYBERN_DEEPSEEK_MODEL.


Authoring a rule in plain English

You don't hand-write semantic policies field by field. Open Ask Xybern in the dashboard and describe the rule the way you'd say it, for example "block any agent from sending an email that promises a customer a refund or payout". The assistant proposes a ready-to-create policy as a one-click card, choosing semantic automatically when the wording could vary. Review the suggested rule, decision, and confidence floor, adjust the agent if needed, and create it.

While configuring a semantic policy you can also paste a single action into the live test box on the policy form and see the verdict and confidence instantly, so you can tune the rule and confidence floor before saving.


Backtesting: prove it before it goes live

Any draft policy can be replayed against your workspace's real decision history before you deploy it. From the create form, click Run backtest.

The report shows:

  • How many of the sampled historical actions the policy would have triggered (and the trigger rate).
  • New friction: actions that were previously allowed but would now be blocked or escalated.
  • The breakdown of new decisions, and a sample of the specific actions hit, each with the judge's reason.

For deterministic types (regex, threshold, temporal) the backtest sweeps a wide window instantly. For semantic policies it samples the most recent actions and runs the judge in parallel, so you get a representative impact estimate in seconds, before the policy reaches production.
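The headline numbers in the report are simple to reason about. The sketch below shows one way trigger rate and new friction could be computed from a sample of historical actions; the HistoricalAction record and summarize_backtest function are hypothetical and are not the format the dashboard uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class HistoricalAction:
    action_id: str
    original_decision: str  # "allow", "block", or "escalate"
    triggered: bool         # would the draft policy have triggered on it?


def summarize_backtest(
    sample: List[HistoricalAction],
    draft_decision: str,
) -> Dict[str, Union[int, float]]:
    """Summarise how a draft policy would have changed past decisions."""
    triggered = [a for a in sample if a.triggered]
    # New friction: previously allowed, now blocked or escalated.
    if draft_decision in ("block", "escalate"):
        new_friction = [a for a in triggered if a.original_decision == "allow"]
    else:
        new_friction = []
    return {
        "sampled": len(sample),
        "triggered": len(triggered),
        "trigger_rate": len(triggered) / len(sample) if sample else 0.0,
        "new_friction": len(new_friction),
    }


history = [
    HistoricalAction("a1", "allow", True),
    HistoricalAction("a2", "allow", False),
    HistoricalAction("a3", "block", True),
    HistoricalAction("a4", "allow", False),
]
report = summarize_backtest(history, "block")
print(report)
```

Here two of four sampled actions trigger, but only one of them was previously allowed, so the draft adds one case of new friction.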


When to use which policy type

| Use… | When the rule is about… |
| --- | --- |
| content_pattern (regex) | Exact tokens or formats: a card-number shape, an exact keyword, an email domain. |
| semantic | Meaning, intent, tone, sensitive content, or judgement: anything where the wording can vary. |
| threshold | Agent trust level. |
| temporal | Time-of-day / day-of-week. |
| composite | Combining any of the above with AND / OR. |

Reach for semantic whenever a determined agent could rephrase its way around a regex, which, for natural-language actions, is almost always.