Vault & Audit Trail

Every request through the Redact proxy is logged to an append-only cryptographic vault. The vault enables full auditability: you can prove that any record is unmodified, that no records have been deleted, and that the log has not been tampered with, without exposing the content of any document.


Vault Record Structure

Each vault record captures:

Field                      Description
doc_hash_original          SHA-256 of the original (pre-anonymization) message content
doc_hash_anonymized        SHA-256 of the anonymized message sent to the LLM
response_hash              SHA-256 of the raw LLM response
entities_stripped_count    Total number of entities anonymized
entities_stripped_types    Breakdown by type: {"PERSON": 3, "ORG": 1, "EMAIL": 2}
model_used                 The upstream model identifier
doc_class                  Document class the policy applied to
policy_id                  ID of the policy used
api_key_id                 ID of the API key that made the request
status                     success, error, or blocked
chain_hash                 SHA-256 chain hash linking this record to the previous one
record_hmac                HMAC-SHA256 signature of the canonical record fields
leakage_detected           Whether real values were found in the LLM response
leaked_entities_count      Number of leaked values found and scrubbed
quasi_id_risk              Whether healthcare quasi-identifier risk was detected
blocked_reason             Human-readable reason if status = blocked
created_at                 UTC timestamp

Chain Hash (Append-Only Integrity)

Each record's chain_hash is computed as:

SHA-256(prev_chain_hash : record_id : doc_hash_original : timestamp)

The first record in a workspace uses "genesis" as prev_chain_hash.

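This creates a cryptographic chain in which modifying the chained fields of an earlier record, or deleting the record, invalidates the chain hashes that follow it, much as blocks are linked in a blockchain. You cannot silently remove a record without breaking the chain. Fields that are not part of the chain hash input can be protected by the optional HMAC signature described in the next section.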
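The chaining rule above can be sketched as follows. This is a minimal illustration, not the proxy's actual implementation: the exact byte layout is an assumption (fields joined with a bare ":" and encoded as UTF-8, with created_at used as the timestamp), and the names chain_hash and first_broken_link are hypothetical.

```python
import hashlib

GENESIS = "genesis"  # prev_chain_hash of the first record in a workspace


def chain_hash(prev_chain_hash: str, record_id: str,
               doc_hash_original: str, timestamp: str) -> str:
    # Assumption: fields are joined with ":" (no spaces) and hashed as UTF-8.
    payload = ":".join([prev_chain_hash, record_id, doc_hash_original, timestamp])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def first_broken_link(records: list):
    """Return the index of the first record whose chain_hash does not match
    a recomputation from its predecessor, or None if the chain is intact."""
    prev = GENESIS
    for index, rec in enumerate(records):
        expected = chain_hash(prev, rec["id"], rec["doc_hash_original"],
                              rec["created_at"])
        if expected != rec["chain_hash"]:
            return index
        prev = rec["chain_hash"]
    return None


# Build a small chain from made-up records.
records = []
prev = GENESIS
for i, ts in enumerate(["2026-05-22T14:00:00Z",
                        "2026-05-22T14:01:00Z",
                        "2026-05-22T14:02:00Z"]):
    doc_hash = hashlib.sha256(f"document {i}".encode("utf-8")).hexdigest()
    link = chain_hash(prev, f"rec-{i}", doc_hash, ts)
    records.append({"id": f"rec-{i}", "doc_hash_original": doc_hash,
                    "created_at": ts, "chain_hash": link})
    prev = link

intact = first_broken_link(records)                         # None
after_delete = first_broken_link([records[0], records[2]])  # 1
```

Deleting the middle record leaves the third record pointing at a predecessor hash that is no longer in the list, so the check fails at the position where the gap occurs.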


HMAC Record Signing

If you generate an HMAC key for your workspace (Settings → Integrity & Encryption), each record is signed with HMAC-SHA256 over its canonical fields:

{
  "id": "...",
  "workspace_id": "...",
  "doc_hash_original": "...",
  "doc_hash_anonymized": "...",
  "response_hash": "...",
  "entities_stripped_count": 7,
  "chain_hash": "...",
  "created_at": "..."
}

The HMAC is stored in record_hmac. Anyone holding the workspace HMAC key can independently verify that a record has not been altered since it was written.

To verify a record:

import hmac, hashlib, json

def verify_record(record: dict, expected_hmac: str, key_hex: str) -> bool:
    key = bytes.fromhex(key_hex)
    fields = {
        "id":                      record["id"],
        "workspace_id":            record["workspace_id"],
        "doc_hash_original":       record["doc_hash_original"],
        "doc_hash_anonymized":     record["doc_hash_anonymized"],
        "response_hash":           record["response_hash"],
        "entities_stripped_count": record["entities_stripped_count"],
        "chain_hash":              record["chain_hash"],
        "created_at":              record["created_at"],
    }
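    # This serialization must match the signer's output byte for byte:
    # sorted keys, default separators, non-JSON types converted with str().
    canonical = json.dumps(fields, sort_keys=True, default=str).encode()
    computed = hmac.new(key, canonical, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(computed, expected_hmac)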
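For testing a verifier offline, it can help to produce signatures yourself. The sketch below assumes the signing side uses the same canonical JSON serialization as the verification snippet above; the source does not show the signer, so treat sign_record, CANONICAL_FIELDS, and the sample record values as hypothetical.

```python
import hashlib
import hmac
import json
import secrets

# The canonical fields listed in this section.
CANONICAL_FIELDS = (
    "id", "workspace_id", "doc_hash_original", "doc_hash_anonymized",
    "response_hash", "entities_stripped_count", "chain_hash", "created_at",
)


def sign_record(record: dict, key_hex: str) -> str:
    # Assumption: same serialization as the verifier (sorted keys, default=str).
    fields = {name: record[name] for name in CANONICAL_FIELDS}
    canonical = json.dumps(fields, sort_keys=True, default=str).encode()
    return hmac.new(bytes.fromhex(key_hex), canonical, hashlib.sha256).hexdigest()


# Made-up key and record, for illustration only.
key_hex = secrets.token_hex(32)
record = {
    "id": "rec-1",
    "workspace_id": "ws-1",
    "doc_hash_original": hashlib.sha256(b"original").hexdigest(),
    "doc_hash_anonymized": hashlib.sha256(b"anonymized").hexdigest(),
    "response_hash": hashlib.sha256(b"response").hexdigest(),
    "entities_stripped_count": 7,
    "chain_hash": hashlib.sha256(b"chain").hexdigest(),
    "created_at": "2026-05-22T14:00:00Z",
}
signature = sign_record(record, key_hex)

# Changing any canonical field yields a different HMAC.
tampered = dict(record, entities_stripped_count=0)
tamper_detected = not hmac.compare_digest(sign_record(tampered, key_hex), signature)
```

Fields outside the canonical set (for example status or model_used) do not affect the signature, so a change to them would not be detected by this check.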

Merkle Tree Batch Sealing

Seal Batch commits all unsealed vault records into a Merkle tree. Only the root hash is stored, not the content of any record.

records:   [r1, r2, r3, r4]
leaves:    [H(leaf:chain1), H(leaf:chain2), H(leaf:chain3), H(leaf:chain4)]
tree:
           root = H(H(l1,l2), H(l3,l4))
                  /                 \
           H(l1, l2)           H(l3, l4)
           /      \             /      \
          l1      l2           l3      l4

After sealing:

  • The root_hash is stored in redact_merkle_roots
  • The ordered list of record_ids is preserved so that proofs can be reconstructed
  • Any record in the batch can be proven to be in the sealed set without revealing any other record

When to seal: Seal at the end of each working day, after a significant batch of transactions, or before a compliance review. Once sealed, the batch is immutable.
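The tree construction described in this section can be sketched as follows. Two details are assumptions rather than documented behavior: hashes are concatenated as hex strings before hashing, and a level with an odd number of nodes is padded by duplicating its last node. The function names are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    # Assumption: inputs are hex strings concatenated and hashed as UTF-8 text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_levels(chain_hashes: list) -> list:
    """Return all tree levels, leaves first, root level last."""
    if not chain_hashes:
        raise ValueError("cannot seal an empty batch")
    level = [sha256_hex("leaf:" + c) for c in chain_hashes]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad an odd level by duplicating its last node.
            level = level + [level[-1]]
            levels[-1] = level
        level = [sha256_hex(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(chain_hashes: list) -> str:
    return build_levels(chain_hashes)[-1][0]


def inclusion_proof(chain_hashes: list, position: int) -> list:
    """Sibling hashes from leaf to root for the record at a 0-based position."""
    proof = []
    index = position
    for level in build_levels(chain_hashes)[:-1]:
        sibling = index ^ 1  # the other node in the same pair
        side = "right" if sibling > index else "left"
        proof.append({"hash": level[sibling], "side": side})
        index //= 2
    return proof


def verify_proof(chain_hash: str, proof: list, root: str) -> bool:
    current = sha256_hex("leaf:" + chain_hash)
    for step in proof:
        if step["side"] == "right":
            current = sha256_hex(current + step["hash"])
        else:
            current = sha256_hex(step["hash"] + current)
    return current == root


chains = ["chain1", "chain2", "chain3", "chain4", "chain5"]  # made-up values
root = merkle_root(chains)
proof = inclusion_proof(chains, 2)
ok = verify_proof(chains[2], proof, root)
```

Only the root and the ordered record IDs need to be stored; every level above the leaves can be recomputed from the chain hashes when a proof is requested.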


Merkle Inclusion Proofs

Click Prove next to any sealed vault record to get an inclusion proof:

{
  "record_id": "abc123",
  "chain_hash": "3d9f2a…",
  "root_hash": "7e1b4c…",
  "sealed_at": "2026-05-22T14:00:00Z",
  "batch_size": 6,
  "position": 2,
  "proof": [
    {"hash": "a4f81d…", "side": "right"},
    {"hash": "2c7e90…", "side": "left"}
  ],
  "proof_valid": true,
  "hmac_valid": true
}

To independently verify the proof:

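import hashlib

def sha256(text: str) -> str:
    # Assumption: hashes are hex strings, concatenated and hashed as UTF-8 text
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def verify_proof(chain_hash: str, proof: list, root: str) -> bool:
    current = sha256("leaf:" + chain_hash)
    for step in proof:
        if step["side"] == "right":
            # Sibling sits to the right, so the running hash goes first
            current = sha256(current + step["hash"])
        else:
            current = sha256(step["hash"] + current)
    return current == root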

This proof can be provided to a third party (auditor, regulator) to prove that a specific interaction occurred in the sealed batch, without revealing any other records.


PDF Audit Export

Export a PDF audit report from the Vault tab (Export PDF button). The report includes:

  • Workspace metadata and export timestamp
  • Activity summary (total intercepts, entities stripped, HMAC-signed records, Merkle roots)
  • Security summary (leakage detections, blocked requests, k-anonymity flags)
  • Sealed Merkle roots (last 5)
  • Full vault records table with timestamps, doc classes, entity counts, model used, status, and HMAC status

The PDF can be filtered by date range using the from and to query parameters:

GET /api/redact/{workspace_id}/vault/export.pdf?from=2026-01-01&to=2026-05-31
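A small helper for building the export URL might look like the sketch below. The host name and workspace ID are placeholders, the function name is hypothetical, and authentication is not shown here because the source does not describe it.

```python
from datetime import date
from urllib.parse import quote, urlencode


def vault_export_url(base_url: str, workspace_id: str,
                     start: date, end: date) -> str:
    """Build the PDF export URL with 'from' and 'to' as YYYY-MM-DD dates."""
    if start > end:
        raise ValueError("'from' date must not be after 'to' date")
    query = urlencode({"from": start.isoformat(), "to": end.isoformat()})
    path = f"/api/redact/{quote(workspace_id, safe='')}/vault/export.pdf"
    return f"{base_url.rstrip('/')}{path}?{query}"


# Placeholder host and workspace ID, for illustration only.
url = vault_export_url("https://redact.example.com", "ws_123",
                       date(2026, 1, 1), date(2026, 5, 31))
```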

Retention Policy

Vault records are kept indefinitely by default. Configure automatic deletion per workspace to comply with GDPR storage limitation requirements.

See the full guide: Vault Retention Policy →


Security Metrics

The dashboard Overview shows three counters derived from vault records:

Leakage Detections: cases where the LLM echoed a real value back in its response. The value was automatically re-anonymized before reaching your agent. After a detection, review whether the prompt gives the model enough context to reconstruct PII.

Blocked Requests: requests rejected before anonymization because the API key was not scoped for the requested doc class. Check which key and agent triggered the block.

K-Anon Flagged: healthcare records where two or more quasi-identifiers (age, location, condition) were still present after anonymization. Consider enabling date shifting or stricter location stripping for those documents.