A drop-in privacy proxy that strips PII, secrets, and sensitive data before they reach any AI provider, cloud or local.
OpenAI, DeepSeek, Anthropic: they all log requests. One leaked key or prompt could expose patient records, client data, or API secrets.
Raw prompts, including emails, SSNs, credit cards, and API keys, travel to the cloud provider unredacted.
PII is stripped or tokenized before the request is forwarded. The LLM never sees real sensitive data.
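At its core this is pattern-based detection and substitution. The sketch below shows the idea with deliberately simplified regexes; the names and patterns are illustrative, not the gateway's actual detectors.

```python
import re

# Illustrative patterns only -- the gateway's built-in detectors are more
# thorough. These show the strip-before-forward idea.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}

def redact(text: str) -> str:
    """Replace each detected value with a typed placeholder before forwarding."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane@acme.com, SSN 123-45-6789"))
# -> Contact [EMAIL], SSN [SSN]
```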
Healthcare apps processing patient notes. Legal tools reading contracts. Finance platforms with transaction data. Any team using LLMs on sensitive data.
The gateway runs as a local proxy. Your app talks to it instead of OpenAI directly. No SDK changes required.
Change base_url from api.openai.com to localhost:8787. That's it; no other changes to your code.
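To make the drop-in claim concrete, here is a stdlib sketch of what an OpenAI-style chat request looks like with and without the gateway. The helper, the payload, and the /v1 path prefix are illustrative assumptions; only the base URL differs between the two calls.

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Illustrative helper (not part of the gateway): build an OpenAI-style
    chat completion request. Only base_url distinguishes direct vs proxied."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )

# The whole migration is the base_url argument:
direct = build_chat_request("https://api.openai.com/v1", "sk-...", {"model": "gpt-4o"})
proxied = build_chat_request("http://localhost:8787/v1", "sk-...", {"model": "gpt-4o"})
```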
Emails, phones, SSNs, credit cards, and API keys are stripped or tokenized before the request leaves your machine.
Inbound responses are scanned for PII that the model echoes back. Streamed responses are handled chunk by chunk and stay split-boundary safe.
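Split-boundary safety matters because a streamed value like an email address can arrive half in one chunk and half in the next. The sketch below shows one holdback-buffer approach under simplified assumptions (a single pattern, no SSE framing, values shorter than the holdback window); the gateway's real logic is more involved.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HOLDBACK = 64  # assumed longer than any value we need to detect

def sanitize_stream(chunks):
    """Holdback-buffer sanitization, sketched. A production version would also
    avoid cutting the emitted portion inside a candidate match."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        # Emit everything except a HOLDBACK-sized tail, so a value straddling
        # a chunk boundary stays buffered until it is complete.
        safe, buf = buf[:-HOLDBACK], buf[-HOLDBACK:]
        yield EMAIL.sub("[EMAIL]", safe)
    yield EMAIL.sub("[EMAIL]", buf)

# An email split across two chunks is still caught:
result = "".join(sanitize_stream(["Reply to jane@ac", "me.com today"]))
```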
Every request/response pair is written to an AES-256-GCM encrypted local audit log. Only you hold the key.
Route to a local model and your data never leaves your machine, not even to us. Just point the gateway at Ollama: --upstream-base-url http://localhost:11434. Works with llama3, qwen, deepseek-coder, and any Ollama-compatible model. Verified: ZERO_CLOUD_OK ✓
Tested against a curated corpus of 10 leak scenarios covering outbound PII, tool args, tainted inbound, and streaming split-boundary edge cases.
Built in pure Python, stdlib only. No third-party risk in the privacy layer itself.
Emails, phones, SSNs, credit cards, API keys, and custom patterns stripped before any outbound call.
Soft PII (names, identifiers) replaced with session-random tokens so the LLM still gets context, but no real data.
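The key property is consistency within a session: the same value always maps to the same random token, so the model can refer back to "the same person" without ever seeing a real name. A minimal sketch, with an assumed token format:

```python
import secrets

class SessionTokenizer:
    """Sketch: replace soft PII with per-session random tokens, consistently.
    The token format <KIND_xxxxxxxx> is an assumption for illustration."""

    def __init__(self):
        self._map = {}

    def tokenize(self, value: str, kind: str = "NAME") -> str:
        # First sighting mints a random token; repeats reuse it.
        if value not in self._map:
            self._map[value] = f"<{kind}_{secrets.token_hex(4)}>"
        return self._map[value]

t = SessionTokenizer()
a = t.tokenize("Alice Smith")
b = t.tokenize("Alice Smith")  # same token as `a`, so context is preserved
```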
Function-call arguments inspected for sensitive keys (api_key, password, token) before they reach the model.
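Tool arguments are structured JSON, so a recursive key scan is the natural shape for this check. A sketch, assuming a small illustrative deny-list of key names:

```python
SENSITIVE_KEYS = {"api_key", "password", "token", "secret"}  # illustrative set

def scrub_tool_args(args):
    """Recursively redact values under sensitive keys in function-call args."""
    if isinstance(args, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else scrub_tool_args(v)
            for k, v in args.items()
        }
    if isinstance(args, list):
        return [scrub_tool_args(v) for v in args]
    return args  # scalars pass through unchanged

clean = scrub_tool_args({"query": "weather", "auth": {"api_key": "sk-live-123"}})
```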
SSE and NDJSON streams sanitized chunk by chunk with a holdback buffer, so no split-boundary leaks.
Tracks which sessions have seen sensitive values and prevents replays in subsequent requests.
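Conceptually this is per-session taint tracking: once a sensitive value has been seen, later requests in that session that echo it can be flagged before forwarding. A minimal sketch (the class and substring check are illustrative, not the gateway's implementation):

```python
class TaintTracker:
    """Sketch of per-session taint tracking for replay prevention."""

    def __init__(self):
        self._seen = {}  # session_id -> set of tainted values

    def record(self, session_id: str, value: str) -> None:
        self._seen.setdefault(session_id, set()).add(value)

    def is_replay(self, session_id: str, text: str) -> bool:
        # Flag any request that echoes a previously seen sensitive value.
        return any(v in text for v in self._seen.get(session_id, ()))

tracker = TaintTracker()
tracker.record("s1", "123-45-6789")
```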
Every request/response pair logged with AES-256-GCM encryption. Key rotation supported. Only you can read it.
Per-IP rate limiting with SQLite backend, bearer token auth, and optional mTLS for enterprise deployments.
/healthz and /metrics endpoints for monitoring; plug them straight into a Prometheus/Grafana stack.
Works with Ollama. Route to llama3, qwen, or mistral, and your data never leaves your device. True zero-cloud.
Self-hosted is always free and open source. Pay only if you want us to host and manage it for you.
True E2EE means only sender and receiver can read the message, but an LLM can't process encrypted text. What this gateway gives you is the practical equivalent: your PII never reaches the cloud provider's servers. For true zero-cloud operation, use Ollama local mode; your data never leaves your machine at all.
No. In the hosted plan, your requests are PII-scrubbed at the gateway (before logging) and forwarded to your chosen LLM provider using your own API token. We never store the raw content or your upstream credentials. Audit logs are encrypted with a key only you hold.
Yes: any OpenAI-compatible API endpoint works. Also supports Anthropic's API format, Gemini streaming, and local Ollama models. Just point --upstream-base-url at your provider.
The gateway adds <5ms of local processing. It's a lightweight Python proxy: no ML models running on your traffic, just regex + tokenization. For most LLM workflows, you won't notice the difference.
No. Change one line: set base_url="http://localhost:8787" (or our hosted URL) in your OpenAI client. Everything else stays the same.