A drop-in privacy proxy that strips PII, secrets, and sensitive data before they reach any AI provider, cloud or local.
OpenAI, DeepSeek, Anthropic: they all log requests. One leaked key or prompt could expose patient records, client data, or API secrets.
Raw prompts, including emails, SSNs, credit cards, and API keys, travel to the cloud provider unredacted.
PII is stripped or tokenized before the request is forwarded. The LLM never sees real sensitive data.
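At its core this is pattern-based detection and substitution. The sketch below shows the idea with deliberately simplified regexes; the names and patterns are illustrative, not the gateway's actual detectors.

```python
import re

# Illustrative patterns only -- the gateway's built-in detectors are more
# thorough. These show the strip-before-forward idea.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}

def redact(text: str) -> str:
    """Replace each detected value with a typed placeholder before forwarding."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane@acme.com, SSN 123-45-6789"))
# -> Contact [EMAIL], SSN [SSN]
```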
Healthcare apps processing patient notes. Legal tools reading contracts. Finance platforms with transaction data. Any team using LLMs on sensitive data.
The gateway runs as a local proxy. Your app talks to it instead of OpenAI directly. No SDK changes required.
Change base_url from api.openai.com to localhost:8787. That's it; no other changes to your code.
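To make the drop-in claim concrete, here is a stdlib sketch of what an OpenAI-style chat request looks like with and without the gateway. The helper, the payload, and the /v1 path prefix are illustrative assumptions; only the base URL differs between the two calls.

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Illustrative helper (not part of the gateway): build an OpenAI-style
    chat completion request. Only base_url distinguishes direct vs proxied."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )

# The whole migration is the base_url argument:
direct = build_chat_request("https://api.openai.com/v1", "sk-...", {"model": "gpt-4o"})
proxied = build_chat_request("http://localhost:8787/v1", "sk-...", {"model": "gpt-4o"})
```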
Emails, phones, SSNs, credit cards, and API keys are stripped or tokenized before the request leaves your machine.
Inbound responses are scanned for PII that the model echoes back. Streamed responses are handled chunk by chunk and stay split-boundary safe.
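Split-boundary safety matters because a streamed value like an email address can arrive half in one chunk and half in the next. The sketch below shows one holdback-buffer approach under simplified assumptions (a single pattern, no SSE framing, values shorter than the holdback window); the gateway's real logic is more involved.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HOLDBACK = 64  # assumed longer than any value we need to detect

def sanitize_stream(chunks):
    """Holdback-buffer sanitization, sketched. A production version would also
    avoid cutting the emitted portion inside a candidate match."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        # Emit everything except a HOLDBACK-sized tail, so a value straddling
        # a chunk boundary stays buffered until it is complete.
        safe, buf = buf[:-HOLDBACK], buf[-HOLDBACK:]
        yield EMAIL.sub("[EMAIL]", safe)
    yield EMAIL.sub("[EMAIL]", buf)

# An email split across two chunks is still caught:
result = "".join(sanitize_stream(["Reply to jane@ac", "me.com today"]))
```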
Every request/response pair is written to an AES-256-GCM encrypted local audit log. Only you hold the key.
Route to a local model and your data never leaves your machine, not even to us. Just point the gateway at Ollama: --upstream-base-url http://localhost:11434. Works with llama3, qwen, deepseek-coder, and any Ollama-compatible model. Verified: ZERO_CLOUD_OK ✓
Tested against a curated corpus of 10 leak scenarios covering outbound PII, tool args, tainted inbound, and streaming split-boundary edge cases.
Built in pure Python, stdlib only. No third-party risk in the privacy layer itself.
Emails, phones, SSNs, credit cards, API keys, and custom patterns stripped before any outbound call.
Soft PII (names, identifiers) replaced with session-random tokens so the LLM still gets context, but no real data.
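The key property is consistency within a session: the same value always maps to the same random token, so the model can refer back to "the same person" without ever seeing a real name. A minimal sketch, with an assumed token format:

```python
import secrets

class SessionTokenizer:
    """Sketch: replace soft PII with per-session random tokens, consistently.
    The token format <KIND_xxxxxxxx> is an assumption for illustration."""

    def __init__(self):
        self._map = {}

    def tokenize(self, value: str, kind: str = "NAME") -> str:
        # First sighting mints a random token; repeats reuse it.
        if value not in self._map:
            self._map[value] = f"<{kind}_{secrets.token_hex(4)}>"
        return self._map[value]

t = SessionTokenizer()
a = t.tokenize("Alice Smith")
b = t.tokenize("Alice Smith")  # same token as `a`, so context is preserved
```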
Function-call arguments inspected for sensitive keys (api_key, password, token) before they reach the model.
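Tool arguments are structured JSON, so a recursive key scan is the natural shape for this check. A sketch, assuming a small illustrative deny-list of key names:

```python
SENSITIVE_KEYS = {"api_key", "password", "token", "secret"}  # illustrative set

def scrub_tool_args(args):
    """Recursively redact values under sensitive keys in function-call args."""
    if isinstance(args, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else scrub_tool_args(v)
            for k, v in args.items()
        }
    if isinstance(args, list):
        return [scrub_tool_args(v) for v in args]
    return args  # scalars pass through unchanged

clean = scrub_tool_args({"query": "weather", "auth": {"api_key": "sk-live-123"}})
```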
SSE and NDJSON streams sanitized chunk by chunk with a holdback buffer, so no split-boundary leaks.
Tracks which sessions have seen sensitive values and prevents replays in subsequent requests.
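Conceptually this is per-session taint tracking: once a sensitive value has been seen, later requests in that session that echo it can be flagged before forwarding. A minimal sketch (the class and substring check are illustrative, not the gateway's implementation):

```python
class TaintTracker:
    """Sketch of per-session taint tracking for replay prevention."""

    def __init__(self):
        self._seen = {}  # session_id -> set of tainted values

    def record(self, session_id: str, value: str) -> None:
        self._seen.setdefault(session_id, set()).add(value)

    def is_replay(self, session_id: str, text: str) -> bool:
        # Flag any request that echoes a previously seen sensitive value.
        return any(v in text for v in self._seen.get(session_id, ()))

tracker = TaintTracker()
tracker.record("s1", "123-45-6789")
```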
Every request/response pair logged with AES-256-GCM encryption. Key rotation supported. Only you can read it.
Per-IP rate limiting with SQLite backend, bearer token auth, and optional mTLS for enterprise deployments.
/healthz and /metrics endpoints for monitoring; plug them straight into a Prometheus/Grafana stack.
Works with Ollama. Route to llama3, qwen, or mistral, and your data never leaves your device. True zero-cloud.
Self-hosted is always free and open source. Pay only if you want us to host and manage it for you.
True E2EE means only sender and receiver can read the message, but an LLM can't process encrypted text. What this gateway gives you is the practical equivalent: your PII never reaches the cloud provider's servers. For true zero-cloud operation, use Ollama local mode; your data never leaves your machine at all.
No. In the hosted plan, your requests are PII-scrubbed at the gateway (before logging) and forwarded to your chosen LLM provider using your own API token. We never store the raw content or your upstream credentials. Audit logs are encrypted with a key only you hold.
Yes: any OpenAI-compatible API endpoint works. Also supports Anthropic's API format, Gemini streaming, and local Ollama models. Just point --upstream-base-url at your provider.
The gateway adds <5ms of local processing. It's a lightweight Python proxy: no ML models running on your traffic, just regex + tokenization. For most LLM workflows, you won't notice the difference.
No. Change one line: set base_url="http://localhost:8787" (or our hosted URL) in your OpenAI client. Everything else stays the same.