Can I self-host for free? +
Yes. The full source code is available under the Apache 2.0 license on GitHub. You can deploy it anywhere — your own servers, Docker, Kubernetes, or any cloud provider — completely free, with no restrictions for commercial use.
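As a sketch of the self-hosted path, a minimal Docker Compose file might look like the following. The image name, port, and config path here are illustrative assumptions, not the project's published values — check the GitHub README for the real ones.

```yaml
# Hypothetical docker-compose.yml for a self-hosted deployment.
# Image name, port, and volume path are assumptions for illustration.
services:
  gateway:
    image: ghcr.io/example/llm-gateway:latest  # replace with the real image
    ports:
      - "8080:8080"
    volumes:
      - ./gateway.yaml:/etc/gateway/config.yaml:ro
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}  # provider keys stay with you
```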
Does the Professional plan include a HIPAA BAA? +
The Professional plan does not include a HIPAA BAA by default. You need the Enterprise plan or the Healthcare Compliance Add-On for a signed BAA. If you are a covered entity handling PHI, contact us to discuss your specific requirements.
What counts as a "request"? +
One request is one call to the gateway's /v1/chat/completions endpoint. Streaming responses count as one request. Cache hits count as one request (but save the LLM cost entirely). There is no per-token billing from us — you pay your LLM provider directly.
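As a back-of-the-envelope illustration of this billing model, the sketch below (hypothetical traffic numbers, not published pricing) shows why cache hits still consume request quota but cut provider spend:

```python
def monthly_provider_cost(requests: int, cache_hit_rate: float,
                          avg_cost_per_call: float) -> float:
    """Estimate upstream LLM spend for a month of gateway traffic.

    Every call -- streamed or not, cached or not -- counts as one
    request toward the plan, but only cache misses reach the provider.
    """
    misses = requests * (1.0 - cache_hit_rate)
    return misses * avg_cost_per_call

# Example: 100k requests with 40% served from cache at $0.01 per
# upstream call -- all 100k count as requests, but only 60k cost money.
```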
Can I switch from self-hosted to managed without downtime? +
Yes. Because the gateway is a drop-in proxy, you point your base_url at the new managed endpoint. Your API keys, configuration, and provider integrations migrate via a config-file import, with no code changes in your applications.
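In practice the cutover is a one-line base URL change in whatever client you already use. The hostnames below are placeholders, not real endpoints:

```python
# Hypothetical endpoints -- substitute your own self-hosted host and the
# managed URL from your dashboard.
SELF_HOSTED_BASE = "http://gateway.internal:8080/v1"
MANAGED_BASE = "https://gateway.example-managed.com/v1"

def completions_endpoint(base_url: str) -> str:
    """The application always calls the same path; only the base changes."""
    return base_url.rstrip("/") + "/chat/completions"
```

With the OpenAI Python SDK, for example, this amounts to constructing the client as `OpenAI(base_url=MANAGED_BASE, api_key=...)` instead of pointing at the self-hosted base.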
Do you store my LLM responses? +
In the managed plans, audit logs are stored (encrypted at rest) to provide the immutable trail needed for compliance. The semantic cache stores a hash of the request and the response — never the raw PHI. You can disable caching for PHI traffic in your config. On self-hosted, you control all storage.
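As one way this might look, here is a hedged sketch of a config that keeps the semantic cache on globally but bypasses it for PHI-tagged traffic. The key names and tagging convention are assumptions about the schema, not the documented format:

```yaml
# Hypothetical config fragment -- key names are illustrative only.
cache:
  semantic:
    enabled: true
routes:
  - name: phi-traffic
    match:
      header: "x-data-class: phi"   # assumed tagging convention
    cache: disabled                 # PHI requests skip the cache entirely
```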
What LLM providers do you support? +
17 providers: OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Google Gemini, Groq, Mistral, DeepSeek, Cohere, Fireworks AI, HuggingFace, Ollama (local), Perplexity, Replicate, Together AI, xAI (Grok), and AI21. New providers are added regularly — open a GitHub issue to request one.