Portkey
The Observability-and-Gateway Play for Teams That Want One Vendor
Combines AI gateway, observability, and prompt management in one platform. The unified-vendor argument vs. the best-of-breed approach. When one vendor makes sense and when it does not.
The question that defines your AI infrastructure stack is whether you want one platform that does everything or three tools that each do one thing well. Portkey is the strongest argument for the first answer. It combines an AI gateway, observability, prompt management, and cost tracking in a single platform that is self-hostable, SOC 2 compliant, and capable of routing to over 1,600 models. The current version as of this writing is v1.15.2 for the self-hosted gateway server (released January 2026), with the Python SDK at v2.3.2 (released June 15, 2026). The project sits at roughly 12,300 stars on GitHub with an MIT license, and it serves a specific type of team well and a different type of team poorly. The distinction matters more than any feature comparison.
Portkey operates as three layers stacked into one deployment. The gateway layer handles model routing, load balancing, fallback chains, and rate limiting across providers. The observability layer captures every request with latency, token usage, cost, and response content, surfaced through a dashboard or exported via OpenTelemetry. The prompt management layer provides versioned prompt templates, A/B testing, and a registry that ties a prompt version to the model configuration that runs it. The same API key that routes your request also logs it and tracks its cost. The integration surface is single, which means your application code calls one endpoint and gets routing, logging, and cost attribution without wiring three SDKs together.
The advantages of this approach are real and should not be waved away as a convenience argument. The single-integration surface means your application code calls one endpoint, sends one set of headers, and gets routing, logging, cost tracking, and prompt management without wiring three independent SDKs together. Your security team approves one integration instead of three. Your ops team monitors one service instead of three. Your cost report comes from the same system that handled the request, not from a separate ingestion pipeline that can drift out of sync. For a team that is small enough that the integration surface matters more than the depth of any single capability, this is the right architecture.
The observability layer is where Portkey makes its strongest case relative to LiteLLM. Both products provide gateway capabilities, but LiteLLM relies on external observability tools (Arize Phoenix, Langfuse, Datadog) for its request tracing. Portkey includes it natively. The dashboard shows per-user cost breakdowns, model-level latency distributions, failure rate trends, and prompt performance comparisons without any additional configuration. For a team that does not already run an observability platform or that wants to keep AI observability separate from application observability, the built-in dashboard removes an integration step and a separate deployment to maintain. The cost attribution per user and per model is the feature that pays for itself when the finance team asks the question every enterprise AI deployment eventually faces: who is spending what, and is it worth it.
The gateway capabilities cover the standard requirements. Virtual keys so you can issue per-user and per-team credentials without exposing your provider API keys. Rate limiting per key at configurable thresholds. Model routing with fallback chains so that when GPT-4o is throttled, the request routes to Claude Sonnet 4 automatically. Load balancing across multiple instances of the same model for high-throughput deployments. For teams that need prompt injection detection and content moderation at the gateway layer, Portkey includes built-in guardrails that can inspect both input and output, though these are less configurable than NeMo Guardrails and operate more as classification-based filters than dialog-flow guardrails.
The self-hosting story is straightforward. Portkey is available as a Docker image from portkeyai/gateway with an MIT license that permits commercial use. The self-hosted version includes all core gateway features, observability, virtual keys, and rate limiting. SOC 2 compliance is certified on the cloud version and achievable on self-hosted with proper configuration. HIPAA BAAs are available for the enterprise plan. The deployment model that I have seen work best is Portkey deployed as a sidecar service in the same Kubernetes namespace as your application, with the observability data retained in its PostgreSQL database and the dashboard exposed to internal teams through your existing SSO. For teams that do not run Kubernetes, Docker Compose with a managed Postgres instance is sufficient for most deployments.
The cost tracking deserves specific attention because it is the feature that I consistently underestimated until I had to answer a budget question from a CFO. Portkey tracks cost at the request level using the provider’s published token pricing, then aggregates by virtual key, model, and user. The dashboard surfaces month-over-month trends, per-model cost breakdowns, and the cost per virtual key. The data is available through the REST API for integration into your existing billing or chargeback system. For a team that needs to show finance a breakdown of AI spend by department, this single feature eliminates what would otherwise be a custom aggregation pipeline that drifts out of accuracy the moment a model’s pricing changes.
Now the problems. The most significant is that Portkey’s self-hosted gateway server has not received a tagged release since January 2026, five months ago. The GitHub releases show v1.15.2 as the latest, and the Docker images on that tag have not been updated since January 12. The Python SDK continues active development with the v2.3.2 release in June, but the core gateway server has been in a release gap that would concern me if I were building a new deployment on it today. This does not necessarily mean the project is abandoned. The cloud platform may be on a different release cadence, and the self-hosted version may have reached a stable enough point that active development is happening on the cloud side. But the release gap is long enough that it warrants a conversation with the Portkey team before making a procurement decision. For a startup evaluating Portkey as a core infrastructure dependency, the gap matters less because the self-hosted version is stable and MIT licensed, so you can fork it if needed. For an enterprise evaluating Portkey as a long-term platform, the release gap is a point to investigate, not dismiss.
The second problem is the depth tradeoff. Portkey does three things well enough that most teams will find them sufficient. But it does not do any of them at the depth that a specialist tool provides. The guardrail capabilities are less configurable than NeMo Guardrails. The observability tracing is less granular than Arize Phoenix’s span-level agent traces. The prompt management is less mature than Langfuse’s prompt registry. The question is not whether Portkey is deeper than each specialist. It is whether the gap between Portkey’s capabilities and the specialist’s capabilities matters for your specific workload. For most teams running standard chat completions and basic RAG patterns, Portkey’s depth is sufficient and the integration savings are worth the gap. For teams running multi-step agent architectures with complex tool chains and requiring span-level tracing through every step of the agent’s reasoning, the specialist tool is the right choice.
The third problem is the unified-vendor risk. Putting your gateway, observability, prompt management, and cost tracking into one platform means that when the platform has an issue, every layer is affected. If your gateway goes down, your observability data stops flowing, your prompt registry is unreachable, and your cost tracking stops recording. The failure domain is the entire system. The best-of-breed approach has a different failure profile: any single tool can fail while the others continue operating, but the integration surface is larger and the drift between tools is a constant maintenance cost. The unified-vendor approach is not wrong. It is a tradeoff, and the team making the decision should name it as a tradeoff rather than treating it as a pure win.
The version gap is the honest caveat I would give any team evaluating Portkey today. The self-hosted gateway is stable at v1.15.2, and the existing feature set covers the core needs of most enterprise deployments. The Python SDK is actively maintained at v2.3.2, and the cloud platform appears to be where the development focus currently sits. If I were building a new AI infrastructure stack today and my team had integration bandwidth for exactly one tool, I would evaluate Portkey first and I would expect it to cover 80 percent of my requirements out of the box. If my team had the bandwidth to integrate three tools, I would build a best-of-breed stack around LiteLLM, Phoenix, and Langfuse and accept the integration cost for the depth each specialist provides. The right answer depends on your team size, your infrastructure maturity, and whether the last 20 percent of depth matters enough to pay the integration tax.
If this was useful, forward it to one engineer who needs less noise in their feed.


