OSS Analytics, Monitoring, and Tracing for Federated GraphQL APIs


Jens Neuse
Today is the fourth and last day of our launch week, and we're concluding it with the announcement of the Cosmo Analytics Stack.
The goal of Cosmo is to provide an open-source drop-in replacement for Apollo Federation / Apollo GraphOS. A key differentiator of Cosmo is that you can self-host all of it and run it on your own infrastructure, without exception. Although most of our users prefer Cosmo Cloud, a fully managed SaaS solution, we wanted to give larger companies the option to self-host Cosmo on their own infrastructure.
The reasons for self-hosting Cosmo vary. Some companies operate under strict regulations that don't allow them to use a SaaS solution. Others want to avoid vendor lock-in, or operate at a scale where the cost of a SaaS simply doesn't make sense. In all these cases, please reach out to us and we'll be happy to support your on-premises Cosmo installation.
To make Cosmo easy to self-host, we built it in a standardized way, without relying on any proprietary technologies or services. As a result, you can run Cosmo locally on your machine using Docker, or deploy it to any Kubernetes cluster.
Cosmo depends on PostgreSQL as the main system of record, ClickHouse for analytics, and Prometheus for metrics and monitoring. Metrics and traces are collected and processed using OpenTelemetry (OTEL).
Here's an overview of the Cosmo (Analytics) Stack:

Most of our users run Cosmo Cloud, a fully managed SaaS solution, in combination with a self-hosted Cosmo Router.
As part of the Cosmo Analytics Stack, we provide an OTEL Collector that's connected to ClickHouse and Prometheus to collect and process traces and metrics. You can also connect any other OTEL-compatible service, such as Datadog, Elastic APM, or Jaeger, to the OTEL Collector to get the full picture of your API traffic.
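To make the fan-out idea concrete, here's a toy Python model of a pipeline with one receiver and several exporters. It is not the OTEL Collector's code (the real Collector is written in Go and configured declaratively); `Span`, `FanOutPipeline`, and the exporter names are hypothetical, and plain lists stand in for ClickHouse and Jaeger.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Span:
    trace_id: str
    name: str
    duration_ms: float


# In this toy model an exporter is any callable that accepts a batch of spans.
Exporter = Callable[[List[Span]], None]


@dataclass
class FanOutPipeline:
    """One receiver feeding several exporters (hypothetical, not Collector code)."""

    exporters: Dict[str, Exporter] = field(default_factory=dict)

    def add_exporter(self, name: str, exporter: Exporter) -> None:
        self.exporters[name] = exporter

    def receive(self, batch: List[Span]) -> None:
        # Each exporter gets its own copy of the batch list, and spans are
        # frozen, so one backend cannot alter what the others receive.
        for exporter in self.exporters.values():
            exporter(list(batch))


# Plain lists stand in for ClickHouse and an extra backend such as Jaeger.
clickhouse_spans: List[Span] = []
jaeger_spans: List[Span] = []

pipeline = FanOutPipeline()
pipeline.add_exporter("clickhouse", clickhouse_spans.extend)
pipeline.add_exporter("jaeger", jaeger_spans.extend)
pipeline.receive([Span("4bf92f3577b34da6a3ce929d0e0e4736", "query GetUser", 12.5)])
```

In this model, adding Datadog or Elastic APM is one more `add_exporter` call, and the existing backends keep receiving the same data.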
Once OTEL is configured, you can start exploring your API traffic in Cosmo Studio.
Update: Read Field Level Metrics 101, published in January 2024, to learn more about field-level metrics.
Cosmo's distributed tracing doesn't start and end at the GraphQL layer. We use OpenTelemetry for two main reasons. First, it's a vendor-neutral standard for distributed tracing, so you can use the whole Cosmo Analytics Stack, but you don't have to. Second, OpenTelemetry goes beyond the GraphQL layer: it allows you to generate traces across all your services, including your databases, message queues, and more.
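Traces can cross service boundaries because each hop forwards a small piece of context. OpenTelemetry's default propagator uses the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`). The sketch below uses plain Python without the OpenTelemetry SDK and shows context flowing from a client to the router, on to a subgraph, and into a database query. `SpanContext`, `parse_traceparent`, and `start_child` are hypothetical helpers, and only header version `00` is handled.

```python
import re
import secrets
from dataclasses import dataclass

# W3C Trace Context header: version "00", 32-hex trace id, 16-hex parent span id, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str      # shared by every span in one trace
    span_id: str       # unique per span
    flags: str = "01"  # "01" sets the sampled flag

    def to_traceparent(self) -> str:
        return f"00-{self.trace_id}-{self.span_id}-{self.flags}"


def parse_traceparent(header: str) -> SpanContext:
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"invalid traceparent header: {header!r}")
    trace_id, span_id, flags = match.groups()
    return SpanContext(trace_id, span_id, flags)


def start_child(parent: SpanContext) -> SpanContext:
    # A child keeps the parent's trace id and gets a fresh 8-byte (16-hex) span id.
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.flags)


# Hypothetical hops: client -> router -> subgraph (via HTTP header) -> database (in-process).
incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
router_span = start_child(incoming)
subgraph_span = start_child(parse_traceparent(router_span.to_traceparent()))
db_span = start_child(subgraph_span)
```

Every span in the chain shares one trace ID, which is what lets a tracing backend stitch the router, subgraph, and database spans into a single trace.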

In addition to distributed tracing, we also provide metrics and monitoring. You can either scrape metrics from the Cosmo Router directly, or use the Cosmo OTEL Collector to collect them and send them to Prometheus, as we do in Cosmo Cloud.
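Scraping means Prometheus periodically fetches a plain-text page of current metric values over HTTP. The sketch below renders a labeled counter in the Prometheus text exposition format. The metric name `graphql_requests_total`, its labels, and the helper functions are hypothetical and not necessarily what the Cosmo Router exposes.

```python
from typing import Dict, Tuple

# A label set is an ordered tuple of (label name, label value) pairs.
LabelSet = Tuple[Tuple[str, str], ...]


def escape_label_value(value: str) -> str:
    # The text format requires escaping backslashes, double quotes, and newlines.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: Dict[LabelSet, float]) -> str:
    # HELP text is assumed to contain no backslashes or newlines.
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        if labels:
            pairs = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in labels)
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: requests per GraphQL operation and HTTP status.
exposition = render_counter(
    "graphql_requests_total",
    "Total GraphQL requests handled (hypothetical example metric).",
    {
        (("operation", "GetUser"), ("status", "200")): 3,
        (("operation", "GetUser"), ("status", "500")): 1,
    },
)
print(exposition, end="")
```

Running it prints the `# HELP` and `# TYPE` lines followed by one sample line per label set, for example `graphql_requests_total{operation="GetUser",status="500"} 1`.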
Our initial focus is to provide metrics using the RED Method. RED stands for Rate, Errors, and Duration: how many requests are served, how many of them fail, and how long they take. More info on Prometheus support and metrics can be found in the Docs.
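To illustrate what the three RED signals measure, here is a minimal sketch that derives them from a window of raw request records. In a Prometheus setup they would typically come from counters and histograms queried with PromQL; `RequestRecord`, `red_summary`, and the nearest-rank p95 are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RequestRecord:
    duration_ms: float  # end-to-end latency of one request
    failed: bool        # True if the request returned an error


@dataclass(frozen=True)
class RedSummary:
    rate_per_s: float   # Rate: requests per second over the window
    error_ratio: float  # Errors: share of failed requests (0.0 to 1.0)
    p95_ms: float       # Duration: 95th-percentile latency (nearest rank)


def nearest_rank(sorted_values: Sequence[float], percent: int) -> float:
    # Nearest-rank method: value at 1-based rank ceil(percent / 100 * n),
    # computed with integer arithmetic to avoid float rounding surprises.
    n = len(sorted_values)
    rank = max(1, (percent * n + 99) // 100)
    return sorted_values[rank - 1]


def red_summary(records: List[RequestRecord], window_s: float) -> RedSummary:
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if not records:
        return RedSummary(rate_per_s=0.0, error_ratio=0.0, p95_ms=0.0)
    failures = sum(1 for r in records if r.failed)
    durations = sorted(r.duration_ms for r in records)
    return RedSummary(
        rate_per_s=len(records) / window_s,
        error_ratio=failures / len(records),
        p95_ms=nearest_rank(durations, 95),
    )


# 20 requests over 10 seconds, latencies 10..200 ms, every 10th request failing.
records = [RequestRecord(duration_ms=10.0 * i, failed=(i % 10 == 0)) for i in range(1, 21)]
summary = red_summary(records, window_s=10.0)
```

For this sample the summary reports 2 requests per second, an error ratio of 0.1, and a p95 latency of 190 ms.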
Alright, that's it for today. Now have a look at the GitHub Repository and give it a ⭐️.
If you want to learn more about Cosmo, check out the documentation.
Jens Neuse
CEO & Co-Founder at WunderGraph
Jens Neuse is the CEO and one of the co-founders of WunderGraph, where he builds scalable API infrastructure with a focus on federation and AI-native workflows. Formerly an engineer at Tyk Technologies, he created graphql-go-tools, now widely used in the open source community. Jens designed the original WunderGraph SDK and led its evolution into Cosmo, an open-source federation platform adopted by global enterprises. He writes about systems design, organizational structure, and how Conway's Law shapes API architecture.
