OSS Analytics, Monitoring, and Tracing for Federated GraphQL APIs

Today is the fourth and last day of our launch week, and we're concluding it with the announcement of the Cosmo Analytics Stack.

The goal of Cosmo is to provide an open source drop-in replacement for Apollo Federation / Apollo GraphOS. A key differentiator of Cosmo is that you can self-host it and run it on your own infrastructure without exceptions. Although most of our users prefer Cosmo Cloud, a fully managed SaaS solution, we wanted to give larger companies the option to self-host Cosmo on their own infrastructure.

The reasons for self-hosting Cosmo vary. Some companies are under strict regulations and are not allowed to use a SaaS solution. Others are wary of vendor lock-in, or find that the cost of a SaaS simply doesn't make sense at their scale. In all these cases, please reach out to us and we'll be happy to offer support for your on-premise Cosmo installation.

To make Cosmo easy to self-host, we had to build it in a standardized way, without relying on any proprietary technologies or services. As a result, you can run Cosmo locally on your machine using Docker, or you can deploy it to any Kubernetes cluster.

Cosmo depends on PostgreSQL as the main system of record, Clickhouse for analytics, and Prometheus for metrics and monitoring. We collect and process metrics and traces using OpenTelemetry (OTEL).

Overview of the Cosmo Analytics Stack

Here's an overview of the Cosmo (Analytics) Stack:

Cosmo Architecture

Most of our users use Cosmo Cloud, a fully managed SaaS solution, combined with a self-hosted Cosmo Router.

As part of the Cosmo Analytics Stack, we provide an OTEL Collector that's connected to Clickhouse and Prometheus to collect and process traces and metrics. You can also connect any other OTEL-compatible service, like Datadog, Elastic APM, or Jaeger, to the OTEL Collector to get the full picture of your API traffic.

Analytics for Federated GraphQL APIs with Clickhouse

Once OTEL is configured, you can start exploring your API traffic in Cosmo Studio.

Update: Read Field Level Metrics 101, published in January 2024, to learn more about field-level metrics.

Cosmo Analytics in Studio

Distributed Tracing for Federated GraphQL APIs with OpenTelemetry

Cosmo's distributed tracing doesn't start and end at the GraphQL layer. We're using OpenTelemetry for two main reasons. First, it's a vendor-neutral standard for distributed tracing. You can use the whole Cosmo Analytics Stack, but you don't have to. Second, OpenTelemetry lets you generate traces across all your services, including your database, message queues, and more, so a single trace can follow a request well beyond the GraphQL layer.
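To make the cross-service idea concrete, here is a minimal sketch of how a trace can be carried from one service to the next. It assumes the W3C Trace Context `traceparent` header, which is OpenTelemetry's default propagation format; whether your Cosmo setup uses this exact propagator is an assumption, and the function names below are hypothetical helpers, not part of Cosmo or the OpenTelemetry SDK.

```python
import re
import secrets

# traceparent layout (version 00): version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace: random 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or span IDs.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {"version": version, "trace_id": trace_id, "span_id": span_id, "flags": flags}


def child_traceparent(incoming):
    """Build the header for an outgoing call made while handling `incoming`.

    The trace ID is kept so all spans join the same trace; only the span ID
    changes. An invalid incoming header starts a fresh trace instead.
    """
    ctx = parse_traceparent(incoming)
    if ctx is None:
        return new_traceparent()
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{ctx['flags']}"


if __name__ == "__main__":
    at_router = new_traceparent()
    to_subgraph = child_traceparent(at_router)
    print("router  :", at_router)
    print("subgraph:", to_subgraph)
```

In practice an OpenTelemetry SDK handles this propagation for you; the sketch only shows why spans from the router, subgraphs, and downstream systems end up in the same trace.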

Distributed Tracing in Cosmo Studio

Metrics and Monitoring for Federated GraphQL APIs with Prometheus

In addition to distributed tracing, we also provide metrics and monitoring. Prometheus can either scrape metrics directly from the Cosmo Router, or you can use the Cosmo OTEL Collector to collect metrics and send them to Prometheus, as we do in Cosmo Cloud.

Our initial focus is on providing metrics that follow the RED Method. RED stands for Rate, Errors, and Duration. More info on Prometheus support and metrics can be found in the Docs.
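As an illustration of what the three RED signals mean, here is a minimal sketch that derives them from a batch of request records. The `RequestRecord` type, the `red_summary` function, and the choice of p50/p95 with a nearest-rank percentile are assumptions made for this example; they are not how Cosmo or Prometheus compute these values internally.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One observed request (hypothetical shape for this example)."""
    duration_ms: float
    is_error: bool


def red_summary(records, window_seconds):
    """Summarize Rate, Errors, and Duration for requests seen in a time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    total = len(records)
    errors = sum(1 for r in records if r.is_error)
    durations = sorted(r.duration_ms for r in records)

    def percentile(p):
        # Nearest-rank percentile; returns 0.0 when there is no data.
        if not durations:
            return 0.0
        rank = max(1, math.ceil(p * len(durations) / 100))
        return durations[rank - 1]

    return {
        "rate_per_second": total / window_seconds,          # Rate
        "error_ratio": errors / total if total else 0.0,    # Errors
        "p50_ms": percentile(50),                           # Duration
        "p95_ms": percentile(95),
    }


if __name__ == "__main__":
    sample = [RequestRecord(duration_ms=10.0 * i, is_error=(i > 8)) for i in range(1, 11)]
    print(red_summary(sample, window_seconds=5))
```

With Prometheus, the same signals are usually derived at query time from counters and histograms rather than from raw request records.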

Conclusion

Alright, that's it for today. Now have a look at the GitHub Repository and give it a ⭐️.

If you want to learn more about Cosmo, check out the documentation.
