Where does the Cosmo Router expose Prometheus metrics?

On http://localhost:8088/metrics by default. The port and path are configurable. Point any Prometheus-compatible scraper at it.

What metrics are exposed?

R.E.D. (rate, errors, and duration) for both router and subgraph requests. Core series include router_http_requests_total, router_http_request_duration_milliseconds, router_http_requests_error_total, router_http_requests_in_flight, and router_graphql_operation_planning_time.

How do I prevent metric cardinality explosion?

A default cardinality limit of 2000 unique combinations per metric is built in. Once the limit is reached, further datapoints are stored without attributes rather than dropped. You can also remove specific high-cardinality labels or whole metrics with regex-based exclusion patterns.
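
To make the overflow behavior concrete, here is a minimal Python sketch of a counter with a cardinality cap. It is an illustration only, not the router's implementation: the class name BoundedCounter and its methods are hypothetical, and the only behavior taken from the description above is that new combinations past the limit are recorded without attributes instead of being dropped.

```python
class BoundedCounter:
    """Counter that tracks at most `limit` distinct attribute combinations.

    Hypothetical sketch: once the cap is reached, values for unseen
    combinations are added to a single attribute-less series.
    """

    def __init__(self, limit=2000):
        self.limit = limit
        self.series = {}  # frozenset of (label, value) pairs -> running total

    def add(self, value, **attrs):
        key = frozenset(attrs.items())
        # An unseen combination past the cap is folded into the empty key,
        # so the measurement is kept but its labels are not.
        if key not in self.series and len(self.series) >= self.limit:
            key = frozenset()
        self.series[key] = self.series.get(key, 0) + value
```

With limit=2, a third distinct wg_operation_name lands in the attribute-less series, while the two combinations already tracked keep accumulating under their own labels.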

How do I query p99 latency for a specific operation?

Apply histogram_quantile() with a quantile of 0.99 to the per-second rate of router_http_request_duration_milliseconds_bucket, summed by le and wg_operation_name. For an SLO alert, put the expression in a Prometheus alerting rule; Prometheus sends firing alerts to Alertmanager.

Are runtime metrics included?

Go runtime statistics (memory, garbage collection, goroutines) are available via OTEL export when router_runtime is enabled. They are not exposed on the Prometheus /metrics scrape endpoint, because asynchronous instruments are not published there.

Do I have to choose between OTEL and Prometheus?

No. The router's Prometheus exporter is built on the same OpenTelemetry foundation. Core R.E.D. metrics are available via both OTEL export and the Prometheus scrape endpoint at the same time. Asynchronous instruments such as Go runtime metrics are available via OTEL only.

Prometheus Metrics for Federated GraphQL | Cosmo by WunderGraph

The problem

Generic metrics cannot answer GraphQL questions

Operations teams need per-operation, per-subgraph, per-client metrics. Generic HTTP series do not carry those dimensions.

Generic HTTP metrics miss GraphQL

Stock router metrics report request counts and durations but rarely carry the dimensions that matter: operation name, operation type, client, and subgraph.

Subgraph latency is invisible

When a federated query slows down, generic metrics show only the aggregate request duration. Without per-subgraph series, you cannot tell which service is causing the spike.

Cardinality grows until something breaks

Adding GraphQL operation labels without limits can blow up the time-series database. Then teams strip the labels and lose the visibility that justified them.

Our solution

GraphQL-aware Prometheus metrics, built in

The router exports R.E.D. metrics for router and subgraph traffic with the GraphQL labels operators actually need. Cardinality is bounded out of the box.

What happens on every request

The router collects R.E.D. metrics via the OpenTelemetry SDK on every request.
A Prometheus exporter publishes them on http://localhost:8088/metrics by default.
Labels carry GraphQL context: wg_operation_name, wg_operation_type, wg_operation_protocol, wg_client_name, wg_client_version, wg_subgraph_name, wg_subgraph_id, and http_status_code.
Prometheus scrapes the endpoint on its configured interval and stores the series.
Grafana queries the series for dashboards; Prometheus evaluates alerting rules over them and sends firing alerts to Alertmanager for SLO alerting.
A built-in cardinality limit of 2000 combinations per metric and regex exclusions keep the series count bounded.
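
To see what the labels look like to a scraper, the sketch below parses Prometheus text exposition lines into (name, labels, value) tuples in Python. The SAMPLE text is made up for illustration and is not captured from a real router; parse_samples is a hypothetical helper that ignores timestamps and does not unescape label values.

```python
import re

# metric name, optional {label block}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# one label="value" pair; escaped characters stay escaped in this sketch
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Illustrative payload only (assumed shape, not real router output)
SAMPLE = '''# TYPE router_http_requests_total counter
router_http_requests_total{wg_operation_name="GetUser",wg_subgraph_name="users"} 42
router_http_requests_in_flight 3
'''

def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # skip anything that is not a sample line
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Calling parse_samples(SAMPLE) yields one labeled counter sample and one unlabeled gauge sample. A production scraper should use a maintained parser rather than regular expressions.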

Plug Prometheus into the endpoint. Dashboards and alerts follow.

Prometheus metrics

Before & After

Before Cosmo	With Cosmo
Generic HTTP metrics without operation or subgraph dimensions	R.E.D. metrics with wg_operation_name, wg_subgraph_name, and related GraphQL labels
Aggregate request duration hides which subgraph is slow	Per-subgraph latency and error series on the same endpoint
High-cardinality labels overwhelm the time-series database	Built-in 2000-combination limit and regex exclusions per metric
Custom instrumentation to expose federation metrics	/metrics on :8088 by default: scrape and query

Optional metrics

Beyond R.E.D.

Cache. Hit and miss ratios, costs, and key statistics.
Engine. Connections, subscriptions, and triggers.
Connection pool. Utilization and acquisition duration.
Circuit breaker. State and short-circuits.

Go runtime metrics (memory, GC, goroutines) are available via OTEL export with router_runtime enabled, not on the Prometheus scrape endpoint.

How Prometheus metrics work in Cosmo Router

01

R.E.D. for router and every subgraph.

Emit

Rate, errors, and duration for router and subgraph requests. Default endpoint http://localhost:8088/metrics. No custom instrumentation required.

02

GraphQL dimensions on every series.

Label

Series carry wg_operation_name, wg_operation_type, wg_operation_protocol, wg_client_name, wg_client_version, wg_subgraph_name, wg_subgraph_id, and http_status_code.

03

2000-combination cap, regex exclusions.

Bound

A default cardinality limit of 2000 unique combinations per metric bounds label growth. Once the limit is reached, further datapoints are stored without attributes. Regex exclusions remove labels or whole metrics by pattern.

04

Alertmanager-ready out of the box.

Alert

Use the metrics in Grafana dashboards and in Prometheus alerting rules that notify through Alertmanager for SLO alerting. Common queries (p99 latency, subgraph error rate) fit on one screen.

Telemetry controls

PromQL, cardinality, and runtime

Query by operation and subgraph. Bound label cardinality. Export Go runtime metrics via OTEL when you need them.

p99 latency by operation

histogram_quantile(
  0.99,
  sum by (le, wg_operation_name) (
    rate(router_http_request_duration_milliseconds_bucket[5m])
  )
)
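
For readers who want to see what histogram_quantile() does with the bucket series, here is a simplified Python sketch of the estimate for classic histograms: find the bucket that contains the target rank, then interpolate linearly inside it. The function and the bucket values in the usage note are illustrative assumptions; Prometheus handles further edge cases that are left out here.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    Simplified sketch. `buckets` must include the math.inf bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0 or not 0 <= q <= 1:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[idx]
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    # The first bucket is assumed to start at 0 (durations are non-negative)
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return lower
    # Linear interpolation between the bucket's lower and upper bounds
    return lower + (upper - lower) * (rank - below) / in_bucket
```

With buckets [(100, 50), (200, 90), (500, 99), (math.inf, 100)], the 0.5 quantile is the 100 ms bound and the 0.95 quantile is interpolated inside the 200 to 500 ms bucket. The result is only as precise as the bucket boundaries allow.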

Error rate by subgraph

sum by (wg_subgraph_name) (
  rate(router_http_requests_error_total[5m])
)
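
As a rough picture of what this query computes, the Python sketch below derives a per-second increase per subgraph from two scrapes of a counter. It is a simplification under stated assumptions: rate_by_label is a hypothetical helper, series are keyed by a frozenset of their label pairs, and Prometheus's rate() additionally extrapolates to the window boundaries, which is omitted here.

```python
def rate_by_label(prev, curr, seconds, label="wg_subgraph_name"):
    """Sum per-second counter increases, grouped by one label.

    prev and curr map frozenset(labels.items()) -> counter value,
    taken from two scrapes that are `seconds` apart.
    """
    rates = {}
    for key, value in curr.items():
        if key not in prev:
            continue  # a rate needs at least two samples of the same series
        before = prev[key]
        # A counter that went down was reset; count the new value from zero
        increase = value - before if value >= before else value
        group = dict(key).get(label, "")
        rates[group] = rates.get(group, 0.0) + increase / seconds
    return rates
```

Series without the grouping label, such as router-level requests, end up under the empty-string group, which mirrors how sum by (...) groups series that lack the label.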

Cardinality cap

A default limit of 2000 unique label combinations per metric. Regex exclusions remove labels or whole metrics by pattern.
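
The effect of regex exclusions can be sketched in a few lines of Python. This is an illustration, not the router's configuration format or code: apply_exclusions and the patterns in the usage note are hypothetical, and unanchored matching is an assumption, so anchor patterns with ^ and $ when an exact match is intended.

```python
import re

def apply_exclusions(samples, exclude_metrics=(), exclude_labels=()):
    """Drop whole metrics and strip labels whose names match a pattern.

    samples: iterable of (metric_name, labels_dict) pairs.
    """
    metric_patterns = [re.compile(p) for p in exclude_metrics]
    label_patterns = [re.compile(p) for p in exclude_labels]
    kept = []
    for name, labels in samples:
        if any(p.search(name) for p in metric_patterns):
            continue  # the whole metric is excluded
        trimmed = {
            key: value
            for key, value in labels.items()
            if not any(p.search(key) for p in label_patterns)
        }
        kept.append((name, trimmed))
    return kept
```

For example, exclude_labels=["^wg_client_version$"] strips that label from every series, and exclude_metrics=["^router_graphql_operation_planning_time"] removes that metric entirely. Stripping a label merges the series that differed only by it, which is what lowers the series count.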

Runtime metrics

Go runtime statistics (memory, GC, goroutines) are available via OTEL export with router_runtime enabled. They are not on the Prometheus /metrics scrape endpoint.

Scrape Cosmo Router today

The Prometheus endpoint is on by default. Add the scrape job and start querying.


R.E.D. metrics for every GraphQL operation and subgraph