▸ Case Study

How NerdWallet Eliminated 10-Second GraphQL Query Planning Delays

NerdWallet
  • 10s down to <1s: query planning time on the slowest operations
  • Order of magnitude faster: on previously slow operations after warmup
  • 3-phase migration: designed to protect production traffic

Company Overview

NerdWallet is a personal finance platform that helps people make smarter decisions about their money by comparing and evaluating products like credit cards, banking, investing, mortgages, and insurance. It offers tools, content, and a mobile app to help consumers understand options, choose financial products, and track their finances with greater confidence.

The Challenge

A Gateway Architecture Under Pressure

NerdWallet ran its GraphQL platform with Apollo Gateway in front of roughly 10 subgraphs — mostly Node.js services, with one written in Go. The setup worked early on, but as more product teams attached subgraphs to the graph, the gateway began showing operational scaling limits.

Upgrading was blocked.

Legacy code tied the team to an old Federation version. New subgraphs followed current Apollo patterns, but the gateway couldn't validate or plan those shapes. Teams building subgraphs found themselves constrained by a gateway that couldn't keep up.

Query planning started blocking traffic.

Complex queries took more than 10 seconds to plan. While those queries waited, other requests backed up behind them. The slowdown spread across the graph and began affecting users.

To compensate, the team wrote internal prewarming logic that planned a hand-maintained list of known slow queries ahead of traffic. It helped until a new slow query appeared. If that query wasn't added to the list, the performance of the entire graph could suffer.
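The weakness of a hand-maintained prewarm list can be illustrated with a small sketch. This is not NerdWallet's script: the `PlanCache` class, the `toy_planner` stand-in, and the operation names are all hypothetical, chosen only to show why a query missing from the list still pays the planning cost at request time.

```python
from typing import Callable, Dict, Iterable


class PlanCache:
    """Maps an operation string to its computed query plan (illustrative only)."""

    def __init__(self, planner: Callable[[str], str]) -> None:
        self._planner = planner
        self._plans: Dict[str, str] = {}
        self.misses = 0  # times the planner had to run while serving a request

    def prewarm(self, operations: Iterable[str]) -> None:
        # Plan ahead of traffic; these runs do not count as request-time misses.
        for op in operations:
            self._plans[op] = self._planner(op)

    def get_plan(self, operation: str) -> str:
        if operation not in self._plans:
            # Cold path: in a real gateway this is where the slow planning happens.
            self.misses += 1
            self._plans[operation] = self._planner(operation)
        return self._plans[operation]


def toy_planner(operation: str) -> str:
    # Hypothetical stand-in for real query planning.
    return f"plan({operation})"


# Hypothetical hand-maintained list of known slow operations.
PREWARM_LIST = ["query CardComparison", "query MortgageRates"]

cache = PlanCache(toy_planner)
cache.prewarm(PREWARM_LIST)

cache.get_plan("query CardComparison")     # on the list: served from cache
cache.get_plan("query NewInsuranceQuote")  # not on the list: planned at request time
print(cache.misses)  # prints 1
```

The list only protects against slow queries someone has already noticed, which is why each new slow operation turned into manual maintenance.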

The Solution

The Solution: A Phased Migration to WunderGraph Cosmo

When NerdWallet evaluated alternatives, they focused on three things: Federation support, performance, and cost, both in infrastructure and team time. The migration also had to be safe.

They chose WunderGraph Cosmo and ran a three-phase rollout:

  01. Match the gateway. Ensure Cosmo Router supported every feature the Apollo Gateway provided before touching traffic.
  02. Sync schema registries. Publish schema updates to both registries simultaneously so the old gateway and Cosmo could stay aligned during testing.
  03. Shift traffic gradually. Move production traffic from Apollo Gateway to Cosmo Router in small increments, validating at each step.
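The case study does not say how traffic was split in the third phase. One common way to shift traffic in small increments is deterministic percentage bucketing, sketched below under that assumption. The `pick_backend` function, the backend labels, and the request keys are hypothetical.

```python
import hashlib


def pick_backend(request_key: str, cosmo_percent: int) -> str:
    """Route a stable share of requests to the new router (illustrative only).

    request_key is any stable identifier, such as a session or client ID.
    cosmo_percent is the share of traffic (0 to 100) sent to the new router.
    """
    if not 0 <= cosmo_percent <= 100:
        raise ValueError("cosmo_percent must be between 0 and 100")
    # Hash the key so the same caller always lands in the same bucket.
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "cosmo-router" if bucket < cosmo_percent else "apollo-gateway"


# Raising the percentage only ever moves callers toward the new router,
# so a caller already on it at 10% stays on it at 50%.
for percent in (0, 10, 50, 100):
    print(percent, pick_backend("session-1234", percent))
```

Because the bucket depends only on the key, each increase in the percentage adds callers to the new router without moving earlier ones back, which keeps validation at each step comparable to the last.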

When roadblocks appeared - missing Federation features, behavior gaps tied to newer Apollo spec versions - the WunderGraph team patched them or built the missing support directly.

"The WunderGraph team jumped in to patch issues and build features that unblocked our migration."

Subgraph teams retained ownership of their services and features throughout the migration. Internal QA ran regression tests at each stage. The phased approach allowed them to migrate while keeping user-facing behavior stable.

The Result

The Graph Got Faster. The Team Got Time Back.

  • Cache warming eliminated repeated 10+ second planning delays and brought planning time on the slowest operations below one second after warmup.
  • Removed fragile internal prewarming scripts that required manual updates every time a slow query was added.
  • High confidence in Supergraph updates and horizontal scaling, with router version upgrades still handled carefully.
  • Improved visibility into slow queries and bottlenecks via Cosmo Studio and OTEL tracing.
  • Reduced operational overhead: no more emergency prewarm list maintenance and fewer brittle workarounds around the old gateway.

Cache Warmer

The Performance Breakthrough

The biggest improvement came from Cosmo's Cache Warmer.

Previously, NerdWallet's internal script tried to prewarm the worst offenders. It worked, but it required manual updates.

Cosmo Cache Warmer replaced this entirely. It automatically identifies and prewarms the most expensive query plans before serving traffic. Subsequent requests skip the planning phase completely.
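The idea of picking the most expensive plans automatically, instead of from a hand-written list, can be sketched as a ranking over planning-time measurements. This is not Cosmo's implementation: the `PlanningSample` type, the `select_warmup_candidates` function, the choice to rank by worst observed planning time, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PlanningSample:
    operation: str      # operation name or hash (hypothetical)
    planning_ms: float  # observed planning time for one request


def select_warmup_candidates(samples: Iterable[PlanningSample], limit: int) -> List[str]:
    """Return up to `limit` operations, slowest to plan first (illustrative only)."""
    worst: Dict[str, float] = {}
    for sample in samples:
        # Track the worst planning time seen for each operation.
        worst[sample.operation] = max(worst.get(sample.operation, 0.0), sample.planning_ms)
    # Sort by planning time descending; break ties by name for stable output.
    ranked = sorted(worst, key=lambda op: (-worst[op], op))
    return ranked[:limit]


# Made-up measurements, not data from the case study.
samples = [
    PlanningSample("CardComparison", 9800.0),
    PlanningSample("MortgageRates", 310.0),
    PlanningSample("CardComparison", 10400.0),
    PlanningSample("AccountSummary", 45.0),
    PlanningSample("MortgageRates", 290.0),
]

print(select_warmup_candidates(samples, limit=2))
# prints ['CardComparison', 'MortgageRates']
```

Driving the list from measurements means a newly slow operation enters the warmup set as soon as it shows up in the data, with no manual update.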

"Caching query planning was a game-changer in terms of performance for us."

For some of their slowest production operations, cache warming eliminated repeated 10+ second planning delays and brought planning time below one second after warmup. Similar cold-start planning behavior was later documented publicly in WunderGraph’s Super Bowl scaling analysis, where cache warmup reduced planning spikes from 8-15 seconds to below one second.

Observability

Improved Observability Across the Graph

As the platform grew, the team needed better visibility into what was happening inside the graph.

Cosmo Studio provided traces, analytics, and schema visibility in one place. The team could see slow requests, identify complex execution paths, and quickly spot where subgraphs were struggling, turning guesswork into targeted fixes.

OTEL tagging confirmed that performance stayed on par or improved through the migration. Cache-hit metrics helped tune the router after rollout.

Schema Checks help reduce the risk of introducing breaking changes during Supergraph updates.

The team now has high confidence in Supergraph updates and horizontal scaling, supported by OTEL signals and Studio analytics.

Scaling

Scaling Predictably Under Load

NerdWallet already ran traffic forecasting, load tests, and infrastructure prescaling ahead of high-traffic events. But expensive query planning was always the hardest variable to control. A single slow planning operation could stall unrelated traffic across the graph.

Cache warming removed that variable. With the most expensive plans precomputed and cached before traffic arrived, new router instances came online with a populated planning cache instead of a cold start.
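One way to make sure a new instance never serves traffic from a cold cache is to gate readiness on warmup. The sketch below assumes that approach; the case study does not describe how Cosmo Router sequences warmup and readiness. The `RouterInstance` class and the operation names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple


class RouterInstance:
    """A router replica that reports ready only after warmup (illustrative only)."""

    def __init__(self, planner: Callable[[str], str], warmup_operations: Iterable[str]) -> None:
        self._planner = planner
        self._warmup = list(warmup_operations)
        self._plans: Dict[str, str] = {}
        self.ready = False  # what a load balancer health check would read

    def start(self) -> None:
        # Precompute the expensive plans before accepting any traffic.
        for op in self._warmup:
            self._plans[op] = self._planner(op)
        self.ready = True

    def handle(self, operation: str) -> Tuple[str, bool]:
        """Return the plan and whether it was already cached."""
        if not self.ready:
            raise RuntimeError("instance is not ready to serve traffic")
        cached = operation in self._plans
        if not cached:
            self._plans[operation] = self._planner(operation)
        return self._plans[operation], cached


instance = RouterInstance(lambda op: f"plan({op})", ["query CardComparison"])
print(instance.ready)  # prints False
instance.start()
print(instance.ready)  # prints True
print(instance.handle("query CardComparison"))
# prints ('plan(query CardComparison)', True)
```

With readiness tied to warmup, adding replicas during a traffic spike adds capacity without reintroducing the cold-start planning window.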

We see this same pattern in other federated graphs at scale. A router starting from a cold cache will hit the same planning delays, regardless of infrastructure capacity. In our experience, eliminating that cold-start window is one of the most direct ways to stabilize performance during scale events. WunderGraph documented this in detail in a Super Bowl scale-up analysis, where cold-start planning behavior under high traffic followed the same dynamic this team had seen before adopting warm-up.

The result for NerdWallet: more predictable scaling, fewer surprises during planned traffic events.

From Legacy Gateway to a Future-Ready Federation Platform

NerdWallet's migration replaced an upgrade‑blocked gateway with a modern federation platform, avoiding a big‑bang cutover while keeping user-facing behavior stable and preserving subgraph teams’ ownership of their services and features.

The key outcomes:

  • Removed the query planning bottleneck that degraded the graph and replaced the gateway that blocked upgrades
  • Eliminated manual operational work tied to custom prewarming infrastructure
  • Gained direct visibility into graph behavior through Cosmo Studio and OTEL tracing
  • Established a scaling posture that handles traffic spikes without cold-start surprises

If your gateway is showing the early signs - slow planning, blocked upgrades, increasing operational workarounds - those delays tend to grow over time. Cosmo is designed to address these bottlenecks at the router and planning layer.