
Rate Limiting for Federated GraphQL APIs with Cosmo Router & Redis

Jens Neuse



Federation is a powerful approach to building APIs across multiple services. It allows you to build a single, unified API composed of multiple services, each with its own GraphQL Subgraph schema. This is great for scaling your development team and for building a more modular and maintainable architecture. At the same time, it also opens up an attack vector for DDoS attacks right into your Microservices architecture.

Let's explore why traditional rate limiting solutions don't work for federated GraphQL APIs and how to set up rate limiting for federated GraphQL APIs using Cosmo Router and Redis. Rate limiting federated GraphQL APIs is a built-in feature of Cosmo Router, which is the leading Open Source API Gateway for Federated GraphQL APIs.

Cosmo: Full Lifecycle GraphQL API Management

Are you looking for an Open Source Graph Manager? Cosmo is the most complete solution including Schema Registry, Router, Studio, Metrics, Analytics, Distributed Tracing, Breaking Change detection and more.

Why traditional rate limiting solutions don't work for federated GraphQL APIs

Rate limiting is a common technique to protect your APIs from abuse. It's not new and there are many solutions out there that can help you to implement rate limiting for your APIs. However, existing rate limiting solutions are not designed to work with federated GraphQL APIs. To be able to rate limit federated GraphQL APIs, you need to understand how a GraphQL query is executed. There's no simple 1:1 relationship between client request and Microservice requests like in REST APIs.

In a federated GraphQL API, one client request can result in hundreds or even thousands of requests to your Microservices. This is because the client can query multiple fields from multiple services in a single request. What is often described as an advantage of GraphQL can also be seen as a challenge when it comes to securing your API.

Let's take a look at an example to illustrate this.

Here's an example of a federated GraphQL query:
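A sketch of such a query; the `employee`, `details`, and `hobbies` field names are illustrative, based on the description below:

```graphql
query {
  employee(id: 1) {
    id
    details {
      forename
      surname
    }
    hobbies {
      name
    }
  }
}
```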


This query requests the employee field from the EmployeeService and the hobbies field from the HobbyService. The EmployeeService and the HobbyService are two separate Microservices with their own GraphQL schema. When the client sends this query to the API Gateway, the API Gateway will forward the request to the EmployeeService and the HobbyService.

A traditional rate limiting solution would count this as one request when in reality it results in two requests to two different Microservices. But this can get even more complex, e.g. when we're fetching nested fields for not just one but multiple entities. Cosmo Router is able to efficiently batch nested requests like this, but the load on your Microservices might still increase.

The risks of not rate limiting federated GraphQL APIs

The problem with not rate limiting your federated GraphQL APIs is that the nature of GraphQL makes it very easy to craft a query that can result in a high load on your Microservices. What looks like a single request on the network level might actually be a batch of requests. Take a look at the following example Operation, which is a slight modification of the previous example:
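A sketch of what such an Operation might look like, assuming a hypothetical `employees` list field that returns 8 employees, with `details` and `hobbies` resolved by separate Subgraphs (all names illustrative):

```graphql
query {
  employees {
    id
    details {
      forename
      surname
    }
    hobbies {
      ... on Exercise {
        category
      }
      ... on Gaming {
        name
        genres
      }
      ... on Travelling {
        countriesVisited {
          name
        }
      }
    }
  }
}
```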


Although this GraphQL Operation is still a single network request with a small payload, it results in 8*3=24 requests to load the employee id, details and hobbies for 8 employees.

Rate Limiting Federated GraphQL APIs at the Edge

If we can agree that rate limiting federated GraphQL APIs is important, the next question is where to implement it. We can implement rate limiting at the Edge, e.g. using Cloudflare Workers, at the API Gateway / Router level, or within the Microservices themselves.

The advantage of implementing rate limiting at the Edge is that it's very easy to set up and it can protect your API from some types of attacks. However, "on the Edge" usually means "far away" from your Microservices. If you'd run your Federated GraphQL API Gateway / Router on the Edge, you'd have a lot of overhead and latency for every request between Router and Microservices.

So, if we don't want to "execute" a federated GraphQL Operation on the Edge, this means that our "Edge Router" won't be able to see the actual requests that are being made to the Microservices. As a consequence, it won't be able to rate limit the actual requests, but can only "approximate" what the load on the Microservices might be.

What are the consequences of imprecise rate limiting? If your rate limiting is too lax, you might still be vulnerable to DDoS attacks. If it's too strict, you might block legitimate requests.

Approximating the load is not a great solution because an attacker can carefully craft a GraphQL Operation that passes the rate limit check but still results in a high load on your Microservices. But there's another reason why rate limiting at the Edge is not ideal for federated GraphQL APIs.

Let's say an attacker needs to be authenticated to send a request to your API, so they create an API key and attack your API from multiple regions simultaneously, e.g. by using a botnet. If you want to block this attacker, you'd have to detect the attack and block the API key. How do you do this? You'd have to share the state of the rate limit across all Edge Routers. Otherwise, the attacker could evade the rate limit by staying below the threshold on each individual Edge Router while exceeding it in total.

If we share state across multiple Edge Routers, we're essentially building a distributed rate limiting system. A distributed rate limiting system is not just extremely complex to build and maintain, it's also either eventually consistent or very slow if it wants to be consistent. So what's the point of having a distributed rate limiting system if it's not precise, or precise but slow?

Edge Workers are great to protect an API from more generic attacks, but when it comes to rate limiting federated GraphQL APIs, this is not the right place to do it.

Rate Limiting Federated GraphQL APIs at the Subgraph / Microservice level

Ok, so what about implementing rate limiting within the Microservices themselves? On the one hand, this is a great place to implement rate limiting because we're close to the most expensive resources that we want to protect, like databases and other external services. On the other hand, a Subgraph lacks the context of the entire federated GraphQL Operation. This means that while we're able to rate limit the sub-request to our service, we're not able to protect the federated GraphQL API as a whole.

Furthermore, by implementing rate limiting within the Microservices, we create an organizational problem. The rate limiting logic is now spread across multiple services which are owned by different teams. First, we need to bring the knowledge of implementing rate limiting to all teams. Then we have to find consensus on how to implement rate limiting and how to report rate limiting violations to the API Gateway / Router. In addition, we have to implement and maintain the rate limiting logic in multiple services, and each team needs to run and operate their own rate limiting infrastructure, e.g. Redis.

Wouldn't it be great if we could implement rate limiting for federated GraphQL APIs in a single place, without having to modify the Microservices themselves?

This would allow us to have a single source of truth for rate limiting, save us from having to implement and maintain rate limiting logic in multiple services, and allow us to have a global view of the rate limiting state. In addition, we don't have to find consensus on how to implement it across all teams, and we don't have to spread the knowledge of implementing rate limiting across the whole organization.

Instead, we can enable a single platform team to implement and maintain rate limiting as a centralized service for all teams and services.

Rate Limiting Federated GraphQL APIs with Cosmo Router & Redis

With Cosmo Router, you can implement rate limiting for federated GraphQL APIs in a single place, close to your Subgraphs, but without having to modify them, and without having to implement and maintain rate limiting logic in multiple services.

The Router is the component in your federated GraphQL API architecture that generates and executes the federated GraphQL Operation. This means that the Router has the full context of the GraphQL Operation, knows which Subgraphs are involved and which fields are being requested from which Subgraph. With all this information, the Router can accurately rate limit and therefore protect your federated GraphQL API.

Compared to rate limiting at the Edge, the Router is much closer to the Microservices, and even if you're running a cluster of Routers, you can easily share the state of the rate limit across all Routers using a fast in-memory store like Redis.

In contrast to rate limiting within the Microservices, the Router has the full request context and has a lot of other advantages as discussed in the previous sections.

How does rate limiting work with Cosmo Router and Redis? Cosmo Router uses Redis under the hood to store the rate limit state for a given key. Depending on the configuration of the rate limit, the Router will increment the counter for the given key and check if the counter exceeds the limit. If the counter exceeds the limit, the Router will either raise an error for this particular field (partial rate limiting) or reject the Operation as a whole.
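Conceptually, this increment-and-check pattern can be sketched in a few lines of Python. The sketch below is an illustration of the general fixed-window technique, not Cosmo Router's actual implementation; it emulates Redis's INCR-with-expiry semantics in memory so it's self-contained:

```python
import time


class InMemoryStore:
    """Stand-in for Redis: emulates INCR with a per-key expiry (fixed window)."""

    def __init__(self):
        self.data = {}  # key -> (count, window_expires_at_ms)

    def incr(self, key, now_ms, window_ms):
        count, expires_at = self.data.get(key, (0, now_ms + window_ms))
        if now_ms >= expires_at:
            # The window has elapsed; start a fresh one.
            count, expires_at = 0, now_ms + window_ms
        count += 1
        self.data[key] = (count, expires_at)
        return count, expires_at


def check_rate_limit(store, key, limit, window_ms, now_ms=None):
    """Count one Subgraph request against `key`; return (allowed, rateLimit info)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    count, expires_at = store.incr(key, now_ms, window_ms)
    reset_after_ms = expires_at - now_ms
    return count <= limit, {
        "requestRate": count,
        "remaining": max(0, limit - count),
        "retryAfterMs": 0 if count <= limit else reset_after_ms,
        "resetAfterMs": reset_after_ms,
    }
```

In production, the counter lives in Redis rather than in process memory, so a cluster of Routers shares one consistent view of each key's request rate.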

Let's take a look at the configuration:
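A configuration along these lines; the key names are illustrative of the Cosmo Router config format, so consult the Rate Limiting documentation for the exact schema:

```yaml
rate_limit:
  enabled: true
  strategy: "simple"
  simple_strategy:
    rate: 60
    burst: 60
    period: 60s
    reject_exceeding_requests: true
  storage:
    addr: "localhost:6379"
    password: "secret"
    key_prefix: "cosmo_rate_limit"
```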


In this example, we enable rate limiting and configure a simple rate limiting strategy. This strategy applies a rate limit of 60 requests per minute across all clients. If the rate limit is exceeded, the Router will reject the Operation.

On the storage side, we configure the address of the Redis server, the password, and a key prefix. The key prefix is used to namespace the rate limit keys.

In addition to the "simple" rate limiting strategy, we will support more advanced rate limiting strategies in the future, like rate limiting based on the client's IP address, their API key, or a claim in their JWT.

Let's take a look at Cosmo Router's rate limiting in action. Here's an example of a federated GraphQL Operation:
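A sketch of such an Operation (field names are illustrative):

```graphql
query {
  employee(id: 1) {
    id
    details {
      forename
      surname
    }
    hobbies {
      name
    }
  }
}
```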


Let's take a look at the response of a valid request:
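An illustrative response of roughly this shape; the data values are made up, and the rateLimit numbers assume a limit of 60 requests per minute:

```json
{
  "data": {
    "employee": {
      "id": 1,
      "details": {
        "forename": "Jane",
        "surname": "Doe"
      },
      "hobbies": [
        {
          "name": "Running"
        }
      ]
    }
  },
  "extensions": {
    "rateLimit": {
      "requestRate": 2,
      "remaining": 58,
      "retryAfterMs": 0,
      "resetAfterMs": 58000
    }
  }
}
```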


The request is fully valid and the response contains the requested data. In addition, the response contains a rateLimit field in the extensions part of the response. The extensions part of the response is a standard defined by the GraphQL specification and can be used to add additional information to the response.

The rateLimit field contains the following information:

  • requestRate: The number of requests that have been made within the current rate limit period
  • remaining: The number of requests that can still be made within the current rate limit period
  • retryAfterMs: The number of milliseconds after which the client should retry the request
  • resetAfterMs: The number of milliseconds after which the rate limit will reset

While the goal of rate limiting is to protect your API from abuse, it's also important to provide feedback to the client about the rate limit. If we used an opaque algorithm to determine the rate limit, a well-behaved client might not be able to understand why their request was rejected and when they can retry.

With the rateLimit field in the response, the client can react to the rate limit and adjust their request rate accordingly.

Now let's take a look at the response of a request that exceeds the rate limit:
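An illustrative rate-limited response; the error message wording is made up, while the timing values match the discussion below:

```json
{
  "errors": [
    {
      "message": "Rate limit exceeded"
    }
  ],
  "data": null,
  "extensions": {
    "rateLimit": {
      "requestRate": 60,
      "remaining": 0,
      "retryAfterMs": 464,
      "resetAfterMs": 59464
    }
  }
}
```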


The response is led by an errors object, followed by data set to null, and the rateLimit field in the extensions part of the response. We can see that we have 0 requests remaining and that the client can retry the request after 464ms. The full rate limit will reset after 59464ms, which is about one minute.

You might be asking why we're not rate-limiting based on depth or number of fields, which leads us to the next section.

Choosing the right rate limiting algorithm for GraphQL APIs

When implementing rate limiting for (federated) GraphQL APIs, you have to choose the "right" rate limiting algorithm. But what does "right" mean in this context?

Rate limiting should protect your API from abuse, it should be fast and efficient, and it should be transparent to the client.

While we're trying to protect our API from abuse, we also want to allow legitimate requests to pass. This means that we want to make sure that our "good" clients can use our API in a predictable way.

If we used a complex algorithm to determine the rate limit, e.g. based on the depth of the Operation or the number of fields, we'd force our clients to understand this algorithm and to adjust their request rate accordingly. This means we'd have to document the algorithm and our clients would have to implement it if they want to stay below the rate limit.

So, the more complex the algorithm, the less transparent it is to the client. This is why we've chosen a very simple rate limiting algorithm for Cosmo Router, which is based on the number of Subgraph requests within a given time period.

What if you want to make a query with a depth of 5 but the limit is 3? Or you'd like to query 42 fields but the limit is 40? But does it even make sense to rate limit based on the depth or number of fields?

The number of fields might not necessarily correlate with the load on your Microservices. You can have a query with less than 10 fields that results in hundreds of requests because the user is querying nested fields within nested lists. An algorithm that's based on the depth or number of fields will not be aware of the number of Subgraph requests that are being made. This is because fields alone have no meaning. You need to rate limit depending on the data that comes back from the Subgraphs.

Let me give you a simple example to illustrate this. Let's say you have a Subgraph that returns a list of employees. For each employee, we want to fetch their details and hobbies. Doesn't the number of employees have a much bigger impact on the load on your Microservices than the number of fields?

As such, rate limiting based on the number of Subgraph requests is simple, transparent, and accurate, which is why we've chosen this algorithm for Cosmo Router.

Conclusion

We've discussed the importance of rate limiting for (federated) GraphQL APIs and why traditional rate limiting solutions don't work for GraphQL APIs. We've explored the risks of not applying rate limiting.

Next, we've looked at the different places where you can implement rate limiting. Rate limiting can be implemented at the Edge, at the Subgraph / Microservice level, or at the API Gateway / Router level. Each of these places has its own advantages and disadvantages.

Finally, we've explored the different rate limiting algorithms and why we've chosen a simple strategy for Cosmo Router.

If you have questions about rate limiting or Cosmo in general, feel free to join our community on Discord.

If you need more info on how to set up rate limiting with Cosmo Router, check out the Rate Limiting documentation.