Zero-Cost Abstractions for @skip and @include in Federated GraphQL
Compared to a monolithic GraphQL API, distributed GraphQL APIs come with some unique challenges. Instead of just calling resolvers within the same process, the GraphQL Router makes network requests to the Subgraphs, the sub services that implement the resolvers for the federated Schema.
Consequently, the latency of a federated GraphQL Request is determined by the waterfall of network requests from the Router to the Subgraphs. As such, our goal is to keep the number of network requests as low as possible.
This article will look into one specific topic that has a significant impact on the number of network requests: the @skip and @include directives. You might be surprised to learn that some implementations of these directives can lead to additional network requests, which can significantly degrade the performance of your federated GraphQL API.
We're going to look at three different approaches to implementing @skip and @include in a federated GraphQL Router, and we'll discuss the trade-offs of each approach. You'll learn how our implementation evolved over time and how we ended up with a zero-cost abstraction to implement these directives.
Before we dive into the details, let's quickly recap what the @skip and @include directives are used for in GraphQL, and why they are so popular among advanced GraphQL users.
What are the @skip and @include directives and why are they so popular?
With the @skip and @include directives, you can conditionally include or exclude fields in your GraphQL response. This is particularly useful when you want to avoid fetching unnecessary data from your backend. Instead of writing multiple queries or fragments for different use cases in your frontend, you can use these directives to conditionally enable or disable parts of your query.
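To make the semantics concrete, here is a minimal Python sketch of how the two directives decide whether a field stays in the response. The function name and the variable names are illustrative assumptions, not part of any GraphQL library; the rule itself follows the GraphQL specification: a field is kept only if @skip does not evaluate to true and @include does not evaluate to false.

```python
from typing import Dict, Optional


def is_selected(
    variables: Dict[str, bool],
    skip_var: Optional[str] = None,
    include_var: Optional[str] = None,
) -> bool:
    """Decide whether a field guarded by @skip(if: $skip_var) and/or
    @include(if: $include_var) remains part of the response.

    Hypothetical helper for illustration only.
    """
    # @skip(if: true) removes the field.
    if skip_var is not None and variables[skip_var]:
        return False
    # @include(if: false) removes the field as well.
    if include_var is not None and not variables[include_var]:
        return False
    # Without directives, or when both conditions allow it, the field is kept.
    return True
```

When both directives are present on the same field, the field is only kept if neither of them excludes it.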
This is especially useful when using Fragments in complex Frontend architectures, as it allows UI components to not just define their data requirements in a colocated Fragment, but also to conditionally include or exclude parts of the Fragment based on the current state of the UI, such as the user's permissions or the current route.
Instead of using multiple Fragments and combining them at the code level, the @skip and @include directives allow you to handle this logic within the GraphQL Query itself, which has multiple benefits. First, it keeps the UI components more focused and easier to understand, as you don't have to manage the combination of multiple Fragments in your code. Second, this approach allows the GraphQL Server / Router to optimize the query execution, e.g. by combining and joining fields that are requested multiple times across different Fragments.
Here's an illustration of a very popular pattern that we see used a lot across our customers' federated GraphQL APIs:
What's interesting about this pattern is that we've found it being used across many of our customers in different variations, but the core idea is always the same. Different teams or developers are responsible for different parts of the User Interface, so they build their own Fragments, which are typically co-located with the UI components. As they don't want to fetch unnecessary data from the backend (overfetching), they use the @skip and @include directives to conditionally include or exclude parts of the query.
In the example above, we've got one component that shows the Employee Details alongside the Employee's current Mood, while a second component only shows the Employee's Mood.
If we set $withEmployeeDetails to true, we can show the details of the Employee. In addition, we can set $skipMood to false to show the Employee's current Mood in both components as well. As you can also see from the comments in the query, the Employee Details are fetched from the Employee Subgraph, while the Employee's current Mood is fetched from the Mood Subgraph.
The problem with @skip and @include in a federated GraphQL API
If you take a closer look at the query above, you'll notice that we don't need to fetch the Employee's current Mood from the Mood Subgraph at all if $skipMood is set to true. To be more precise, we want to make zero network requests to the Mood Subgraph if the @skip directive evaluates to true.
You'll see that Query Planners in federated GraphQL Routers can take different approaches to implement the @skip and @include directives. In general, we found three different approaches to solving this problem:
- Subgraph Directive Evaluation Approach: The Router forwards the skip & include directives to all Subgraphs, and the Subgraphs decide whether to include or exclude the field based on the @skip and @include directives.
- Smart but expensive Approach: The Router creates an execution plan for all Subgraph Fetches and is smart enough to only execute the Subgraph Fetches that are necessary.
- Zero-Cost Abstraction: The Router "Normalizes" the Query in a way that the Query Planner & Subgraphs don't need to know about the @skip and @include directives.
We call the third approach the "Zero-Cost Abstraction" because it has (almost) zero cost in terms of overhead and complexity, but let's first take a look at the first two approaches to understand why they are not ideal, and why we ended up with the Zero-Cost Abstraction.
Before we look into the details of the different approaches, let's first understand how a federated GraphQL Router works and how it would resolve the query if we didn't have the @skip and @include directives.
How a federated GraphQL Router resolves a distributed Query
Here's a simplified version of the Employees Subgraph and the Mood Subgraph:
At the heart of Federation is the concept of Entities and their keys. Entities are the root types of your Subgraphs, with their corresponding keys to identify them. By declaring the Employee type in the Mood Subgraph with the same key(s) as in the Employees Subgraph, the Router can "jump" from one Subgraph to another by using the key(s) of the Entity as a reference, and then fetch the additional data from the other Subgraph.
If we ignore the @skip and @include directives for a moment, the Router would make the following two network requests to resolve the query:
Variables:
You'll notice that the Router fetches the details, but also the __typename and id fields, which are necessary for the Router to then make the "jump" to the Mood Subgraph. Here's the second network request that the Router would make to the Mood Subgraph:
Variables:
By constructing an array of "representations" objects with the __typename and the key(s) of the Entity, the Router can make a batched request to the Mood Subgraph to fetch additional fields for one or more Entities.
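The batching step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Router's actual code: the function name, its parameters, and the sample data are hypothetical, while the _entities field and the _Any scalar come from the Federation subgraph specification.

```python
from typing import Any, Dict, List


def build_entities_request(
    typename: str,
    key_field: str,
    entities: List[Dict[str, Any]],
    selection: str,
) -> Dict[str, Any]:
    """Build the body of a batched entity fetch for a second Subgraph.

    Hypothetical helper: it keeps only __typename and the key field of
    each entity returned by the first Subgraph.
    """
    representations = [
        {"__typename": typename, key_field: entity[key_field]}
        for entity in entities
    ]
    query = (
        "query($representations: [_Any!]!) { "
        "_entities(representations: $representations) { "
        f"... on {typename} {{ {selection} }} }} }}"
    )
    return {"query": query, "variables": {"representations": representations}}


# Entities as they might come back from the first fetch (made-up data).
employees = [
    {"__typename": "Employee", "id": 1, "details": {"forename": "Ada"}},
    {"__typename": "Employee", "id": 2, "details": {"forename": "Alan"}},
]
body = build_entities_request("Employee", "id", employees, "currentMood")
```

However many Employees the first fetch returns, the second Subgraph still receives a single request, with one representation per Entity.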
So far, so good. Let's cause some trouble by introducing the @skip and @include directives.
Subgraph Directive Evaluation Approach: Forwarding skip & include directives to all Subgraphs
The Subgraph Directive Evaluation Approach is the simplest way to implement the @skip and @include directives in a federated GraphQL Router. The Router more or less ignores that the directives have any special meaning and forwards them to all Subgraphs like it would with any other executable directive.
Here's how the Router would resolve the query with the Subgraph Directive Evaluation Approach, assuming that $withEmployeeDetails is set to true and $skipMood is set to true, so we only want to fetch the Employee Details from the Employees Subgraph, but not the Employee's current Mood from the Mood Subgraph. Keep in mind that we only want to make one single network request and do so in the most efficient way possible.
This might not be what you would expect, but at a second glance, it should make sense. Let me explain what happens here.
The Router forwards the first part with the @include directive to the Employees Subgraph; this is the part that's more or less to be expected. The part that might surprise you is that the Router also creates a second selection set to fetch the __typename and id fields if $skipMood is set to false. You might be thinking that this is unnecessary, as we've already fetched the __typename and id fields in the first part of the query. However, as the Router doesn't evaluate the @skip and @include directives itself, it is not aware of whether the __typename and id fields are already fetched in another part of the query, so it's forced to include both selection sets in the query.
Unsurprisingly, this approach makes a second network request to the Mood Subgraph! Here's the query that the Router would send to the Mood Subgraph:
OK, let's unpack this query. The first part of the query is wrapped with the @include directive, which then contains the currentMood field with the @skip directive. This is to fetch the data for the first Fragment in the query. The second part is fetching the currentMood field for the second Fragment, and it's wrapped with another @skip directive.
I want to emphasize that the root cause of this problem is not just the @skip and @include directives, but the fact that we need the __typename and id (key) fields to be able to "jump" between Subgraphs to resolve additional fields in the query. It would be very complicated for the Query Planner to keep track of all the fields and their dependencies across different Subgraphs, so the solution is to just forward the query to all Subgraphs and let them evaluate the @skip and @include directives. This approach is safe in the sense that it will always return the correct data, but it's not efficient as it can lead to additional network requests.
Let's take a look at the second approach, which tries to be smarter about which Subgraph Fetches to execute.
Smart but expensive Approach: Make the Router aware of the skip & include directives
From the very beginning of our journey with building GraphQL API Gateways, we've been aware of this problem and wanted to be smarter about how we handle the @skip and @include directives. We wanted to avoid the additional network requests that the Subgraph Directive Evaluation Approach introduces, so we thought that the Router should be aware of the special meaning of these directives and optimize the query execution accordingly.
As a result, we came up with a more sophisticated Query Planner that would analyze the query and create a list of Subgraph Fetches that, alongside the Query itself, would also contain some logic to evaluate whether a Subgraph Fetch should be executed or not. Here's an illustration of what such an execution plan could look like:
You can see why some implementations might prefer the Subgraph Directive Evaluation Approach over this one. For each combination of the @skip and @include directives, the Router needs to plan a separate Subgraph Fetch to load exactly the data that's needed, e.g. the __typename and id fields if $skipMood is set to false so that the Router can make the "jump" to the Mood Subgraph.
This approach makes exactly one network request when $withEmployeeDetails is set to true and $skipMood is set to true, but you can see that the Query Planner needs to do a lot of complex logic to determine which Subgraph Fetches to execute. This is not just expensive in terms of CPU time and memory for the planning phase, but it also makes the codebase more complex and harder to maintain.
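To give a feel for the bookkeeping involved, here is a simplified Python sketch of a plan whose selections carry conditions that are evaluated at execution time. The class names, the plan layout, and the field selections are assumptions made for illustration; they do not reproduce the actual planner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Variables = Dict[str, bool]


@dataclass
class ConditionalSelection:
    fields: str
    condition: Callable[[Variables], bool]


@dataclass
class Fetch:
    subgraph: str
    selections: List[ConditionalSelection]


# Hypothetical plan for the Employee example: every selection remembers
# under which variable values it is actually needed.
PLAN = [
    Fetch("employees", [
        ConditionalSelection("details", lambda v: v["withEmployeeDetails"]),
        # Key fields are only required if we later jump to the Mood Subgraph.
        ConditionalSelection("__typename id", lambda v: not v["skipMood"]),
    ]),
    Fetch("mood", [
        ConditionalSelection("currentMood", lambda v: not v["skipMood"]),
    ]),
]


def active_fetches(plan: List[Fetch], variables: Variables) -> List[Tuple[str, str]]:
    """Return (subgraph, fields) pairs for the fetches that must run."""
    result = []
    for fetch in plan:
        fields = [s.fields for s in fetch.selections if s.condition(variables)]
        if fields:  # a fetch without any active selection is not executed
            result.append((fetch.subgraph, " ".join(fields)))
    return result


print(active_fetches(PLAN, {"withEmployeeDetails": True, "skipMood": True}))
# [('employees', 'details')]
```

Even in this toy version, every selection needs its own condition, and key fields have to be tied to the fetches that depend on them. That bookkeeping grows quickly with nested Fragments and more Subgraphs.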
We've used this approach in the past and it worked ok for us, but eventually we moved away from it due to a related problem that we encountered with the approach we were using to plan the Subgraph Fetches. Let me explain this in the next section, which will lead us to the Zero-Cost Abstraction.
Zero-Cost Abstraction: Normalizing the Query to avoid the need for the skip & include directives
As we were onboarding more and more customers to the Cosmo Stack, we got more and more exposure to extremely complex federated GraphQL APIs, with very complex nested Fragments, abstract types like Interfaces and Unions, and of course the @skip and @include directives as described at the beginning of this article.
We noticed that we were spending quite a lot of time rewriting and optimizing the Query Plan, e.g. when we need to resolve fields on abstract types that are not implemented in all Subgraphs. Such a scenario is quite complex to plan correctly as we need to know exactly which Subgraph provides which fields, and if we're not able to resolve a field on an abstract type in one Subgraph, we need to find another Subgraph that can provide the data, which also means that we need to plan fetching the key(s) so we can "jump" to the other Subgraph. All of this complexity adds up, and in the worst case it can take up to a couple of seconds to plan a single query.
Consequently, we were looking for ways to simply "do less work" in the Query Planner. We thought that the simplest solution to save time would be to plan fewer fields.
Removing all skipped selection sets from the query in a normalization step would mean that we don't need to plan fields that would be skipped anyway.
This is how we came up with the Zero-Cost Abstraction. Instead of forwarding the @skip and @include directives to the Subgraphs, which leads to additional network requests as we've seen in the Subgraph Directive Evaluation Approach, and instead of making the Router aware of the special meaning of the @skip and @include directives, which makes the Query Planner more complex and expensive to run, we decided to simply remove the parts of the query where @skip evaluates to true and @include evaluates to false.
To illustrate this, let's take a look at the query from the beginning of this article:
Now, given that $withEmployeeDetails is set to true and $skipMood is set to true, the Router would normalize the query to the following:
First, we inline all Fragments into the Query to turn it into a single tree. This makes it easier for the Query Planner to analyze and rewrite the query as we don't have to deal with Fragments anymore. Fragments are great for Frontend developers to co-locate the data requirements of their UI components, but they only add complexity for implementing a Query Planner.
Second, we evaluate all @skip and @include directives and remove the parts of the query that would be skipped. What we end up with is a normalized query that only contains the fields that are actually needed to resolve the query.
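The two normalization steps can be sketched as a small tree rewrite in Python. This is a minimal illustration, not the Router's implementation: the data structures, the fragment names, and the forename/surname fields are assumptions, and real normalization also has to deal with arguments, aliases, and type conditions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Field:
    name: str
    children: List["Node"] = field(default_factory=list)
    skip_if: Optional[str] = None      # variable used in @skip(if: ...)
    include_if: Optional[str] = None   # variable used in @include(if: ...)


@dataclass
class FragmentSpread:
    fragment: str
    skip_if: Optional[str] = None
    include_if: Optional[str] = None


Node = Union[Field, FragmentSpread]


def _kept(node: Node, variables: Dict[str, bool]) -> bool:
    if node.skip_if is not None and variables[node.skip_if]:
        return False
    if node.include_if is not None and not variables[node.include_if]:
        return False
    return True


def normalize(nodes, fragments, variables) -> List[Field]:
    """Inline fragments, drop skipped selections, merge duplicate fields."""
    merged: Dict[str, Field] = {}
    for node in nodes:
        if not _kept(node, variables):
            continue
        if isinstance(node, FragmentSpread):
            expanded = normalize(fragments[node.fragment], fragments, variables)
        else:
            children = normalize(node.children, fragments, variables)
            if node.children and not children:
                continue  # every sub-selection was pruned, so drop the field too
            expanded = [Field(node.name, children)]
        for f in expanded:
            if f.name in merged:
                both = merged[f.name].children + f.children
                merged[f.name].children = normalize(both, fragments, variables)
            else:
                merged[f.name] = f
    return list(merged.values())


def render(fields: List[Field]) -> str:
    return " ".join(
        f.name + (" { " + render(f.children) + " }" if f.children else "")
        for f in fields
    )


# Hypothetical stand-in for the example query and its two Fragments.
FRAGMENTS = {
    "EmployeeDetails": [
        Field("details", [Field("forename"), Field("surname")]),
        Field("currentMood", skip_if="skipMood"),
    ],
    "EmployeeMood": [Field("currentMood", skip_if="skipMood")],
}
QUERY = [Field("employee", [
    FragmentSpread("EmployeeDetails", include_if="withEmployeeDetails"),
    FragmentSpread("EmployeeMood"),
])]

normalized = normalize(QUERY, FRAGMENTS, {"withEmployeeDetails": True, "skipMood": True})
print(render(normalized))  # employee { details { forename surname } }
```

The output no longer contains Fragments or directives, so whatever consumes it only sees plain fields.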
As a result, the Query Planner doesn't need to do anything special to handle directives. In fact, the Query Planner doesn't even need to know about the @skip and @include directives at all.
Furthermore, the resulting Subgraph Fetches will never contain @skip and @include directives, which means that the Subgraphs will not have to parse and evaluate unnecessary fields. This might not sound like a big deal, but if you're sending 20 kB of query data to a Subgraph, of which you could skip 10 kB if you evaluated the @skip directive in the Router, this can have a significant impact on the performance of your federated GraphQL API. Popular runtimes like Node.js execute JavaScript on a single thread and therefore have limited CPU resources for compute-heavy tasks like parsing, so you want to avoid unnecessary work as much as possible.
As you can guess, after normalizing the query, the Router will plan a single Subgraph Fetch to the Employees Subgraph to fetch the Employee Details. There will be no extra network requests to the Mood Subgraph, and there's also no need to fetch additional key fields or evaluate if subsequent Subgraph Fetches should be executed.
We've called this approach a "Zero-Cost Abstraction" because it has (almost) zero cost in terms of overhead and complexity. However, if we're 100% honest, it's not just zero cost, but it's actually negative cost! Evaluating the @skip and @include directives during the normalization step is a relatively cheap operation compared to the CPU time and memory that the Query Planner would need to plan the fields if we didn't remove them.
How much of a difference does the Zero-Cost Abstraction make?
You might be wondering how much of a difference the Zero-Cost Abstraction makes in practice. So we've run some benchmarks of the Subgraph Directive Evaluation Approach vs the Zero-Cost Normalization Approach with an artificial Subgraph latency of 100ms, which is a typical value that we see in production environments.
Here are the results of the Subgraph Directive Evaluation Approach:
And here are the results of the Zero-Cost Normalization approach:
As we would expect, the Zero-Cost Normalization approach has roughly half the latency and twice the throughput compared to the approach that forwards the @skip and @include directives to the Subgraphs, which requires an additional network request to the Mood Subgraph.
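A back-of-the-envelope model shows why the factor of two is expected. The sketch below is a simplification under stated assumptions: it ignores Router overhead, counts only sequential Subgraph round trips, and uses a made-up client concurrency of 100. It is not a reproduction of the benchmark.

```python
def sequential_latency_ms(fetches: int, subgraph_latency_ms: float = 100.0) -> float:
    """Latency of a request whose Subgraph fetches run one after another."""
    return fetches * subgraph_latency_ms


def throughput_rps(latency_ms: float, concurrent_clients: int) -> float:
    """Requests per second when each client waits for its response
    before sending the next request."""
    return concurrent_clients * 1000.0 / latency_ms


forwarding = sequential_latency_ms(2)   # Employees fetch, then Mood fetch
normalized = sequential_latency_ms(1)   # Employees fetch only

print(forwarding, normalized)                                         # 200.0 100.0
print(throughput_rps(forwarding, 100), throughput_rps(normalized, 100))  # 500.0 1000.0
```

With a fixed number of clients, halving the per-request latency doubles the number of requests that complete per second.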
Conclusion
In this article, we've looked at three different approaches to implementing the @skip and @include directives in a federated GraphQL Router, and we've discussed the trade-offs of each approach.
We've seen that the Subgraph Directive Evaluation Approach is the simplest way to implement these directives, but it can lead to additional network requests and therefore degrade the performance of your federated GraphQL API.
We've looked at a more sophisticated approach that makes the Router aware of the special meaning of the @skip and @include directives. Although this approach reduces the number of network requests, it makes the Query Planner more complex and expensive to run.
Finally, we've introduced the Zero-Cost Abstraction, which in fact has a negative cost as it reduces the complexity of the Query Planner, Query Execution, as well as the Subgraphs themselves.
If you're keen to leverage the Zero-Cost Abstraction in your federated GraphQL API, check out the Cosmo Stack in the WunderGraph GitHub repository.
Cosmo Router is currently the only Router for federated GraphQL APIs that implements the Zero-Cost Normalization approach, and we're excited to see how it will help you to build the most performant federated Graphs. If you're interested in comparing Cosmo Router to your current solution, you can follow this Getting Started Guide to quickly set up a local development environment with Cosmo Router.
If you have any questions or feedback, feel free to reach out to me on Twitter or join our Discord Community.