Quirks of GraphQL Subscriptions: SSE, WebSockets, Hasura, Apollo Federation / Supergraph

Jens Neuse & Yuri Buerov

Cosmo: Full Lifecycle GraphQL API Management

Are you looking for an Open Source Graph Manager? Cosmo is the most complete solution including Schema Registry, Router, Studio, Metrics, Analytics, Distributed Tracing, Breaking Change detection and more.

You might be thinking that there's not much to talk about when it comes to Subscriptions. They are defined in the GraphQL specification, so it should be very clear how they work and what they are for.

But in reality, the specification doesn't say much about the transport layer. In fact, it doesn't even specify a transport layer at all. On the one hand, this is an advantage, because you can use GraphQL in all sorts of environments. On the other hand, we now have at least five different implementations of GraphQL Subscriptions.

What this means is that you cannot just use any GraphQL client, connect to a GraphQL server and expect it to work. You have to know which protocol the server supports and which client you need to use. Is that an ideal situation? Probably not, but we're about to change that!

We're the creators of WunderGraph (open source), the first cloud native Serverless GraphQL API Gateway. One of the challenges we ran into was supporting all the different GraphQL Subscription protocols out there. As the GraphQL specification is strictly protocol agnostic, different protocols have been developed over the years.

If a client wants to consume a GraphQL Subscription, it needs to know which protocol to use, and implement the client side of that protocol.

With our Open Source API Gateway, we're going one step further and unifying all of them under one roof. If you're considering GraphQL Subscriptions for your project, this post gives you a quick overview of the different protocols and their quirks.

Introduction - What are GraphQL Subscriptions?

GraphQL has three types of operations: Queries, Mutations, and Subscriptions. Queries and Mutations are used to fetch and modify data. Subscriptions are used to subscribe to data changes.

Instead of polling the server for an update, subscriptions allow the client to subscribe to data changes, e.g. by subscribing to a chat room. Whenever a new message is posted to the chat room, the server will push a message to the client.

With Queries and Mutations, flow control is in the hands of the client. The client sends a request to the server and waits for a response. With Subscriptions, flow control is in the hands of the server.

Here's an example of a GraphQL Subscription:

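Assuming a hypothetical chat schema in which the Subscription type has a messageAdded(roomId: ID!) field, such a subscription, together with the request object most transports send for it (query, variables, operationName), could be written in TypeScript like this. All names are illustrative:

```typescript
// Hypothetical schema: type Subscription { messageAdded(roomId: ID!): Message! }
const MESSAGE_ADDED = `
  subscription MessageAdded($roomId: ID!) {
    messageAdded(roomId: $roomId) {
      id
      text
    }
  }
`;

// The request shape shared by most GraphQL transports.
interface GraphQLRequest {
  query: string;
  variables?: Record<string, unknown>;
  operationName?: string;
}

const request: GraphQLRequest = {
  query: MESSAGE_ADDED,
  variables: { roomId: "general" },
  operationName: "MessageAdded",
};

console.log(JSON.stringify(request.variables)); // {"roomId":"general"}
```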

The server will now send a continuous stream of messages to the client. Here's an example with 2 messages:

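Continuing with the same hypothetical schema, the sketch below stands in for the server with an async generator that yields two results, one per message. The client simply iterates over whatever arrives, which is what "flow control in the hands of the server" looks like in code:

```typescript
interface Message {
  id: string;
  text: string;
}

// Each message on a subscription is a regular GraphQL execution result.
interface ExecutionResult<T> {
  data?: T;
  errors?: { message: string }[];
}

type MessageAddedResult = ExecutionResult<{ messageAdded: Message }>;

// Stand-in for the server: it decides when (and whether) the next result arrives.
async function* fakeServerStream(): AsyncGenerator<MessageAddedResult> {
  yield { data: { messageAdded: { id: "1", text: "Hello" } } };
  yield { data: { messageAdded: { id: "2", text: "World" } } };
}

// The client only reacts to what the server pushes.
async function collectTexts(stream: AsyncIterable<MessageAddedResult>): Promise<string[]> {
  const texts: string[] = [];
  for await (const result of stream) {
    if (result.data) texts.push(result.data.messageAdded.text);
  }
  return texts;
}

collectTexts(fakeServerStream()).then((texts) => console.log(texts.join(", "))); // Hello, World
```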

Now that we understand what GraphQL Subscriptions are, let's take a look at the different protocols that are available.

GraphQL Subscriptions over WebSockets

The most widely used transport layer for GraphQL Subscriptions is WebSockets. WebSockets are a bidirectional communication protocol. They allow the client and the server to send messages to each other at any time.

There are two implementations for GraphQL Subscriptions over WebSockets:

The first is subscriptions-transport-ws by Apollo; the second is graphql-ws by Denis Badurina.

Both protocols are quite similar, although there are some minor differences. It's important to note that the Apollo protocol is deprecated in favor of graphql-ws, but it's still widely used.

GraphQL Subscriptions over WebSockets: subscriptions-transport-ws vs graphql-ws

Both transports use JSON as the message format. To uniquely identify the type of message, a type field is used. Individual subscriptions are identified by an id field.

Clients initiate the connection by sending a connection_init message, which is followed by a connection_ack message from the server.

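In code, this application-level handshake can be sketched against a minimal, hypothetical Transport interface (anything that can send a text frame and await the next one), so it stays independent of any particular WebSocket library. It tolerates Apollo-style ka frames arriving before the acknowledgement, which is the Hasura behaviour described further down:

```typescript
// Hypothetical abstraction over a WebSocket: send a text frame, await the next one.
interface Transport {
  send(frame: string): void;
  receive(): Promise<string>;
}

// Sends connection_init and resolves once the server answers with connection_ack.
async function handshake(transport: Transport, payload?: Record<string, unknown>): Promise<void> {
  const init =
    payload === undefined ? { type: "connection_init" } : { type: "connection_init", payload };
  transport.send(JSON.stringify(init));
  for (;;) {
    const message = JSON.parse(await transport.receive()) as { type?: string };
    if (message.type === "connection_ack") return;
    if (message.type === "ka") continue; // early keep-alive (Apollo protocol), not an error
    throw new Error(`expected connection_ack, got ${String(message.type)}`);
  }
}
```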

I personally find this weird. It feels like we're re-creating TCP on multiple layers. To create a WebSocket connection, we need to create a TCP connection first. A TCP connection is established with a three-way handshake: the client sends a SYN packet, the server answers with a SYN-ACK, and the client confirms with an ACK. So, there's already a handshake between the client and the server.

Next, we initiate the WebSocket connection by making an HTTP Upgrade request, which the server accepts with a "101 Switching Protocols" response. That's the second handshake between the client and the server.

Why do we need a third handshake? Do we not trust the WebSocket protocol enough?

Anyway, after these three handshakes, we're finally ready to send messages. But before we talk about starting and stopping subscriptions, we have to make sure that our WebSocket connection is still alive. When it comes to heartbeats, there are a few differences between the two protocols.

The Apollo protocol uses a {"type": "ka"} message to send a heartbeat from the server to the client. What the protocol lacks is a definition of how the client should react. If the server sends a keep-alive message to the client, but the client never responds, what's the purpose of the keep-alive message? But there's another problem. The protocol states that the server should only start sending keep-alive messages after the connection is "acked". In practice, we found that Hasura might send keep-alive messages before the connection is acked. So, if your implementation relies on a strict order of messages, you should be aware of this.
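
Since the Apollo protocol leaves the client's reaction open, one possible policy (an assumption, not part of the protocol) is a watchdog: every frame from the server, ka or otherwise, counts as a sign of life, and the connection is treated as dead once nothing has arrived for a configurable timeout. The class name and timeout values are illustrative:

```typescript
// Hypothetical client-side policy; subscriptions-transport-ws does not define one.
class KeepAliveWatchdog {
  private lastSeen: number;
  private readonly timeoutMs: number;

  constructor(timeoutMs: number, now: number) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = now;
  }

  // Call for every incoming frame. Returns true for "ka" frames so the caller can skip them.
  onFrame(frame: string, now: number): boolean {
    this.lastSeen = now;
    const message = JSON.parse(frame) as { type?: string };
    return message.type === "ka";
  }

  // True once no frame has been seen for longer than the timeout.
  isDead(now: number): boolean {
    return now - this.lastSeen > this.timeoutMs;
  }
}

const watchdog = new KeepAliveWatchdog(30_000, 0);
watchdog.onFrame('{"type":"ka"}', 10_000);
console.log(watchdog.isDead(35_000)); // false: last frame was 25 s ago
console.log(watchdog.isDead(45_000)); // true: 35 s of silence
```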

The graphql-ws protocol improved on this. Instead of a single keep alive message, it defines that the server should send a {"type":"ping"} message periodically, to which the client should respond with a {"type":"pong"} message. This ensures for both client and server that the other party is still alive.
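
Answering a ping is a small pure function. In the graphql-ws protocol document, either party may send a ping (optionally with a payload), and the receiving party should answer with a pong as soon as possible. A minimal responder over text frames might look like this:

```typescript
type PingPong = { type: "ping" | "pong"; payload?: Record<string, unknown> };

// Returns the frame to send back, or undefined when no reply is required.
function replyToPing(frame: string): string | undefined {
  const message = JSON.parse(frame) as Partial<PingPong>;
  if (message.type !== "ping") return undefined;
  const pong: PingPong = { type: "pong" };
  return JSON.stringify(pong);
}

console.log(replyToPing('{"type":"ping"}')); // {"type":"pong"}
```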

Next, let's talk about starting a subscription. With the Apollo protocol, we've had to send the following message:

The type is start and we have to specify an id to uniquely identify the subscription. The subscription is sent as the query field of the payload object. I think this is confusing and stems from the fact that a lot of people in the GraphQL community call operations queries. It gets even more confusing, because even though the "GraphQL Operation" is called a query, you have to supply an operationName field in case you've got multiple named operations in your document.

Unfortunately, the graphql-ws protocol did not improve on this. I'm assuming that's because they wanted to stay aligned with the GraphQL over HTTP specification, a spec that tries to unify the way GraphQL is used over HTTP.

Anyway, here's how we would start a subscription with the graphql-ws protocol:

The start type was replaced with subscribe, the rest stayed the same.
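
Because only the type changes, a gateway that speaks both protocols can build this message with a single helper. The Protocol union and the function name are illustrative; the payload carries the query and operationName fields described above, plus the usual variables:

```typescript
type Protocol = "subscriptions-transport-ws" | "graphql-ws";

interface GraphQLRequest {
  query: string;
  variables?: Record<string, unknown>;
  operationName?: string;
}

// Same payload for both protocols; Apollo calls the message "start", graphql-ws "subscribe".
function startSubscription(protocol: Protocol, id: string, payload: GraphQLRequest): string {
  const type = protocol === "graphql-ws" ? "subscribe" : "start";
  return JSON.stringify({ id, type, payload });
}

console.log(startSubscription("graphql-ws", "1", { query: "subscription { time }" }));
// {"id":"1","type":"subscribe","payload":{"query":"subscription { time }"}}
```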

Now that we've initiated the connection and started a subscription, let's dig into how receiving messages works.

For subscription messages, the Apollo protocol uses the data type, alongside the id of the subscription, with the actual data being sent in the payload field.

The graphql-ws protocol uses the next type for subscription messages, the rest of the message stays the same.

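A client or gateway that accepts both protocols can normalize the two message types into one protocol-neutral event. A minimal sketch; the SubscriptionEvent shape is an assumption:

```typescript
interface ExecutionResult {
  data?: Record<string, unknown> | null;
  errors?: { message: string }[];
}

// Protocol-neutral representation of one subscription message.
interface SubscriptionEvent {
  id: string;
  result: ExecutionResult;
}

// Accepts Apollo "data" frames and graphql-ws "next" frames; ignores everything else.
function toSubscriptionEvent(frame: string): SubscriptionEvent | undefined {
  const message = JSON.parse(frame) as { type?: string; id?: string; payload?: ExecutionResult };
  const isResult = message.type === "data" || message.type === "next";
  if (!isResult || message.id === undefined || message.payload === undefined) return undefined;
  return { id: message.id, result: message.payload };
}
```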

Now that we've got the subscription initiated, we might want to stop it at some point.

The Apollo protocol uses the stop type for this. If the client wants to stop a subscription, it sends a stop message with the id of the subscription.

The graphql-ws protocol simplified this. Both client and server can send a complete message with the id of the subscription to stop it or notify the other party that the subscription has been stopped.

With the Apollo protocol on the other hand, the complete message was only used by the server to notify the client that the subscription has been stopped and there's no more data to be sent.
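
This asymmetry matters when translating between the two: the client's "unsubscribe" message differs, while in both protocols the server signals the end of a subscription with complete. A sketch with an illustrative Protocol union:

```typescript
type Protocol = "subscriptions-transport-ws" | "graphql-ws";

// Client -> server: ask the server to stop subscription `id`.
function stopSubscription(protocol: Protocol, id: string): string {
  const type = protocol === "graphql-ws" ? "complete" : "stop";
  return JSON.stringify({ id, type });
}

// Server -> client: in both protocols "complete" means no more data for `id`.
function isCompleted(frame: string, id: string): boolean {
  const message = JSON.parse(frame) as { type?: string; id?: string };
  return message.type === "complete" && message.id === id;
}
```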

That was a quick overview of the differences between the two protocols. But how does a client actually know which protocol to use, or how can a server learn which protocol a client is using?

This is where content negotiation comes into play. When a client initiates a WebSocket connection, it can send a list of supported protocols in the Sec-WebSocket-Protocol header. The server can then choose one of the protocols and send it back in the Sec-WebSocket-Protocol header of the HTTP Upgrade response.
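
On the server, honoring the header boils down to picking an offered subprotocol it actually implements. Note the naming clash: Apollo's legacy subscriptions-transport-ws registers itself as graphql-ws, while the graphql-ws library uses graphql-transport-ws. A minimal sketch of the selection:

```typescript
// Sec-WebSocket-Protocol values in the order this (hypothetical) server prefers them.
// "graphql-transport-ws" is the graphql-ws library; "graphql-ws" is Apollo's legacy protocol.
const SUPPORTED = ["graphql-transport-ws", "graphql-ws"];

// Returns the subprotocol to echo back in the 101 response, or undefined if none match.
function selectSubprotocol(header: string | undefined): string | undefined {
  if (header === undefined) return undefined;
  const offered = header.split(",").map((value) => value.trim());
  return SUPPORTED.find((protocol) => offered.includes(protocol));
}
```

This variant applies the server's preference order; iterating over the offered list instead would honor the client's order. RFC 6455 has the client list subprotocols in order of preference but leaves the final choice to the server.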

Here's what such an upgrade request might look like:

And here's how the server might respond:

That's the theory. But does this actually work in practice? The simple answer is no, but I think it's worth digging into this a bit more.

GraphQL client and server implementations don't usually support content negotiation. The reason is that for a long time, there was just one protocol, so there was no need to negotiate. Now that there are multiple protocols, it's too late to add support for content negotiation to existing implementations.

What this means is that even if a client sends a list of supported protocols, the server might just ignore it and use the protocol it supports. Or, even worse, the server might select the first protocol in the list, even if it doesn't support it, and then act as if the second protocol was selected.

So what you need to do is somehow "fingerprint" the other party to understand which protocol it supports. Another option would be to "just try" and see which protocol works. It's not ideal, but that's what we've got to work with.
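
The "just try" approach can be sketched as a loop over candidate subprotocols, given a hypothetical attempt function that opens a connection offering a single subprotocol and reports whether the server echoed it back and acknowledged connection_init. How attempt is implemented depends on the WebSocket library and is left out:

```typescript
// Hypothetical: opens a connection offering only `subprotocol` and resolves true if the
// server both echoes it back and acknowledges connection_init.
type Attempt = (subprotocol: string) => Promise<boolean>;

async function detectProtocol(candidates: string[], attempt: Attempt): Promise<string | undefined> {
  for (const candidate of candidates) {
    try {
      if (await attempt(candidate)) return candidate;
    } catch {
      // A failed or rejected attempt just means: try the next candidate.
    }
  }
  return undefined;
}
```

Since probing costs extra round trips, a gateway could cache the detected protocol per upstream URL.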

It would be nice if we had something like an OPTIONS request for GraphQL servers, so that client and server can learn about each other to pick the right protocol. But we'll pick up on this later.

For now, let's sum up the complete flow of the two protocols. Let's start with the Apollo protocol.

For comparison, here's the graphql-ws flow:

Multiplexing GraphQL Subscriptions over WebSocket

What's great about the two protocols is that they both support multiplexing multiple Subscriptions over a single WebSocket connection. This means that we can send multiple subscriptions over the same connection and receive multiple subscription messages on the same connection. At the same time, this is also a huge drawback, because multiplexing is implemented in the application layer.

When you're implementing a GraphQL server or client that uses WebSockets, you have to implement multiplexing yourself. Wouldn't it be much better if the transport layer would handle this for us? Well, it turns out that there is a protocol that does exactly that.
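
Application-level multiplexing essentially means keeping a table from subscription id to its consumer and routing every incoming message through it. A minimal sketch; the class name and the counter-based id scheme are illustrative, and frames are assumed to be already parsed into id and payload:

```typescript
type Handler = (payload: unknown) => void;

// Routes messages arriving on one shared WebSocket to the right subscription.
class SubscriptionMultiplexer {
  private readonly handlers = new Map<string, Handler>();
  private nextId = 1;

  // Registers a consumer and returns the id to put on the start/subscribe message.
  add(handler: Handler): string {
    const id = String(this.nextId++);
    this.handlers.set(id, handler);
    return id;
  }

  // Called on stop/complete; later messages for this id are dropped.
  remove(id: string): void {
    this.handlers.delete(id);
  }

  // Returns false if no subscription with this id is registered.
  dispatch(id: string, payload: unknown): boolean {
    const handler = this.handlers.get(id);
    if (handler === undefined) return false;
    handler(payload);
    return true;
  }
}
```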

GraphQL over Server-Sent Events (SSE)

Server-Sent Events (SSE) is a very simple protocol built on top of HTTP that allows a client to receive a stream of events from a server. Together with HTTP/2 and HTTP/3, it's one of the most efficient ways of sending events from a server to a client. Most importantly, because HTTP/2 and HTTP/3 multiplex many streams over a single connection, the problem of multiplexing multiple Subscriptions is solved below the GraphQL layer. This means that the application layer doesn't have to worry about multiplexing anymore.

Let's take a look at how the protocol works by looking at the implementation from GraphQL Yoga:

It's no accident that we're using curl here. Server-Sent Events are plain HTTP, so any HTTP client that supports streaming responses, like curl, can consume them. The GraphQL Subscription is sent as a URL-encoded query parameter.
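
Building such a request URL takes a few lines. The sketch below assumes a hypothetical endpoint (http://localhost:4000/graphql) and a hypothetical countdown field, and encodes variables as a JSON string parameter, the common convention for GraphQL over GET. The request would typically be sent with an "Accept: text/event-stream" header:

```typescript
// Builds a GET URL carrying the subscription as URL-encoded query parameters.
function buildSseUrl(endpoint: string, query: string, variables?: Record<string, unknown>): string {
  const url = new URL(endpoint);
  url.searchParams.set("query", query);
  if (variables !== undefined) url.searchParams.set("variables", JSON.stringify(variables));
  return url.toString();
}

// Hypothetical endpoint and subscription field.
const sseUrl = buildSseUrl("http://localhost:4000/graphql", "subscription { countdown(from: 3) }");
console.log(sseUrl);
```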

The Subscription starts when the client connects to the server, and it ends when either the client or the server closes the stream. With HTTP/2 and HTTP/3, many Subscriptions can share the same connection (TCP for HTTP/2, QUIC for HTTP/3) as separate streams. That's multiplexing at the transport layer.

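If a client doesn't support HTTP/2, it can still fall back to chunked encoding over HTTP/1.1, although each Subscription then needs its own connection, and browsers limit the number of concurrent HTTP/1.1 connections per origin (typically to six).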

In fact, this protocol is so simple that we don't even have to explain it.
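
To illustrate: a client only has to split the stream into events separated by blank lines and read the event and data fields. Below is a minimal parser for an already-buffered chunk of text/event-stream, following the basic rules of the EventSource format (lines starting with a colon are comments, multiple data lines are joined with newlines, the default event name is "message"); id and retry fields are ignored for brevity. Whether a server sends named events such as next and complete depends on the implementation:

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Parses complete events from a buffered text/event-stream chunk.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let event = "message";
  let data: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    if (line === "") {
      // A blank line dispatches the event, but only if it carried data lines.
      if (data.length > 0) events.push({ event, data: data.join("\n") });
      event = "message";
      data = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment, often used as keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1);
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  return events;
}
```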

Proxying GraphQL Subscriptions through an SSE "Gateway"

As we've just shown, Server-Sent Events are by far the simplest approach. That's also why we've chosen SSE as the primary way WunderGraph exposes Subscriptions and Live Queries.

But how do you unify multiple GraphQL servers with different Subscription protocols under a single API? That's what the last part of this post will be about...

Multiplexing multiple GraphQL Subscriptions over a single WebSocket connection

We've previously discussed how the WebSocket protocols support multiplexing multiple Subscriptions over a single WebSocket connection. This makes sense for a client, but gets a bit more complicated when doing it in a Proxy/API Gateway.

We can't just use the same WebSocket connection for all Subscriptions, because we have to handle authentication and authorization for each Subscription.

So, instead of using one single WebSocket connection for all Subscriptions, we have to "bundle" together all Subscriptions that should be executed in the same "security context". We handle this by hashing all security-related information, like headers, origin, etc., to create a unique identifier for each security context.

If a WebSocket connection for this hash exists already, we use it. Otherwise, we create a new WebSocket connection for this security context.
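
A sketch of that bookkeeping, assuming the security context consists of the forwarded headers plus the origin (which headers count as security-relevant is a policy decision, and all names here are hypothetical). The key is a SHA-256 hash over a canonical form, so header order and header-name casing don't lead to separate connections, and raw tokens aren't used directly as map keys:

```typescript
import { createHash } from "node:crypto";

interface SecurityContext {
  origin: string;
  headers: Record<string, string>;
}

// Canonical form: lower-cased header names, sorted, then hashed together with the origin.
function securityContextKey(ctx: SecurityContext): string {
  const headers = Object.entries(ctx.headers)
    .map(([name, value]): [string, string] => [name.toLowerCase(), value])
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return createHash("sha256").update(JSON.stringify({ origin: ctx.origin, headers })).digest("hex");
}

// One upstream connection per security context; C is whatever the WebSocket client returns.
class UpstreamPool<C> {
  private readonly connections = new Map<string, C>();
  private readonly connect: (ctx: SecurityContext) => C;

  constructor(connect: (ctx: SecurityContext) => C) {
    this.connect = connect;
  }

  // Reuses the connection for this security context, or opens a new one.
  acquire(ctx: SecurityContext): C {
    const key = securityContextKey(ctx);
    let connection = this.connections.get(key);
    if (connection === undefined) {
      connection = this.connect(ctx);
      this.connections.set(key, connection);
    }
    return connection;
  }

  get size(): number {
    return this.connections.size;
  }
}
```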

Authentication for GraphQL Subscriptions over WebSockets

Some GraphQL APIs, like Reddit's, expect the client to send an Authorization header with the WebSocket connection. This is a bit problematic, because browsers cannot send custom headers with WebSocket Upgrade requests; the browser WebSocket API simply doesn't support it.

So, how does Reddit handle this? Go to e.g. reddit.com/r/graphql and open the developer tools. If you filter the connections by websocket ("ws"), you should see a WebSocket connection to wss://gql-realtime.reddit.com/query.

If you look at the first message, you'll see that it's a connection_init with some special content:

The client sends the "Authorization header" as part of the payload of the connection_init message. We asked ourselves how to support this without knowing what kind of payload you'd like to send to the origin in the connection_init message. Reddit sends a Bearer Token as the Authorization field, but you might want to send some other information.

So, we've decided to allow our users to define a custom hook that can modify the payload of the connection_init message however they want. Here's an example:

This hook takes the Authorization header from the client request (SSE) and injects it into the connection_init message payload.
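
Stripped of the framework around it, such a hook is a pure function from the client request and the outgoing connection_init message to a new message. The types below are hypothetical stand-ins, not WunderGraph's actual hook types; the Authorization payload key mirrors what the Reddit client sends:

```typescript
// Hypothetical stand-ins for the gateway's types.
interface ClientRequest {
  headers: Record<string, string | undefined>;
}

interface ConnectionInit {
  type: "connection_init";
  payload?: Record<string, unknown>;
}

// Copies the client's Authorization header (from the SSE request) into the upstream payload.
function injectAuthorization(client: ClientRequest, init: ConnectionInit): ConnectionInit {
  // Node.js exposes incoming header names in lower case.
  const authorization = client.headers["authorization"];
  if (authorization === undefined) return init;
  return { ...init, payload: { ...init.payload, Authorization: authorization } };
}
```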

This doesn't just simplify authentication for WebSocket Subscriptions, but it also makes the implementation much more secure.

The Reddit implementation exposes a Bearer Token to the client. This means that JavaScript running in the browser has access to the Bearer Token. This token might leak, or could be read by malicious JavaScript code injected into the page.

Not so with the SSE implementation. We're not exposing any token to the client. Instead, the identity of the user is stored in an encrypted, HttpOnly cookie.

Manipulating/filtering GraphQL Subscription messages

Another problem you might run into is that you want to manipulate/filter the messages that are sent to the client. You might want to integrate a 3rd party GraphQL API, and before sending the messages to the client, you want to filter out some fields that contain sensitive information.

We've implemented a hook for this as well:

With four hooks available in your toolbox, you're able to manipulate the Subscription message before it's sent to the origin, and before each response is sent to the client.

The most interesting hook might be the mutatingPostResolve hook, as it enables the filtering and response manipulation that we've talked about earlier.
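
The core of such a filter, independent of any hook API, is a recursive walk over the JSON response that drops denied fields before the message is forwarded. A sketch with a hypothetical deny-list; wiring it into mutatingPostResolve depends on the gateway's actual hook signature:

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns a copy of `value` without any object field whose name is in `deny`.
function redact(value: Json, deny: ReadonlySet<string>): Json {
  if (Array.isArray(value)) return value.map((item) => redact(item, deny));
  if (value !== null && typeof value === "object") {
    const result: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      if (!deny.has(key)) result[key] = redact(child, deny);
    }
    return result;
  }
  return value;
}
```

Matching by field name is blunt; a stricter filter could match on response paths instead.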

Proxying GraphQL Subscriptions to Federated GraphQL APIs (Apollo Federation / Supergraph / Subgraph)

Proxying GraphQL Subscriptions to Federated GraphQL APIs adds a whole new level of complexity to the problem. You have to start a subscription for the root field on one of the subgraphs, and then "join" the response from one or more subgraphs into a single message.

If you're interested in an example of how this works, have a look at the Apollo Federation Example in our monorepo.

I'll do a more detailed writeup on this topic in the future, but for now, let me give you a quick overview of how this works.

We break down the federated GraphQL Subscription into multiple operations, one Subscription for the root field, and one or more Queries for the rest of the response tree.

We then execute the Subscription like any other subscription, and switch into "regular" execution mode as soon as a new subscription message arrives from the root field.
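
This execution model can be sketched as an async generator: for every event from the root subscription field, run a follow-up query against another subgraph and yield the merged result. The productUpdated and reviews shapes and the fetchReviews function are hypothetical, standing in for the query plan a gateway would derive from the supergraph:

```typescript
// Hypothetical event from the subgraph that owns the root Subscription field.
interface ProductUpdated {
  productUpdated: { upc: string; price: number };
}

// Hypothetical follow-up query against a second subgraph, keyed by the entity's upc.
type FetchReviews = (upc: string) => Promise<{ body: string }[]>;

type JoinedEvent = { productUpdated: { upc: string; price: number; reviews: { body: string }[] } };

// For each subscription event, switch to regular query execution and merge the results.
async function* joinSubscription(
  events: AsyncIterable<ProductUpdated>,
  fetchReviews: FetchReviews,
): AsyncGenerator<JoinedEvent> {
  for await (const event of events) {
    const { upc, price } = event.productUpdated;
    const reviews = await fetchReviews(upc);
    yield { productUpdated: { upc, price, reviews } };
  }
}
```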

This also allows us to use the "magic" _join field to join a subscription with a REST API or any other data source.

Once you've figured out the part of managing multiple WebSocket connections, the rest is just a matter of joining the responses from the different data sources, be it federated or non-federated GraphQL APIs, REST APIs or even gRPC.

Examples

This was quite a lot to digest, so let's take a look at some examples to make it a bit more concrete.

WunderGraph as an API Gateway in front of Hasura

This example shows how to use WunderGraph in front of Hasura.

WunderGraph with graphql-ws-subscriptions

The next example combines graphql-ws-subscriptions with WunderGraph.

WunderGraph with Apollo GraphQL Subscriptions

If you're still using the legacy Apollo GraphQL Subscriptions, we've got you covered as well.

WunderGraph and GraphQL SSE Subscriptions

This example uses the SSE implementation of GraphQL Subscriptions.

WunderGraph with GraphQL Yoga Subscriptions

As one of the most popular GraphQL libraries, GraphQL Yoga should definitely be on the list as well.

WunderGraph Subscription Hooks Example

Finally, we'd like to round it off with a WunderGraph Subscription Hooks Example, demonstrating the different hooks that are available.

Conclusion

As you've learned, there's quite a bit of complexity involved in understanding and implementing all the different GraphQL Subscription protocols.

I think what's really missing in the GraphQL community is a standardized "GraphQL Server capabilities" protocol. Such a protocol would allow a client to quickly determine which capabilities a GraphQL server has, and which protocols it supports.

In its current state, it's not always guaranteed that a GraphQL client can automatically determine how to talk to a GraphQL server. If we want to make the GraphQL ecosystem grow, we should establish standards so that clients can talk to GraphQL servers without human intervention.

If you're trying to unify multiple GraphQL APIs under one umbrella, you've probably run into the same problems that we've had. We hope that we were able to give you some hints on how to solve these problems.

And of course, if you're just looking for a ready-made programmable GraphQL API gateway that handles all the complexity for you, check out the examples above and give WunderGraph a try. It's Open Source (Apache 2.0) and free to use.