Blog
/
Education

How to analyze the usage of your GraphQL Schema

cover
Jens Neuse

Jens Neuse

min read

There are multiple reasons why you might want to get insights into how your GraphQL Schema is being used.

  1. Based on the information on how your Schema is being used, you can optimize the API to better suit your clients' needs
  2. You can optimize your documentation around the most used fields, or add more documentation to fields that are very powerful but not used by many clients
  3. You can deprecate fields that are never used by any client
  4. You can safely remove fields that are deprecated and no longer used by any client without breaking them

That's four good reasons to analyze the usage of your GraphQL Schema. In this article, we'll show you how to do that.

We're hiring!

We're looking for Golang (Go) Developers, DevOps Engineers and Solution Architects who want to help us shape the future of Microservices, distributed systems, and APIs.

By working at WunderGraph, you'll have the opportunity to build the next generation of API and Microservices infrastructure. Our customer base ranges from small startups to well-known enterprises, allowing you to not just have an impact at scale, but also to build a network of industry professionals.

What is GraphQL Schema Usage?

GraphQL Schema Usage is the information about which fields of your GraphQL Schema are being used by which clients. It's a bit like Google Analytics for your GraphQL Schema.

More specifically it's a mapping of the following information:

  • Which fields are being used by which clients
  • How often are they being used
  • Which fields are being used together
  • Which (inline) fragments are being used
  • Which types are returned by which fields
  • What enums and enum values are being used
  • What input types and fields are being used

Prior work: Path-based Analytics for GraphQL APIs

We haven't found a lot of literature on this topic, which is why we decided to write this article. The only interesting content we found is a talk by our friends at Ingo who did a talk on this topic at the recent GraphQL Conf. You can watch the full talk by Eitan Joffe, the CTO of Inigo on Youtube . It's a fantastic talk on the topic of analytics for GraphQL APIs, and we highly recommend watching it.

In this talk, there were two slides that stood out to me, and I want to highlight them here. They were about "Path-based Analytics".

Here's an example of how the paths could look like:

In general, I really like the idea of path-based analytics, but we immediately knew that something is missing here.

How comes we immediately saw the problem? Well, we've previously used a path-based approach to associate data sources with fields. As we're building a GraphQL Router / Gateway, we needed a way to associate data sources (resolvers) with fields.

We've tried a path-based approach, so path-based analytics immediately looked familiar to us. But we've learned ourselves that there's a problem with this approach: Paths don't tell the whole story!

When we've implemented resolving abstract types, like Interfaces and Unions, we've learned that the path-based approach is not sufficient. E.g. with Interface Entities in federated Graphs, it's possible that not every Subgraph implements all possible types of an Interface. This means that you cannot plan your data fetching based on the path alone, you have to take into consideration the enclosing type as well.

The same is true for Analytics. It's not enought to look at the path alone, we have to analyze the types as well. But even that is not enough, at least not when we want to compute the complete Schema Usage for breaking change detection.

How to implement Schema Usage Analytics

Let's get to the meat of this article, how do we implement Schema Usage Analytics?

We can divide the implementation into three parts:

  1. Response Type Field Usage
  2. Field Arguments Usage
  3. Input Type Field Usage

Analyzing Response Type Field Usage for your GraphQL Schema

We've came up with a simple data structure to represent type field usage in the response. We record the EnclosingType, the FieldName, and the ReturnType of each field.

Here's a simple example:

1
2
3
4
5
6

This would result in the following TypeFieldUsage:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

Now let's add a bit more complexity and use an interface instead with a "character" root field.

1
2
3
4
5
6
7
8
9
10
11
12

This would result in the following TypeFieldUsage:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

You can see that without the enclosing type, we would not be able to distinguish between the Droid and Human fields. More importantly, the homePlanet and primaryFunction fields might not be part of the Character Interface. It's also possible that both Droid and Human share some fields, so being able to distinguish between the enclosing types is crucial to correctly report the Schema Usage.

Analyzing Field Arguments Usage for your GraphQL Schema

Next, we need to analyze the usage of field arguments. This is a bit less complex than analyzing the response type field usage. The data structure to report argument usage is similar. We need to record the EnclosingType, the FieldName, the ArgumentName, and the ArgumentType.

Here's a simple example:

1
2
3
4
5
6

This would result in the following FieldArgumentUsage:

1
2
3
4
5
6
7
8

Let's add a more complex example with an input type:

1
2
3
4
5
6

This would result in the following FieldArgumentUsage:

1
2
3
4
5
6
7
8

Analyzing Input Type Field Usage for your GraphQL Schema

The last part we need to analyze is the usage of input type fields. This might seem simple, but it's actually a bit tricky.

Let's have a look at the following two GraphQL Operations to illustrate the problem:

Variant 1 with a variable definition and variables as JSON:

1
2
3
4
5
6
7
8

Variant 2 with the exact same input, but inline:

1
2
3

Both variants are semantically identical, but they look different. What we need to do is to normalize the Operations before we analyze them, allowing us to implement one algorithm to analyze the input type field usage.

The way we've implemented this is by normalizing the input type fields to JSON and turning all inputs into variable definitions. This allows us to analyze the input type field usage in a consistent way.

Once normalized, we can analyze the input type field usage in the same way as we've analyzed the response type field usage. There's one distinction though, some of the inputs are variable definitions, so they don't have an enclosing type, but a variable name instead. However, we're not interested in the variable name, as it's user defined. We just want to know the input type and what input fields are being used.

The result for our example Operation would look like this:

1
2
3
4
5
6
7
8
9
10
11
12

Implementing Breaking Change Detection with Schema Usage

Now that we've analyzed the usage of our GraphQL Schema, we can use this information to implement breaking change detection. In our case, we're storing this information in Clickhouse, but any other database would work as well.

Next, we need to store our current GraphQL Schema in a database as well. This is important because we want to be able to compare changes to the Schema or one of our Subgraphs to the last known good version. When making changes, we can use a library like GraphQL Inspector to compare the new Schema to the last known good version and find potential breaking changes.

Once we know about potential breaking changes, we can query our database to find out if any of the breaking changes would actually break any of our clients. Is the field, argument or input type field being used by any client? If not, we can safely make a breaking change, otherwise we need to think about migration strategies, etc...

Conclusion

In this article, we've shown you how to analyze the usage of your GraphQL Schema. We've also shown you how to use this information to implement breaking change detection. I hope you've learned something to be able to implement Schema Usage Analytics for your GraphQL Schema.

If you're looking at a complete solution that implements all of this, fully integrated into your favourite Continuous Integration Workflow and with a nice UI, you should check out Cosmo , the Open Source Full Lifecycle GraphQL API Management solution.