Announcing Field Level Authorization for GraphQL Federation with Cosmo Router

Jens Neuse

Today, we're excited to announce that we've added field-level authorization to our Open Source GraphQL Federation Router. This allows you to use a policy-as-code workflow to control access to your federated GraphQL APIs. Most importantly, this gives you a central place to manage your authorization logic instead of having to implement it in every service.

Let's take a look at how this works, the implementation details, and how you can get started. Before we dive in, let's start with a quick refresher on Authentication and Authorization.

Authentication vs. Authorization

Authentication is the process of verifying that an agent is who they claim to be. I'm saying agent here because it could be a person, a service, or a device. A common way of validating an agent's identity is by validating a JWT. A valid token tells us that the agent is who they claim to be, so we can trust the identity and the claims it carries.

In short, authentication tells us "who" the agent is. But are they allowed to perform the action they're trying to perform?

Let's say an agent is a 3rd party service that one of our users has authorized to access their data, e.g. through an OAuth2 flow. This service is now trying to access the user profile of Bob. Did Bob allow this service to access his profile? This is where authorization comes into play.

Authorization is the process of verifying that an agent is allowed to perform a specific action. In our example, we could have defined a policy that says that an agent needs to have the read:user_profile scope to access a user's profile. If the agent has this scope, we can allow them to access the user's profile.

You've probably seen this in practice: when a website asks you to let a 3rd party service access specific data from your Facebook or Google account, that's an OAuth2 flow behind the scenes asking you to grant specific scopes to that service.

In that sense, authorization tells us "what" the agent is allowed to do.

But what are scopes exactly? How do we define them? And how does this relate to GraphQL and Federation?

Field-Level Authorization with Scopes for GraphQL Federation

In contrast to REST APIs, GraphQL APIs are not resource-oriented. For a REST API, you might have a resource called /user that has a GET endpoint to retrieve a user's profile. To restrict access to this endpoint, you can require a scope like read:user_profile. The scope requirement would be attached to the endpoint (/user).

In GraphQL, we don't have endpoints, we have a Schema with Types and Fields. So instead of attaching a scope to an endpoint, we need to attach it to a field, or a Coordinate to be more specific.

A Coordinate is a combination of a Type and a Field. A field is always part of a type (or interface), so we can use the type name and the field name to identify a specific field (Coordinate), e.g. Employee.startDate. A field name alone is not enough, because several types can define a field with the same name.

Let's take a look at an example:

In this example, we have a Coordinate that consists of the type Employee and the field startDate. We've added the @requiresScopes directive to this field to define the required scopes to access this field.

The scopes read as follows:

The agent needs to have either the read:employee and read:private scopes, or the read:all scope to access the startDate field. The outer list of scopes is OR-joined, while the inner list of scopes is AND-joined.
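The OR-of-ANDs rule described above can be sketched in a few lines of Python. This is only an illustration of the evaluation logic, not the Router's actual implementation; the names `is_authorized` and `START_DATE_SCOPES` are made up for this example.

```python
from typing import Iterable, List

# Outer list is OR-joined, each inner list is AND-joined.
ScopeMatrix = List[List[str]]


def is_authorized(required: ScopeMatrix, granted: Iterable[str]) -> bool:
    """Return True if at least one inner (AND) group is fully covered by the granted scopes."""
    granted_set = set(granted)
    return any(set(group) <= granted_set for group in required)


# The requirement described above for the Employee.startDate coordinate.
START_DATE_SCOPES: ScopeMatrix = [["read:employee", "read:private"], ["read:all"]]

print(is_authorized(START_DATE_SCOPES, ["read:employee", "read:private"]))  # True
print(is_authorized(START_DATE_SCOPES, ["read:all"]))                       # True
print(is_authorized(START_DATE_SCOPES, ["read:employee"]))                  # False
```

An agent holding only read:employee is rejected because neither inner group is fully satisfied.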

You can also have scopes defined on root fields, e.g. the Query or Mutation type:

In this example, you need the read:fact or read:all scope to access the topSecretFederationFacts field.

Scopes can also be defined on Scalar fields:

In this case, you need the read:scalar or read:all scope to access the description field. Scopes on Scalar fields bubble up to all fields that use this Scalar.

You might have noticed that we've also added an @authenticated directive to the DirectiveFact type. Directives applied to a type are applied to all fields of this type. This means that you need to be authenticated to access any field of the DirectiveFact type.

Directives can also be applied to Enums:

In this example, you need to be authenticated to access the factType field of any type that implements the TopSecretFact interface. That's another important thing to note: Directives applied to an interface are applied to all fields of all types that implement this interface.

You can also define scopes on Type Definitions:

In this example, you need the read:entity scope to access the EntityFact type.

Here's another example highlighting what happens if we define scopes on the Type Definition and a different scope on the field:

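In this case, we AND-join the scopes using what we call matrix multiplication: every scope group on the Type Definition is combined with every scope group on the field. This means that you need to have both the read:entity and read:scalar scopes to access the description field.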

Let's make this a bit more complex by having two scopes on the Type Definition and two scopes on the field:

In this case, the agent needs to have one of four combinations of scopes to access the description field:

  1. read:entity and read:scalar
  2. read:entity and read:description
  3. read:all and read:scalar
  4. read:all and read:description
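The four combinations above fall out of pairing every scope group on the Type Definition with every scope group on the field. Here's a minimal Python sketch of that AND-join. It assumes the type carries read:entity OR read:all and the field carries read:scalar OR read:description, as the list implies; `and_join` is a hypothetical helper, not a Router API.

```python
from itertools import product
from typing import List

# Outer list is OR-joined, each inner list is AND-joined.
ScopeMatrix = List[List[str]]


def and_join(left: ScopeMatrix, right: ScopeMatrix) -> ScopeMatrix:
    """Pair every AND-group of `left` with every AND-group of `right`."""
    combined: ScopeMatrix = []
    for l_group, r_group in product(left, right):
        # Merge the two AND-groups, dropping duplicate scopes while keeping order.
        merged = list(dict.fromkeys(l_group + r_group))
        if merged not in combined:
            combined.append(merged)
    return combined


type_scopes = [["read:entity"], ["read:all"]]
field_scopes = [["read:scalar"], ["read:description"]]

for group in and_join(type_scopes, field_scopes):
    print(" AND ".join(group))
```

Two OR-alternatives on each side yield 2 x 2 = 4 AND-groups, which is why the number of combinations grows quickly once several levels define alternatives.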

How Policy-as-Code for GraphQL Federation improves Transparency and Developer Experience

This is a very powerful feature that allows you to define very granular access control policies. But not only that, it also allows you to follow a policy-as-code workflow. Instead of "hiding" your authorization rules in your (micro-)services, they become part of your schema, making them visible to everyone else.

By using a policy-as-code workflow, you can make authorization transparent and auditable. If you "implicitly" define a security policy in a Subgraph resolver, how do you audit that? In a Microservice architecture, how would you know which security policies exist, where they are defined, and how they are implemented?

If you want to define scopes across a Microservice architecture, how do you coordinate that? How do you make sure that you don't have overlapping scopes? What if you have conflicting scopes?

With a policy-as-code workflow, we can leverage the GraphQL Schema to define our security policies. This makes them transparent and auditable. We can merge scopes across all Subgraphs and apply validation rules to make sure that our scopes (and the rest of the Schema) merge correctly. We can also use the Schema to generate documentation for our security policies.

All of this is not possible if we treat authorization as an implementation detail of our services.

How Policy-as-Code for GraphQL Federation improves Security and Compliance

If we implement authorization in our services, we have to ensure that all Subgraphs implement security policies correctly. This is a very error-prone process, requiring a lot of coordination, code reviews, and testing. If we miss a security policy in one of our Subgraphs, we have a security vulnerability. But how do we even know that a field is unprotected when that's not visible in the Schema?

A policy-as-code workflow allows us to create a governance process around our security policies. We can use the Schema to generate a list of all unprotected fields. We can have linting rules that prevent us from merging a Subgraph that has unprotected fields. We can have CI checks that prevent us from accidentally merging unprotected fields.
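To make the "list of all unprotected fields" idea concrete, here's a rough Python sketch of such a check. It runs over a hand-written dictionary that stands in for a parsed schema; the data structure, the field names id and title, and the function name are all assumptions for illustration. A real check would parse the actual SDL and would also have to account for directives inherited from Scalars, Enums, and interfaces.

```python
from typing import Any, Dict, List

# Directive names that count as "protected" in this sketch.
AUTH_DIRECTIVES = {"requiresScopes", "authenticated"}


def unprotected_coordinates(schema: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return all Type.field coordinates without an authorization directive."""
    unprotected: List[str] = []
    for type_name, type_def in schema.items():
        # A directive on the type covers every field of that type.
        if AUTH_DIRECTIVES & set(type_def.get("directives", [])):
            continue
        for field_name, directives in type_def["fields"].items():
            if not AUTH_DIRECTIVES & set(directives):
                unprotected.append(f"{type_name}.{field_name}")
    return unprotected


# Hypothetical stand-in for a parsed schema: type -> directives and fields.
SCHEMA = {
    "Employee": {
        "directives": [],
        "fields": {"id": [], "startDate": ["requiresScopes"]},
    },
    "DirectiveFact": {
        "directives": ["authenticated"],
        "fields": {"title": [], "description": ["requiresScopes"]},
    },
}

missing = unprotected_coordinates(SCHEMA)
print(missing)  # ['Employee.id']
```

A CI job could fail the build whenever this list contains coordinates that are not on an explicit allow-list of public fields.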

Without such a workflow, we have to invent our own process, tooling, and test-suite to ensure that our Schema is protected.

How Policy-as-Code for GraphQL Federation makes it easier to Audit and Monitor Access

If an Auditor wants to know who has access to a specific field, how do we answer that question? If we implement authorization in our services, we have to look at all Subgraphs and check if they have a resolver for this field. We would then have to look at the resolver code to see if it has any authorization logic. If it does, we would have to look at the code to see what the authorization logic is.

Compare that to simply looking at the Schema to see if the field has any scopes defined. The Cosmo Router has a test-suite that ensures that scopes defined in the Schema are validated correctly. There's no need to look at any code to understand the authorization logic.

Authorization for Root Fields vs filtering nested fields

Another important aspect of authorization is to understand the difference between preventing an agent from calling a root field vs filtering out a result.

Authorization rules can be applied in two different ways. We can either prevent an agent from calling a field at all, or we can allow them to call the field, but filter out the result.

In case of a Query, there's nothing harmful about calling a field and filtering out the result. Mutations, however, can have side-effects, so we want to prevent an agent from calling a Mutation field at all if they don't have the required scopes.

For that reason, we have implemented two different ways of applying authorization rules to root fields vs nested fields.

For root fields, we collect all required scopes and attach them to the "fetch" request which is sent to the Subgraph. If the agent doesn't have the required scopes for this fetch request, we don't send it to the Subgraph and return an error instead.

For nested fields, we collect all required scopes and attach them to the fields in the execution plan. We execute all fetches and then filter out the results based on the scopes attached to the fields. If a scope is missing, we set the field to null and add an error to the response. If the field is non-nullable, we bubble up the error to the nearest nullable parent field. If no parent field is nullable, we set the data field in the root of the response to null.
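The null-bubbling behavior for nested fields can be sketched as follows. This is a simplified Python illustration, not the Router's code: it only handles paths through nested objects (no lists), takes the nullability of each field on the path as an explicit argument instead of reading it from the schema, and leaves out the error entry that gets added to the response. The function name `deny_field` and the sample data are made up.

```python
import copy
from typing import Any, Dict, List, Optional


def deny_field(
    data: Dict[str, Any], path: List[str], nullable: List[bool]
) -> Optional[Dict[str, Any]]:
    """Null the field at `path`, bubbling up to the nearest nullable ancestor.

    `nullable[i]` says whether the field `path[i]` is nullable in the schema.
    Returns None when no field on the path is nullable, which corresponds to
    setting the `data` field in the root of the response to null.
    """
    # Walk from the denied field upwards until a nullable field is found.
    for depth in range(len(path) - 1, -1, -1):
        if nullable[depth]:
            result = copy.deepcopy(data)
            parent = result
            for key in path[:depth]:
                parent = parent[key]
            parent[path[depth]] = None
            return result
    return None


data = {"entity": {"id": "1", "description": "secret"}}
path = ["entity", "description"]

print(deny_field(data, path, [True, True]))    # {'entity': {'id': '1', 'description': None}}
print(deny_field(data, path, [True, False]))   # {'entity': None}
print(deny_field(data, path, [False, False]))  # None
```

The second call shows the bubbling: because description is non-nullable, the whole entity object is replaced with null instead.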

Here's an example of a response where we have filtered out a field and added an error:

As you can see, we have filtered out the description field and added an error to the response. The error message states clearly what the required scopes are and why access was denied. As the field is non-nullable, we bubbled up the error.

Returning Partial Data with Field-Level Authorization vs failing the whole request

Another question that comes up is whether we should return partial data or fail the whole request if the agent doesn't have the required scopes. This depends on the use-case and can be adjusted on the Router level.

By default, the Router allows returning partial data, like in this example:

In this example, we're returning an error for the factTypes field, but we're still returning the productTypes field.

Merging Scopes across Subgraphs in GraphQL Federation

What happens if we have multiple Subgraphs that define scopes for the same field? Let's take a look at the following example:

In this example, we have two Subgraphs that define scopes for the startDate field. Subgraph 1 defines the scopes read:employee and read:private, while Subgraph 2 defines the scope read:all. If we merge these two Subgraphs, we AND-join the scopes using matrix multiplication, just as we do for scopes on a Type Definition and a field, so an agent has to satisfy the requirements of both Subgraphs.

The result of merging these two Subgraphs would be the following:

How we differ from Apollo Federation and why we believe that Open Federation is important

We've carefully analyzed how Apollo implemented this behavior and decided to take a different approach. With Apollo Federation, scopes from different Subgraphs merge using OR-joins. This means that in the example above, the resulting Schema would look like this:

Let me explain why we think that this is not the correct behavior.

The team of Subgraph 1 believes that an agent needs to have the read:employee and read:private scopes to access the startDate field. Meanwhile, the team of Subgraph 2 believes that an agent needs to have the read:all scope to access the startDate field. If we merge these two Subgraphs using an OR-join, we're effectively saying that an agent needs to have either the read:employee and read:private scopes, or the read:all scope to access the startDate field. As a result, both teams think that their security policy is implemented correctly, but in reality, the OR-join has silently disabled the intended policy of the other team.
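The difference between the two merge strategies is easy to see in a small Python sketch. It models the OR-join and the AND-join as described in this post, using the scopes from the two Subgraphs above; the function names are hypothetical and neither function is taken from an actual router implementation.

```python
from itertools import product
from typing import Iterable, List

# Outer list is OR-joined, each inner list is AND-joined.
ScopeMatrix = List[List[str]]


def is_authorized(required: ScopeMatrix, granted: Iterable[str]) -> bool:
    """True if at least one AND-group is fully covered by the granted scopes."""
    granted_set = set(granted)
    return any(set(group) <= granted_set for group in required)


def or_merge(a: ScopeMatrix, b: ScopeMatrix) -> ScopeMatrix:
    """OR-join: satisfying either Subgraph's requirement is enough."""
    return a + b


def and_merge(a: ScopeMatrix, b: ScopeMatrix) -> ScopeMatrix:
    """AND-join: every group of `a` is combined with every group of `b`."""
    return [list(dict.fromkeys(x + y)) for x, y in product(a, b)]


subgraph_1 = [["read:employee", "read:private"]]
subgraph_2 = [["read:all"]]

or_joined = or_merge(subgraph_1, subgraph_2)
and_joined = and_merge(subgraph_1, subgraph_2)

agent_scopes = ["read:all"]
print(is_authorized(or_joined, agent_scopes))   # True
print(is_authorized(and_joined, agent_scopes))  # False
```

With the OR-join, an agent that only holds read:all gets through even though Subgraph 1 asked for read:employee and read:private. With the AND-join, the agent needs all three scopes.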

Our understanding of GraphQL Federation is that teams should be able to define their Subgraph Schemas independently, including their security policies. A security policy defined by one team should not be able to disable the security policy of another team, at least not without the other team's knowledge.

This is why we think it's important to have Open Federation. We believe that it's not enough to define the Federation directives alone. We need a formal specification that defines the behavior of the Federation directives, how they interact with each other, and how they merge across Subgraphs.

We're currently working on this formal specification and will publish it soon. Please subscribe to the updates on the Open Federation website to get notified when we publish an update.

Please note that we're by no means trying to discredit the Apollo team or their work. We're also not saying that our approach is right. Our goal is to start a discussion about this topic and to get feedback from other Federation users. It's possible that we're missing something and that we need to adjust our approach. By publishing our understanding of Federation in the "Open", we hope to get feedback from the community and to improve Federation for everyone, hence the name "Open Federation".

How to get started with Field-Level Authorization for GraphQL Federation with Cosmo Router

This feature is implemented in the latest version of the Cosmo Router. There's no special license required; it's available in the Open Source version of the Router.

To get started, you can follow the Cosmo Documentation for the two added directives @requiresScopes and @authenticated.

As a prerequisite, make sure that you've got Authentication set up correctly.

Conclusion

With field-level authorization, we've added another important feature to the Cosmo Router, making it the most advanced Open Source GraphQL Federation Router available today.

We believe that a policy-as-code workflow is a great way to improve transparency, security, and compliance, and that it also helps to scale security concerns in a Microservice architecture.

Implicitly defining security policies hidden in your Subgraphs works fine with a small number of Subgraphs, but the burden of managing these invisible security policies grows as you expand your architecture to more services.

What's next?

This was really just the beginning of our journey to make GraphQL Federation more secure and compliant. In the future, we want to add the possibility to define custom authorization policies. These can either be implemented using custom code in your Subgraphs, or by using a more declarative approach, e.g. by using OPA (Open Policy Agent).

Join the discussion

If you're interested in this topic, please join our Discord Server to start a discussion.