Generating APIs is becoming more and more popular, and the GraphQL community in particular is "suffering" from this trend. The promise: production-ready APIs with authentication and caching in minutes. Define the schema, and you're almost done. But as we all know, everything comes at a cost. This post takes a critical look at generated GraphQL APIs: the benefits as well as the tradeoffs you're going to make.
History and Trends
Generated GraphQL APIs look like a hot trend nowadays. Some venture-backed companies push this topic hard to make it mainstream. We hear praise for the amazing developer experience and how productive people are. However, is this really a sustainable business?
If we look back a few years, you might remember the term "Backend as a Service" (BaaS), established by companies like Parse. The impact of Parse seems to have diminished since then. A lot of others jumped on the BaaS train, but apart from Firebase, you don't hear much on the topic nowadays.
Then came GraphQL, and companies tried the same thing again. If we look at the graveyard of failed companies, you might find some familiar names like Reindex, Graphcool, etc...
Today, we have tools like AppSync and Hasura leading the pack, followed by many smaller companies that imitate them and try to differentiate themselves in various ways.
AppSync might be an outlier because its goal is not only BaaS but also making AWS products easy to integrate, which is a different value proposition from that of its competition.
Another emerging trend is (Serverless) Databases as a Service. Dgraph offers a native GraphQL database. FaunaDB offers a serverless database which also comes with a GraphQL interface. Then there's EdgeDB, which also offers a GraphQL interface. If you search for a while, you'll probably find more.
Another quite popular category is Content Management Systems with GraphQL APIs. The distinction from just BaaS is that these tools allow non-technical people to create content while developers can consume the content via GraphQL APIs.
How come the idea of BaaS/generated APIs keeps coming up while most of the implementations fail?
We might be able to look back in time to better understand why this approach is failing. If you ask Prisma, the makers of Graphcool, they refer to this tweet:
“Exposing the database schema 1:1 via a generated graphql api totally misses the point of graphql - or an api as an intended abstraction.”
We'll come back to this citation later.
Developer experience of generated APIs
One feature of generated APIs you cannot deny is the developer experience. If you're an experienced Full-Stack developer you're able to start new projects in minutes. Spin up a database, define the schema, API done. You're now able to start building the frontend.
Some of the services allow you to build the schema using a user interface only. This allows even less experienced Frontend developers to build their own "APIs". You click a few buttons and your API with authentication is ready to use.
But what about maintenance? How do you evolve applications built using this approach? What if you don't just have one API consumer?
Let's have a look at possible use cases.
Use cases of generated APIs
Generating APIs can be an amazing time-saver if you're a single developer working on a small project. You don't have to expose the API to other developers. It's just you using it. You understand the tradeoffs, e.g. leaking some business logic into the client. You're still happy because you saved a lot of time not having to build a backend.
Another good example where generated APIs can be very helpful is prototyping. You want to bring an idea to life. You don't care about the quality of your API. All you need is some CRUD methods to demonstrate an application to a prospect.
What about larger teams and projects? Are generated APIs also a good fit for enterprise use cases?
In order to answer this question, let's first have a look at what an API is.
What is an API?
I won't go into much detail on this topic, as I want to stay focused. If you're interested in learning more about what an API is, I can highly recommend Erik Wilde's talk on the topic.
Too long, didn't watch:
An API is an interface, abstracting away an implementation of a contract. In Erik's example, he explains how a weather forecast company exposes its forecasts through an API to other developers.
GraphQL is a Query language, offering one way of implementing APIs.
APIs make integrations scalable because we can rely on standards and don't have to understand every detail of a system. E.g. if an API speaks GraphQL, clients know already how to interact with the system. We can understand the API by issuing an introspection Query and from there on use the Queries and Mutations defined in the GraphQL schema.
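As a minimal sketch of that first step, the snippet below builds the request body for an introspection query. The `__schema` meta-field is part of the GraphQL specification; the helper name `buildIntrospectionRequest` and the trimmed-down selection are assumptions made for illustration, and real tooling asks for far more detail.

```typescript
// Shape of a GraphQL-over-HTTP request body.
interface GraphQLRequest {
  query: string;
  variables?: Record<string, unknown>;
}

// A trimmed-down introspection query. `__schema` is defined by the GraphQL
// specification; the selection here is deliberately small.
const INTROSPECTION_QUERY = `
  query IntrospectSchema {
    __schema {
      queryType { name }
      mutationType { name }
      types { name kind }
    }
  }
`;

// Hypothetical helper: returns the body a client could POST to a GraphQL endpoint.
function buildIntrospectionRequest(): GraphQLRequest {
  return { query: INTROSPECTION_QUERY };
}
```

The same body works against any spec-compliant GraphQL server, which is exactly what makes the integration scalable: the client needs no vendor-specific knowledge to discover the schema.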
Without tools like GraphQL, we would have to invent custom contracts and integration tooling for every API.
One important aspect of the success of an API is its design. APIs must be able to cover the common use cases of API consumers. E.g. a weather API is useless if there's no easy way to get the forecast for the next few days.
Next, we'll look into API design to further understand why generated APIs don't scale.
On good API design
Going back to the weather forecast service, we talked about success factors of the API. An API can also be seen as a product as it serves a specific purpose.
How can a weather forecast API be successful? As with any other product, you have to talk to your users and adapt to their needs. You have to re-shape your product to serve your customers well.
In the case of an API, re-shaping means finding a good API design.
And with that, we're at the core of the problem of generated APIs.
It's impossible to design an API when all you do is derive it from, say, database tables. A good API doesn't just make raw data accessible. A good API is driven by use cases, which makes it easy to integrate. It doesn't have to be a public-facing API: if you're providing an API to other developers, it should be designed around their use cases. Deriving APIs from database schemas, tables or anything else means you're not designing and you're not focusing on your users.
The consequences are well described by Erik:
“If you're designing an API by avoiding designing it, you end up with the design you deserve.”
If you skip the design phase, you'll end up with unhappy users. If your company depends on the success of your APIs, this could be a deal breaker.
Imagine you're an engineer and your task is to build a new steering wheel for a car. You take the whole description of the car's CAN-BUS (its internal communication system) and "generate" a panel of knobs from it. This steering wheel might have thousands of buttons and be barely usable.
Wouldn't it make more sense to figure out how people want to steer a car? What knobs do they need to control the radio without losing control of the car? Once you have figured out the requirements, you could start mocking up a few designs. With those designs you'd then go to users and test whether they work the way you expect. You'd iterate on your ideas until you find a good design. Only then would you actually build a real steering wheel, one that you're confident will solve your users' problem.
What's the difference between a steering wheel and the API of your service? Why is it not obvious that generated APIs don't solve the problem?
You'd be much faster generating a steering wheel because you don't have to spend time designing and implementing it. If you're the only user of your steering wheel, this might work because you know all about CAN-BUS systems. However, this approach won't scale if you want to build APIs for other teams and companies.
Information hiding principle
Another important aspect of APIs is abstraction and information hiding.
API consumers should not have to think about the implementation. But what if the API is leaking implementation details?
Let's have a look at this simple example:
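As a hypothetical illustration, a client of a generated API might send a request like the one below. The field name `user_by_pk`, the column-style field names and the helper `userByPkRequest` are assumptions, modeled on the naming style schema generators tend to emit rather than on any specific vendor.

```typescript
// Hypothetical query against a generated API. The field name `user_by_pk`
// and the snake_case column names are assumptions made for illustration.
const USER_BY_PK_QUERY = `
  query UserByPrimaryKey($id: Int!) {
    user_by_pk(id: $id) {
      id
      first_name
      created_at
    }
  }
`;

// Build the request body a client would send for a given primary key.
function userByPkRequest(id: number): { query: string; variables: { id: number } } {
  return { query: USER_BY_PK_QUERY, variables: { id } };
}
```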
Even if the language is GraphQL, it's clear that we're talking to a relational database and SQL is used behind the scenes.
Should an API consumer have to think about primary keys? Isn't that a concern of storing data, a completely different layer?
How about JSON selectors as arguments?
Isn't the implementation leaking information on how data is being stored?
Here's another example where GraphQL is being abused as an ORM.
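As a rough sketch of what that looks like in practice, the snippet below shows how a filter argument in the style many generators emit maps almost mechanically onto a SQL WHERE clause. The operator names (`_eq`, `_gt`, `_lt`, `_like`), the `whereToSql` helper and the column names are all assumptions for illustration, not any vendor's actual implementation.

```typescript
// Hypothetical comparison operators, in the style of generated filter arguments.
type Comparison = { _eq?: string | number; _gt?: number; _lt?: number; _like?: string };
type Where = Record<string, Comparison>;

const SQL_OPERATORS: Record<keyof Comparison, string> = {
  _eq: "=",
  _gt: ">",
  _lt: "<",
  _like: "LIKE",
};

// Translate the filter into a parameterized WHERE clause.
// Column names are interpolated as-is here; real code would have to
// validate them against the known table schema.
function whereToSql(where: Where): { sql: string; params: Array<string | number> } {
  const conditions: string[] = [];
  const params: Array<string | number> = [];
  for (const column of Object.keys(where)) {
    const comparison = where[column];
    if (!comparison) continue;
    for (const op of Object.keys(comparison) as Array<keyof Comparison>) {
      const value = comparison[op];
      if (value === undefined) continue;
      params.push(value);
      conditions.push(`${column} ${SQL_OPERATORS[op]} $${params.length}`);
    }
  }
  return { sql: conditions.length ? `WHERE ${conditions.join(" AND ")}` : "", params };
}
```

A client that writes `where: { age: { _gt: 18 } }` is effectively writing SQL with different punctuation: every column name and operator in the filter is a detail of the storage layer.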
The consequence of these examples is that it's almost impossible to make changes to the implementation.
Developers who know some basics on databases might see another problem with this approach. In case of complex WHERE clauses (hello SQL) or aggregations, you have to make sure that the required indices exist on the database. Otherwise, you'll run into full table scan problems.
I think it's clear that all these examples violate the principle of information hiding.
In the very beginning, we were looking at the developer experience of generated APIs and how much time you can save by not implementing the API but generating it. We then learned that generated APIs generally don't scale well beyond personal projects because they are not driven by use cases.
Now, let's have a look at what actually makes a good API implementation.
A good API implementation decouples the API contract from data storage and from other aspects such as authentication. A good API implementation allows you to change how you store data without breaking the API contract. A good API also allows you to easily evolve and extend the contract for new use cases. These three requirements are impossible to meet with generated APIs that are tightly coupled to infrastructure.
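One way to picture this decoupling is a small sketch in which the API layer only knows about a storage interface. Everything here is hypothetical: the `Forecast` contract, the `ForecastStore` interface and the two store classes are invented for illustration.

```typescript
// The API contract: this is all that consumers ever see.
interface Forecast {
  city: string;
  temperatureCelsius: number;
}

// Storage interface: the only thing the API layer knows about persistence.
interface ForecastStore {
  findByCity(city: string): Forecast | undefined;
}

// One implementation, backed by rows shaped like a SQL table.
class TableStore implements ForecastStore {
  private rows: Array<{ city_name: string; temp_c: number }>;
  constructor(rows: Array<{ city_name: string; temp_c: number }>) {
    this.rows = rows;
  }
  findByCity(city: string): Forecast | undefined {
    for (const row of this.rows) {
      if (row.city_name === city) {
        return { city: row.city_name, temperatureCelsius: row.temp_c };
      }
    }
    return undefined;
  }
}

// Another implementation, backed by documents keyed by city.
class DocumentStore implements ForecastStore {
  private docs: Record<string, { tempC: number }>;
  constructor(docs: Record<string, { tempC: number }>) {
    this.docs = docs;
  }
  findByCity(city: string): Forecast | undefined {
    const doc = this.docs[city];
    return doc ? { city, temperatureCelsius: doc.tempC } : undefined;
  }
}

// The API layer depends on the interface, not on either storage layout.
function getForecast(store: ForecastStore, city: string): Forecast {
  const forecast = store.findByCity(city);
  if (!forecast) throw new Error(`No forecast for ${city}`);
  return forecast;
}
```

Swapping `TableStore` for `DocumentStore` changes nothing for API consumers, because column names like `city_name` and `temp_c` never leave the storage layer.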
But there's one additional component that tops all of them: Business logic or state machines. A good API doesn't force API consumers to implement their own state machine to interpret the data of the API. Forcing API consumers to interpret raw data and derive state is prone to errors and should be avoided.
Let's have a closer look at state machines.
State machines and business logic
Going back to the weather forecast scenario, the API might want to indicate if a forecast is valid.
At some point in time, a developer decides to add two columns to the forecast table, valid_from and valid_to, both of them date fields.
The API simply exposes both columns as we're using a generated API.
As an API consumer you can now query the forecast.
The client could now use their own time and calculate if the forecast is valid.
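A minimal sketch of what each client ends up writing might look like this. The query shape, the assumption that both fields arrive as ISO 8601 strings, and the choice of inclusive bounds are mine; the generated API itself specifies none of this.

```typescript
// Hypothetical query against the generated API; field names mirror the columns.
const RAW_FORECAST_QUERY = `
  query Forecast($id: Int!) {
    forecast(id: $id) {
      valid_from
      valid_to
    }
  }
`;

// Raw data as the client receives it (ISO 8601 strings are an assumption).
interface RawForecast {
  valid_from: string;
  valid_to: string;
}

// Every client has to re-implement this check on its own.
// Treating both bounds as inclusive is a guess each client must make.
function isForecastValid(forecast: RawForecast, now: Date): boolean {
  const from = Date.parse(forecast.valid_from);
  const to = Date.parse(forecast.valid_to);
  const t = now.getTime();
  return from <= t && t <= to;
}
```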
The first problem with this approach is that we force all clients to implement this logic, which is not user friendly.
What about time zones? What about date formats?
What if the product owner decides that forecasts are valid for "windows" in time?
The developer might change valid_from and valid_to into an array of intervals.
The Query might then look like this:
The client would now have to iterate over the list of "valid_from_tos" and calculate whether the current time, in the client's time zone, falls within one of the windows.
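Sketched in code, and again assuming ISO 8601 strings and inclusive bounds, the client-side logic could look like the following. The `from` and `to` field names inside each window are hypothetical.

```typescript
// One validity window as exposed by the generated API (field names are assumptions).
interface ValidityWindow {
  from: string;
  to: string;
}

interface RawWindowedForecast {
  valid_from_tos: ValidityWindow[];
}

// Client-side check: true if `now` falls inside any of the windows.
function isInAnyWindow(forecast: RawWindowedForecast, now: Date): boolean {
  const t = now.getTime();
  return forecast.valid_from_tos.some(
    (w) => Date.parse(w.from) <= t && t <= Date.parse(w.to)
  );
}
```

Note that this change to the storage layout broke every existing client: code written against valid_from and valid_to has to be rewritten.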
What if we didn't use a generated API? What could the API look like?
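One possible shape, as a sketch under assumptions: the contract exposes a single derived field, here called `isValid`, and a resolver computes it on the server. The type names, the field name and the `resolveForecast` function are all hypothetical.

```typescript
// Hypothetical storage row: private to the API implementation, never exposed.
interface StoredForecast {
  windows: Array<{ from: string; to: string }>;
}

// Public contract: consumers only see the derived state.
interface ForecastView {
  isValid: boolean;
}

// What a client query could look like against the designed API.
const DESIGNED_FORECAST_QUERY = `
  query Forecast($id: ID!) {
    forecast(id: $id) {
      isValid
    }
  }
`;

// Resolver sketch: the validity rule lives in exactly one place, on the server.
function resolveForecast(stored: StoredForecast, now: Date): ForecastView {
  const t = now.getTime();
  const isValid = stored.windows.some(
    (w) => Date.parse(w.from) <= t && t <= Date.parse(w.to)
  );
  return { isValid };
}
```

Whether validity is stored as two columns or as a list of windows is now invisible to clients; the query stays the same when the storage changes.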
In this case, we've moved the business logic into the API implementation. This makes our API easier to use and the integration less error prone.
So the question arises: what really is the value of generating GraphQL APIs?
GraphQL as an ORM?
So far, we've learned about many of the success factors of good APIs. It seems clear that generating APIs results in a lot of problems.
If you confront vendors with these problems they propose that you could use generated GraphQL APIs internally to implement external APIs. In that sense, we'd use the generated GraphQL API as an ORM to talk to a database. It's an interesting approach, but it doesn't seem like this is what vendors are actually trying to sell.
On the one hand, it could be very convenient to use GraphQL as a language to talk to multiple databases. On the other hand, GraphQL is lacking features that would make this approach really appealing.
There's a reason why FaunaDB has FQL and dgraph has GraphQL+-. There's a reason why the people behind Graphcool pivoted away from building generated GraphQL APIs on top of databases. They are now building Prisma, an ORM. I don't yet see the business model behind building an ORM but that's another story. The fact that matters is that they abandoned their initial idea and you should ask yourself why we're repeating history.
Summary of Generated APIs
- works well for small projects and prototyping
- lacks capabilities to design
- violates information hiding
- forces business logic onto API consumers
- doesn't abstract away storage
- is hard to evolve because of tight coupling
Tight coupling as a service
Now let's have a look at the subtitle of this post.
More and more vendors are providing services that more or less expose the database as a GraphQL service. Although they might say, you can use their services just "as a database" internally, most of them are clearly advertising that you can and should use their service directly from clients. This intention gets underscored by them providing easy ways for web clients to authenticate and authorize users.
Why isn't it enough to provide a well-designed GraphQL API for internal usage? Why are you creating all these tutorials, forcing your users to establish tight coupling between your service and their applications? Young developers with a focus on frontend development might not yet realise the tradeoffs they are making. If that's not enough, by adding role based access out of the box, it becomes almost impossible to eject from a service like this.
API clients don't just depend on a specific database for data but also for identification of users and their roles.
There's a reason why tools like OAuth2 and OpenID Connect (OIDC) emerged. Both are themselves APIs, abstracting away the details of how to authorize and authenticate users. This allows different providers to offer their own implementation of the API contract, e.g. Okta, Auth0, etc... As a user of such tools, you're free to switch vendors if all you're relying on is OIDC for authentication.
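To illustrate, here is a small sketch of building an OIDC authorization request for the authorization code flow. The parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`) come from the OAuth2 and OIDC specifications; the `OidcProvider` and `AuthRequest` types, the helper name and any endpoint URLs you pass in are assumptions for illustration.

```typescript
// Provider-specific configuration. In practice the endpoint is typically read
// from the provider's discovery document.
interface OidcProvider {
  authorizationEndpoint: string;
}

// Application-specific values (hypothetical shape).
interface AuthRequest {
  clientId: string;
  redirectUri: string;
  state: string;
}

// Build the URL the user is redirected to in order to log in.
function buildAuthorizationUrl(provider: OidcProvider, request: AuthRequest): string {
  const params: Array<[string, string]> = [
    ["response_type", "code"],
    ["client_id", request.clientId],
    ["redirect_uri", request.redirectUri],
    ["scope", "openid"],
    ["state", request.state],
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${provider.authorizationEndpoint}?${query}`;
}
```

Switching vendors means passing a different `OidcProvider` value; the function and the rest of the application stay untouched, which is the freedom a proprietary auth implementation takes away.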
Relying on standards means you can choose between implementations. It also means it's easy to find developers with experience to use these standards if they are widely used. Keep that in mind when choosing a SaaS with a proprietary auth implementation.
Relying on a SaaS provider is a risk
Graphcool shut down their service because they realised they cannot scale their company in the way they want.
Other vendors might as well realise that generated APIs have their limitations and building a sustainable business with that in mind is hard.
You should be aware of the risk of using such a provider.
At any time, they might announce that they discontinue their service when they realise that it's not a sustainable business model.
VCs might also come to understand that they've bet on the wrong trend and pull the plug on some companies.
We've looked at the history of generated APIs.
We've recognized that generated APIs are amazing for prototyping and small projects. They can be a tremendous time saver in the beginning of a project. However, we also acknowledged that generated APIs are hard to scale beyond small projects and teams.
We talked about the fact that generated APIs on top of databases stand in the way of properly designing APIs. We tackled the importance of API design and how APIs can be almost useless if you skip the design phase.
We looked into how exposing your database through an API violates the principle of information hiding.
It's clear to us that not implementing business logic in the API layer creates another set of problems.
As an escape route, we looked into using GraphQL as an ORM but realized that it's not designed for this use case.
Finally, we discussed the risk of tight coupling, especially when relying on proprietary features of a SaaS provider.
Good use cases
What's left to say is that the idea of having a GraphQL database is not bad at all. The idea of wrapping a database system with a GraphQL layer can also have value.
It's about how vendors market these solutions and how they educate their users to use their tools.
You could definitely use these tools to talk to your database using GraphQL. I'd personally prefer to use SQL or an ORM but that's personal preference.
A company like Hasura could provide GraphQL interfaces on top of any possible database. This could be an excellent niche tool to make it easier to build ETL pipelines. The problem is that it's not such a cool story to tell VCs.
It's simply cooler to tell people to get "Instant GraphQL with authorization".
Ways to go from here and practical advice
Now that this rant comes to an end, I'd also like to give some advice on what could be a good direction to go.
First, as a potential customer of generated API solutions, I hope you are now aware of the trade-offs you're making.
Second, a word to all the companies working on solutions that generate APIs.
I really appreciate all the work you're doing. We all know that building APIs is hard, and making it easier is our common goal. But I don't think generated APIs are the solution. Consider pivoting in other directions, as Graphcool/Prisma did. We need more tools that make it easier to design, implement, iterate on and publish APIs.