Introducing GraphQL to Microservice Architecture: A Case for Splitting Queries and Mutations into Separate Services
Abstract
GraphQL offers a very compelling solution to a common microservice problem. By implementing an API-Gateway using GraphQL, we can significantly improve the transactions between the consumer and API as well as ease the growing pains as the API and consumer requirements evolve over time.
However, implementing a GraphQL gateway presents another problem. GraphQL’s query and mutation implementations are both very interesting and very powerful, but they seem to be two separate ideas with very different requirements, combined under one specification. By splitting them into two separate services, we see improvements in ease of use, documentation, code manageability, scaling, resiliency and authorization.
Introduction
Microservice architecture solves many problems while introducing a few. One of the most outward-facing problems that my team and I faced was the lack of an abstraction for routing, for fulfilling UI requirements and for providing clear documentation to our API consumers.
This is a common pitfall with microservices. If we are trying to create a distributed system and avoid a monolith or a single point of failure, at what point should we abstract away the complexity of the system so that it is not exposed to our consumers?
We want to avoid the above example. Not only are we forcing the client to know which prefix to hit to access certain endpoints, we are also leaking a significant amount of our internal infrastructure. On top of that, we are requiring the client to hit multiple endpoints to populate a single view.
This is better expressed in this article, which introduces the concepts of API Gateways and Backends for Frontends.
API Gateways
I first learned about this concept when learning about how Netflix architects their services, and how they manage the level of reliability and composability for hundreds of devices all with different requirements. They manage this with a complex microservice backend with a strictly defined internal API. Then individual teams design their own dedicated backend, each fulfilling their own requirements for whatever platform they are building for.
I was interested in introducing a GraphQL API-Gateway. GraphQL provides a very interesting solution for a complex microservice backend, in that it creates a “Backend for Frontend” approach without requiring the Frontend team to develop their own personalized server. That is because, by design, GraphQL allows the client to determine what data they need and how they need it, as long as the request conforms to a strictly defined schema. To me, this sounded like a simplified version of Netflix’s approach: a better tool to achieve similar results with much less overhead.
GraphQL
Once we introduced a GraphQL API-Gateway into our architecture, the client was no longer responsible for knowing which prefix to hit to reach a certain microservice. In fact, the client no longer even needed to know that we had microservices. We stopped getting as many requests to update our API responses, since our API consumers could define their own responses. Updates and changes to our API no longer required deprecation notices or breaking changes, and internal updates were less error prone because the coupling of defined types was drastically reduced and abstracted behind a clear GraphQL schema. Finally, the client could populate any view in the app with a single request.
Transferring GET requests to GraphQL was easy and elegant. Suddenly 20 microservices were exposed as one easy-to-understand object.
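As a rough illustration of what the gateway does for a query, here is a minimal sketch. It is not a GraphQL implementation: the two services (`user_service`, `post_service`), their data, and the nested-dict "selection" standing in for a parsed GraphQL query are all assumptions made up for this example. It only shows the idea of one client-shaped request being answered from several backing services.

```python
# Hypothetical stand-ins for two microservices behind the gateway.
def user_service(user_id):
    return {"id": user_id, "email": "ada@example.com", "name": "Ada"}


def post_service(user_id):
    return [
        {"id": 1, "body": "Hello", "authorId": user_id},
        {"id": 2, "body": "World", "authorId": user_id},
    ]


# Each field of the "user" object is resolved from the parent user record.
# The client never learns which service a field comes from.
RESOLVERS = {
    "email": lambda user: user["email"],
    "name": lambda user: user["name"],
    "posts": lambda user: post_service(user["id"]),
}


def pick(obj, selection):
    """Keep only the selected keys of a dict (or of each dict in a list)."""
    if isinstance(obj, list):
        return [pick(item, selection) for item in obj]
    return {field: obj[field] for field in selection}


def resolve_user(user_id, selection):
    """selection maps a field name to None (leaf) or to a nested selection."""
    user = user_service(user_id)
    result = {}
    for field, sub_selection in selection.items():
        value = RESOLVERS[field](user)
        result[field] = pick(value, sub_selection) if sub_selection else value
    return result


# One request populates the whole view, shaped by the client.
view = resolve_user(42, {"email": None, "posts": {"body": None}})
```

Here `view` contains only the `email` field and the `body` of each post, even though the underlying services return more than that.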
Mutations
The next step was introducing mutations. Mutations represent the Create, Update and Delete actions. Instead of REST endpoints, they are represented as camelCase function names, for example createPost(input: {text: "text"}).
Where transferring our queries vastly simplified our API, transferring our mutations quickly began to bloat it beyond readability. All mutations should have special inputs and deterministic outputs. But I quickly found that our normal query outputs were unusable as mutation outputs, since they allowed consumers to spend unnecessary resources calling sub-resolvers (sub-queries that aggregate data) on a mutation response. For example, there is no reason to retrieve all unaffected comments for a post when updating the content of the post.
Eventually we began to see our schema bloat. We would have our entry-point object, a Bulk version (which omits large text bodies when requesting a list), and a mutation input and output. Suddenly our API, which was supposed to simplify things, was causing us to manage 3 different schemas of the same type. This made our auto-generated documentation confusing and our own development error prone. Internally we would have separate files for mutation handlers and query handlers, mutation models and query models, etc. As we began to add more mutations, I quickly began to see a natural split in the growing monolith.
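To make the point about mutation outputs concrete, here is a small sketch, assuming a hypothetical `Post` query type with a `comments` sub-resolver and a hypothetical `UpdatePostPayload` mutation output. None of these names come from our actual schema. The mutation returns only what it changed, so there is no sub-resolver for a consumer to trigger on the response.

```python
from dataclasses import dataclass

comment_lookups = 0  # counts calls to the (hypothetical) comment service


def fetch_comments(post_id):
    global comment_lookups
    comment_lookups += 1
    return [{"postId": post_id, "text": "Nice post"}]


@dataclass
class Post:
    """Query type: exposes a sub-resolver that aggregates comments."""
    id: int
    body: str

    def comments(self):
        return fetch_comments(self.id)


@dataclass
class UpdatePostPayload:
    """Mutation output: only what the mutation changed, no sub-resolvers."""
    id: int
    body: str


POSTS = {1: Post(id=1, body="old text")}


def update_post(post_id, body):
    post = POSTS[post_id]
    post.body = body
    # Return the narrow payload instead of the full Post query type.
    return UpdatePostPayload(id=post.id, body=post.body)


payload = update_post(1, "new text")
```

The cost is the one described in the next paragraph: every object now needs a separate output type for its mutations.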
Splitting the Monolith
Once we split the mutations into their own service, everything became very clean and clear. The split was obvious for our consumers as well: they don’t have to worry about storage or our service infrastructure, they only need to concern themselves with what action they want to take. Now we had one set of inputs and outputs for each object, and the code was much easier to manage. The documentation also improved. Instead of facing a massive list of query types next to a massive list of mutation types, the consumer can get to the documentation they want more quickly, depending on whether they are looking for a query type or a mutation name. Each gateway was built specifically for its own requirements.
Authorization
The authorization requirements for queries and mutations are generally quite different. By nature, any GraphQL query originates from the root resolver or entry point object. That same request cascades down the subresolvers. A lot of the authorization and validation occurs naturally by referencing the parent object. For example, suppose the query hits the user root resolver:
query {
  user {
    email
    posts {
      body
    }
  }
}
We authorize at the first stage, when retrieving the user. When the ‘posts’ subresolver is called, we already know the identity of the parent user object, so authorizing access to the user’s posts is already taken care of. This ended up vastly simplifying most of the authorization middleware we had for our REST APIs.
Mutations, meanwhile, are a bit different. Suppose someone hits the API with:
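A minimal sketch of that cascade, under assumptions of my own: the in-memory `USERS` and `POSTS` data, the function names, and the policy that a viewer may only read their own user are all hypothetical. The point is only that the check runs once at the root, and the sub-resolver is scoped by the parent it receives.

```python
class NotAuthorized(Exception):
    pass


# Hypothetical in-memory data standing in for the backing services.
USERS = {1: {"id": 1, "email": "ada@example.com"}}
POSTS = [
    {"authorId": 1, "body": "Hello"},
    {"authorId": 2, "body": "Someone else's post"},
]


def resolve_user(viewer_id, user_id):
    """Root resolver: the only place where the authorization check runs."""
    # Assumed policy for this sketch: a viewer may only read their own user.
    if viewer_id != user_id:
        raise NotAuthorized("viewer may not read this user")
    return USERS[user_id]


def resolve_posts(parent_user):
    """Sub-resolver: scoped by the already-authorized parent, no extra check."""
    return [post for post in POSTS if post["authorId"] == parent_user["id"]]


def run_user_query(viewer_id, user_id):
    """Resolve user { email posts { body } } from the root downwards."""
    user = resolve_user(viewer_id, user_id)
    posts = resolve_posts(user)
    return {"email": user["email"], "posts": [{"body": p["body"]} for p in posts]}
```

Calling `run_user_query(1, 1)` returns the user's email and their own posts only, while `run_user_query(2, 1)` fails at the root before any sub-resolver runs.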
mutation {
  createPost(input: {…}) {
    body
  }
}
Then we must retrieve their sessionId from the cookie, look up the matching session in our Redis store, and authorize their ability to create content, as well as auto-populate fields from their session (such as userId). This interaction was very similar to the requirements of our normal REST API.
These different authorization requirements made things complex, because our middleware had to contain exceptions or different functionality depending on the request payload (indeed, with GraphQL the only way to determine whether a request is a mutation is to look at the payload body). Splitting into two services allowed us to remove those exceptions from our authorization middleware and tightened up our system in general. Now we could develop separate authorization middleware for the differing requirements of each service, and avoid error-prone special cases.
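Sketched in code, that flow looks roughly like the following. A plain dict stands in for the Redis store, and the `canCreateContent` flag, the session contents and the function names are assumptions for illustration, not our real session format.

```python
class NotAuthorized(Exception):
    pass


# A plain dict standing in for the Redis session store (assumption).
SESSION_STORE = {
    "abc123": {"userId": 7, "canCreateContent": True},
    "def456": {"userId": 8, "canCreateContent": False},
}


def create_post(cookies, post_input):
    """Authorize a createPost mutation from the caller's session."""
    # 1. Retrieve the sessionId from the cookie.
    session_id = cookies.get("sessionId")
    # 2. Look the session up in the store.
    session = SESSION_STORE.get(session_id)
    if session is None:
        raise NotAuthorized("no valid session")
    # 3. Authorize the ability to create content.
    if not session["canCreateContent"]:
        raise NotAuthorized("session may not create content")
    # 4. Auto-populate fields from the session rather than trusting the input.
    return {"userId": session["userId"], "body": post_input["text"]}


post = create_post({"sessionId": "abc123"}, {"text": "text"})
```

Note that nothing here depends on a parent object, which is exactly what makes it resemble a REST handler more than a query resolver.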
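To show the kind of payload inspection a combined service needs, here is a deliberately naive sketch. It assumes the request body is the usual JSON object with the GraphQL document under the "query" key, and that the document contains a single operation that comes first; a real gateway would use a proper GraphQL parser and honor operationName. The gateway names returned by `route` are hypothetical.

```python
import json
import re

# Whitespace, commas, a byte order mark and "#" comments may precede the operation.
LEADING_IGNORED = re.compile(r"(?:[\s,\ufeff]|#[^\n]*)*")
OPERATION_KEYWORD = re.compile(r"(query|mutation|subscription)\b")


def operation_type(request_body):
    """Naively classify a GraphQL request body by its first keyword."""
    document = json.loads(request_body)["query"]
    start = LEADING_IGNORED.match(document).end()
    keyword = OPERATION_KEYWORD.match(document, start)
    # The shorthand form "{ ... }" has no keyword and is always a query.
    return keyword.group(1) if keyword else "query"


def route(request_body):
    """Pick the middleware path the combined service would have to branch on."""
    if operation_type(request_body) == "mutation":
        return "mutation-gateway"
    return "query-gateway"
```

With two separate services this branching disappears: the endpoint the client calls already says which kind of operation it is.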
Scaling and Resiliency
Another benefit of separating queries and mutations is that the resource requirements for these actions are generally very different. Queries focus on aggregating data and are hit constantly, while mutations are hit less often but are more likely to cause errors since they are mutating data. Since the heavy lifting of modifying data was handled in our service layer, mutations required a much lower resource allocation. Decoupling mutations from queries allows us to scale each service according to need. Even more importantly, it makes the API that populates our app, and that lets users navigate or even contact support, much more failure resistant. If the Mutation-API has a critical bug, the user will still be able to navigate the app as normal and even reach our Help Center.
GraphQL has redefined the way we are able to think about API design and introduced a clearer transaction between the API and the consumer, in a way that allows each to develop and evolve independently of the other. GraphQL’s query and mutation implementations are both very compelling but seem different enough to necessitate a split. By splitting these two ideas, we have developed a very clean, resilient, and easy-to-use API with clear documentation and types and a complete separation of concerns.
Comments, concerns or objections? Please share them in the comments.