Navigating the GraphQL Evolution at Toast: From BFFs to Federation

Introduction
At Toast, we believe that GraphQL is the right technology for building efficient web and mobile applications. That belief did not form overnight. In this blog post, we cover the adoption of GraphQL at Toast, from its early days to the recent shift towards GraphQL Federation.
Early Days of Toast
Toast was founded in 2012. Alongside our Android POS, there was a monolithic web application accompanied by a monolithic database. That architecture was great for rapid iteration during the early days, but after a few years we had to start advancing the architecture to support our increasing customer and organization scale. To address our scaling teams, we pivoted to microservices, allowing domain teams to move fast by owning their services, deploying them independently and scaling them appropriately.
Adoption of BFFs to move out of the monolithic web application
With Toast now having over 100 feature teams, the monolithic web application became a bottleneck, as every change needed a coordinated deployment. This led to a huge loss of efficiency and reliability, because the monolithic web application also served as the backend for our Android POS. In addition, all web pages were server-side rendered using the Play framework.
Because of this, we started to embrace a micro-frontend architecture. Under this approach, the more than 100 feature teams at Toast gained ownership of their Single Page Apps (SPAs) written in React, allowing for independent development, deployment, and elevation in their respective repositories. This architecture not only facilitated a seamless and consistent web experience but also introduced GraphQL as a crucial tool for web frontend teams. Adopting Backend-for-Frontend (BFF) API gateways gave teams a place for flexible presentation logic and a way to fetch the extra information needed for presentation. Each SPA had its own BFF containing that logic.
There were a couple of challenges with this approach. Each frontend team now needed to replicate the REST API domain models and create a GraphQL schema counterpart. It became common practice to create REST APIs solely to serve the GraphQL schema, which introduced unnecessary complexity and maintenance burden. In some cases, teams simply bypassed the BFFs and called the REST APIs directly, which lost the advantages of GraphQL.

Federation - The Technical Shift
As we scaled that approach, the inefficiency scaled as well. Each team had an extra service to maintain, and some of the BFFs did not keep up with changes to the domain services. This is when we started to investigate “Apollo Federation”. In a departure from the traditional BFF pattern with custom resolvers, GraphQL Federation has each domain team contribute its own schema (a subgraph) to a larger, composed schema (the supergraph). This simplifies data querying for frontend developers while ensuring consistency across user experiences.

Schema Registration
A pivotal part of the workflow is having a solid schema registry. A schema registry keeps track of all the different domain subgraphs, which allows developers to explore and understand how things are structured in various domains. Each subgraph can also contribute fields to existing entities, so the client sees one unified schema even though the data is served by different domain services.
The example below shows that the partners domain added “connectedPartners” to the “RestaurantLocation” type.

https://the-guild.dev/graphql/hive
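To make the idea of contributing to an existing entity more concrete, here is a minimal TypeScript sketch. It assumes Apollo Federation 2 style SDL (shown only in comments), an `id` key field, a `name` field, and a `Partner` type, none of which are taken from our actual schema. The `composeFields` helper is a hypothetical toy stand-in for real supergraph composition, which applies many more rules than a simple field merge.

```typescript
// Hypothetical SDL for two subgraphs (Apollo Federation 2 style). The `id` key,
// the `name` field, and the `Partner` type are assumptions for illustration.
//
//   # locations subgraph
//   type RestaurantLocation @key(fields: "id") { id: ID! name: String! }
//
//   # partners subgraph: contributes a field to the same entity
//   type RestaurantLocation @key(fields: "id") { id: ID! connectedPartners: [Partner!]! }
//   type Partner { id: ID! name: String! }

type FieldMap = Record<string, string>; // field name -> GraphQL type

interface Subgraph {
  name: string;
  types: Record<string, FieldMap>; // type name -> fields
}

interface ComposedField {
  type: string;
  subgraphs: string[]; // which subgraphs declare this field
}

type Supergraph = Record<string, Record<string, ComposedField>>;

// Toy composition: merge fields per type and fail on conflicting field types.
function composeFields(subgraphs: Subgraph[]): Supergraph {
  const composed: Supergraph = Object.create(null);
  for (const subgraph of subgraphs) {
    for (const typeName of Object.keys(subgraph.types)) {
      const fields = subgraph.types[typeName];
      const composedType: Record<string, ComposedField> =
        composed[typeName] ?? Object.create(null);
      composed[typeName] = composedType;
      for (const fieldName of Object.keys(fields)) {
        const fieldType = fields[fieldName];
        const existing = composedType[fieldName];
        if (existing === undefined) {
          composedType[fieldName] = { type: fieldType, subgraphs: [subgraph.name] };
        } else if (existing.type !== fieldType) {
          throw new Error(
            `${typeName}.${fieldName}: ${existing.type} conflicts with ` +
              `${fieldType} (${subgraph.name})`
          );
        } else {
          existing.subgraphs.push(subgraph.name);
        }
      }
    }
  }
  return composed;
}

const locations: Subgraph = {
  name: "locations",
  types: { RestaurantLocation: { id: "ID!", name: "String!" } },
};

const partners: Subgraph = {
  name: "partners",
  types: {
    Partner: { id: "ID!", name: "String!" },
    RestaurantLocation: { id: "ID!", connectedPartners: "[Partner!]!" },
  },
};

// Clients see one RestaurantLocation type with fields from both domains.
const supergraph = composeFields([locations, partners]);
```

In this sketch, `supergraph["RestaurantLocation"]` ends up with `id`, `name`, and `connectedPartners`, and each field records which subgraph declared it. A subgraph that redeclares a field with a different type makes `composeFields` throw, which is a very rough analogue of a failed composition check.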
But the registry does more than just organize information. It also acts as a guard: whenever a subgraph changes, it checks that all the subgraphs still compose into a valid supergraph, much like a quality control step that confirms the pieces of a puzzle still fit together.
And here's the really cool part: the registry can also spot potential issues before they cause trouble. It can predict whether changing something in one part of the graph might create problems for consumers, a kind of early warning system that helps keep everything running smoothly for users. In a nutshell, the schema registry is the behind-the-scenes system that helps developers understand and combine subgraphs, and that keeps the contract between the frontend and the backend working.

https://the-guild.dev/graphql/hive
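The early warning idea described above can be sketched as a comparison between the currently published schema and a proposed one. This is a minimal TypeScript sketch, not how any particular registry implements its checks: `findBreakingChanges`, the `SchemaFields` shape, and the sample fields are hypothetical, and treating every field type change as breaking is a deliberately conservative simplification.

```typescript
// type name -> field name -> GraphQL type (hypothetical, simplified schema shape)
type SchemaFields = Record<string, Record<string, string>>;

interface BreakingChange {
  kind: "TYPE_REMOVED" | "FIELD_REMOVED" | "FIELD_TYPE_CHANGED";
  path: string;
}

// Report anything present in the current schema that the proposed schema
// removes or changes. Additions are ignored because they cannot break
// existing client queries.
function findBreakingChanges(
  current: SchemaFields,
  proposed: SchemaFields
): BreakingChange[] {
  const changes: BreakingChange[] = [];
  for (const typeName of Object.keys(current)) {
    const proposedType = proposed[typeName];
    if (proposedType === undefined) {
      changes.push({ kind: "TYPE_REMOVED", path: typeName });
      continue;
    }
    const currentType = current[typeName];
    for (const fieldName of Object.keys(currentType)) {
      const path = `${typeName}.${fieldName}`;
      const proposedFieldType = proposedType[fieldName];
      if (proposedFieldType === undefined) {
        changes.push({ kind: "FIELD_REMOVED", path });
      } else if (proposedFieldType !== currentType[fieldName]) {
        changes.push({ kind: "FIELD_TYPE_CHANGED", path });
      }
    }
  }
  return changes;
}

const publishedSchema: SchemaFields = {
  RestaurantLocation: { id: "ID!", name: "String!", connectedPartners: "[Partner!]!" },
};

const proposedSchema: SchemaFields = {
  // `name` became nullable, `connectedPartners` was dropped, `timezone` is new.
  RestaurantLocation: { id: "ID!", name: "String", timezone: "String" },
};

const breakingChanges = findBreakingChanges(publishedSchema, proposedSchema);
```

A check like this could run in CI before a subgraph is published, so the team proposing the change sees the affected fields before any consumer does.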
Challenges (Technical Realities)
Transitioning to GraphQL Federation presents its own set of challenges that require careful consideration. One major obstacle is ensuring a clear separation between the different domains of our system and establishing standardized types so that they communicate consistently. Without both, GraphQL Federation does not work effectively in practice. Another challenge is figuring out how to manage filters that cross domains, particularly when the filters don't neatly align with the data they are trying to filter.
GraphQL's structure, with queries tucked away in the body, makes traditional rate-limiting methodologies, based on path specifications, less straightforward. In REST, it's a common standard to discern different intents by examining the path. However, with GraphQL, the entire query resides in the body, creating a complex scenario. Reading the body for authorization at the edge introduces both speed and security challenges, as the entire body needs to be stored in memory. As we chart our course through GraphQL, finding a delicate balance between rapid access and robust security protocols remains a key challenge.
One of the ways we are addressing this is with GraphQL persisted queries. This involves saving commonly used queries on the server side. Instead of sending the entire query every time, the client just sends a short identifier. The server then finds the saved query and runs it. This saves time and makes things more secure, because only approved queries can be run. It also helps with caching, which makes data quicker to access. Overall, it's a way to make using GraphQL faster and safer for everyone involved.
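The lookup described above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not our gateway's implementation: the `PersistedQueryStore` class, the identifier format, and the `restaurantLocation` query field are all hypothetical.

```typescript
interface PersistedRequest {
  id: string; // short identifier sent instead of the full query text
  variables?: Record<string, unknown>;
}

type LookupResult =
  | { ok: true; query: string }
  | { ok: false; error: string };

class PersistedQueryStore {
  // Object.create(null) avoids accidental hits on inherited keys like "toString".
  private readonly queries: Record<string, string> = Object.create(null);

  // Called ahead of time (for example at build or deploy time) with approved queries.
  register(id: string, query: string): void {
    this.queries[id] = query;
  }

  // Only registered identifiers resolve; anything else is rejected outright.
  resolve(request: PersistedRequest): LookupResult {
    const query = this.queries[request.id];
    if (query === undefined) {
      return { ok: false, error: `Unknown persisted query id: ${request.id}` };
    }
    return { ok: true, query };
  }
}

const store = new PersistedQueryStore();

// Hypothetical identifier and query, registered before any client uses them.
store.register(
  "locationPartners.v1",
  "query ($id: ID!) { restaurantLocation(id: $id) { connectedPartners { name } } }"
);

// The client sends only the identifier and variables, never the query text.
const known = store.resolve({ id: "locationPartners.v1", variables: { id: "42" } });
const unknown = store.resolve({ id: "adHocQuery" });
```

Here `known` carries the stored query text for execution, while `unknown` is rejected before any query parsing happens, which is what restricts the server to approved queries.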
Impact and Acceleration
At the time of writing, around 60 subgraphs have contributed to the supergraph of our restaurant admin portal. This has allowed teams to reuse established schemas and extend existing types with fields from their own domains.
One of my favorite quotes from a team was: "We canceled 3 tickets in the last week because we realized the GraphQL Supergraph handled it automatically! We're able to go SO MUCH FASTER"
We also extended federation to our guest-facing applications, like Local by Toast and online ordering. For example, using the federated gateway, Local by Toast was able to offer split payments when paying for checks on premises, because that capability had already been implemented in a different domain service.
Next Steps: Maturity and Scalability
As the federation approach takes root, the next step is to mature the implementation. Refining our tooling should make adoption across Toast smoother, so that we deliver value to customers faster while keeping the architecture scalable and maintainable.
Moving to GraphQL federation brings exciting possibilities for improving how systems work together. However, one challenge is making sure each part of the system knows what it's responsible for. This means carefully planning what each service does and how they connect. Proper domain design is crucial for defining clear service boundaries, minimizing dependencies, and ensuring cohesive data ownership.
Using GraphQL Federation as our frontend API gateway, we were able to bring consistency within all our platforms, including our native mobile application ToastNow.

Conclusion - Towards Technical Excellence
As we navigate the complexities of GraphQL Federation, acknowledging and addressing these technical challenges is crucial. The Toast tech landscape is ever-evolving, and we're committed to technical excellence as we continue unraveling the intricacies of GraphQL at Toast. Stay tuned for more technical adventures in our pursuit of the perfect tech recipe!
_______________________________
1 Under-fetching is not having enough data with a call to an endpoint, forcing you to call a second endpoint.
____________________________
This content is for informational purposes only and not as a binding commitment. Please do not rely on this information in making any purchasing or investment decisions. The development, release and timing of any products, features or functionality remain at the sole discretion of Toast, and are subject to change. Toast assumes no obligation to update any forward-looking statements contained in this document as a result of new information, future events or otherwise. Because roadmap items can change at any time, make your purchasing decisions based on currently available goods, services, and technology. Toast does not warrant the accuracy or completeness of any information, text, graphics, links, or other items contained within this content. Toast does not guarantee you will achieve any specific results if you follow any advice herein. It may be advisable for you to consult with a professional such as a lawyer, accountant, or business advisor for advice specific to your situation.