Apps As Clusters

Scaling a Node.js App

Karthik Kottugumada
6 min read · Sep 17, 2020

Scaling any app cannot be a late addition to your application development timeline. It has to be planned and introduced right when you are designing the app. Most teams, like one of the teams I was once part of, go through this whole exercise a few development cycles after kick-off, and we ended up having to refactor a huge subset of our code. Dealing with scale issues in your distributed systems is not a lot of fun, especially when the entire team is bursting at the seams with the amount of work pending before the beta release. There is also no fixed set of rules for solving this, and hence a lot of material online on how it should be done. So there is no point in churning out one more post about horizontal scale-out versus vertical scale-up, or detailing the three-axis scale cube. Instead, I will consolidate the steps we took, the options we weighed before making that decision, and the challenges we encountered while doing so.

Teams have proved time and again how cloning a large application and logically dividing the operational load greatly helps in scaling out, with minimal effort. One common way to increase the throughput of the application suite is to spawn one process for each core of the host machine. By doing so, Node.js's already efficient “concurrency” management of requests (see “event-driven, non-blocking I/O”) can be multiplied and also parallelized.
There are different strategies for scaling on a single machine, but the common concept is to have multiple processes running on the same port, with some sort of internal load balancing used to distribute the incoming connections across all the processes/cores.

The strategies described below are for the standard Node.js cluster mode and the automatic, higher-level PM2 cluster functionality.

Native Cluster Mode
By default, it works like this: the master process listens on a port, accepts new connections, and distributes them across the workers in a round-robin fashion, with some built-in logic to avoid overloading a worker process.
Cluster mode implementation:
In your project's entry point, require the cluster module and spawn your application only when the process is a worker. First, we require the cluster module.

Then we check whether the current process is the master. If so, we get the total number of CPU cores and spawn worker processes. Otherwise, the current process is a worker, so we initialize our app there. We also subscribe to the exit event: if a worker dies for any reason, we spawn a new one to replace it.
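A minimal sketch of that entry point, assuming the app is a plain HTTP server (the port and response body are made up for illustration):

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core.
  const cpuCount = os.cpus().length;
  for (let i = 0; i < cpuCount; i++) {
    cluster.fork();
  }

  // If a worker dies for any reason, spawn a replacement.
  cluster.on('exit', (worker, code, signal) => {
    console.log(`Worker ${worker.process.pid} died (${signal || code}), forking a new one`);
    cluster.fork();
  });
} else {
  // Workers share the same port; the master hands out the
  // incoming connections round-robin.
  http.createServer((req, res) => {
    res.end(`Handled by worker ${process.pid}`);
  }).listen(3000);
}
```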

[Image: Round Robin Leader Election]

It is probably not a good idea to spawn more processes than there are cores, because at the lower level the OS will likely balance the CPU time between those processes anyway.

PM2 Cluster Mode
This lets you scale the process across all the available cores without the need for the cluster module. The PM2 daemon takes on the role of the master process and spawns processes of your application as workers, load-balancing between them in a round-robin fashion. So this lets you write your application as you would for single-core usage, and PM2 takes care of scaling it out to multiple cores.
As a process manager, PM2 also takes care of restarting a process after it crashes. And once the application is started in cluster mode, the number of instances can be adjusted at runtime using pm2 scale.
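For reference, the relevant PM2 commands look like this (the app name “app” simply mirrors the script name here; yours will differ):

```sh
# Start the app in cluster mode, one instance per available core
pm2 start app.js -i max

# Later, resize the cluster to exactly 8 instances at runtime
pm2 scale app 8
```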

Use a better task scheduler
One very common and often ignored place is the task scheduler. Most microservice suites have task schedulers; they are a quick and easy way to automate the execution of a bunch of operations. It is important that we don’t rely on setTimeout and/or setInterval. Poorly written schedulers can be a pain to scale out horizontally, resulting in duplicated cron jobs.
1. We switched to using Azure Scheduler, which has greatly evolved since it was initially introduced.
2. Another approach we tried was using Cosmos DB and/or MongoDB to store recurring jobs, and using that as a high-level queuing system to store and lock the execution every time a worker starts a job.
3. There is also an npm package available for the cron module (with a seemingly obvious dependency on moment). It lets you define cron jobs inside your Node.js code, making them independent of the OS configuration. The worker process creates the cron, which periodically invokes a function that puts a new job on the queue. Using a queue keeps things cleaner and lets you take advantage of features like priorities and retries, making the implementation more efficient; a minimal sketch follows this list.
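Here is a sketch of option 3, assuming the cron and bull packages with a local Redis instance; the queue name, schedule, and payload are made up for illustration:

```js
const { CronJob } = require('cron');
const Queue = require('bull');

const reportQueue = new Queue('nightly-reports', 'redis://127.0.0.1:6379');

// Every night at 02:00, enqueue a job instead of doing the work inline.
const job = new CronJob('0 0 2 * * *', async () => {
  await reportQueue.add({ requestedAt: Date.now() }, { attempts: 3 });
});
job.start();

// Any worker can consume from the queue, so retries and priorities
// come for free from the queue itself.
reportQueue.process(async (queuedJob) => {
  console.log('Generating report', queuedJob.id, queuedJob.data);
});
```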
Everything seems rosy as long as you have only one worker process, which is hardly ever a production-worthy setup. With multiple processes, the cron function wakes up in every process at the same time, so putting it on a queue creates duplicates of the same job that get executed many times.
The best way to solve this problem is to elect a single worker process to perform the cron operations.
We can make use of the “leader election” offered by something like hapi-cron-cluster. Contrary to what you might assume, leader election is a much simpler process in the software development world. With hapi-cron-cluster, you start by providing a Redis client to the cron-cluster module, which the worker processes use to communicate among themselves and agree on who will be the “leader”.
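The underlying idea can be sketched with a simple Redis lock, assuming ioredis and a shared Redis instance (the key name and TTL are made up for illustration; this shows the spirit of the approach, not the hapi-cron-cluster internals):

```js
const Redis = require('ioredis');
const redis = new Redis('redis://127.0.0.1:6379');

const LOCK_KEY = 'cron:leader';
const LOCK_TTL_MS = 60000;

// Each worker's cron tick calls this; SET ... NX succeeds for exactly
// one contender, so only that process runs the task this round.
async function runIfLeader(task) {
  const acquired = await redis.set(LOCK_KEY, process.pid, 'PX', LOCK_TTL_MS, 'NX');
  if (acquired === 'OK') {
    await task();
  }
}

runIfLeader(async () => {
  console.log(`Worker ${process.pid} elected leader, running the job`);
});
```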
One challenge you might face while running a cluster is handling cases that need stateful communication, since a client is not guaranteed to always talk to the same worker process. A very basic example of such a scenario is user authentication. With a cluster, the request to authenticate comes to the master and moves via the load balancer to one worker process, while subsequent requests may land on other workers that do not know whether the request is authenticated or not.
There are numerous ways to resolve this. One quick and simple approach is to share the state across all the workers by storing the session information in a shared database. But, as in our case, you may not be able to afford touching the code in so many places without massively breaking things. A minimally invasive approach is something known as Sticky Load Balancing, which most modern load-balancing solutions offer out of the box. At a high level, when there is a need to share data between the processes, a record of the relation is stored at the load-balancer level. So when the same user sends a new request, we look up the existing records to find out which server holds their authenticated session, and keep sending their requests to that server instead of using the regular distribution. This ensures that we do not need to change the existing code, but it also means we are not fully load balancing our calls; in this example, the calls of authenticated users.
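A minimal sketch of sticky balancing on top of the cluster module, hashing the client IP so the same client keeps landing on the same worker (the port and hash function are made up for illustration; in production this usually lives in the load balancer itself):

```js
const cluster = require('cluster');
const http = require('http');
const net = require('net');
const os = require('os');

if (cluster.isMaster) {
  const workers = [];
  for (let i = 0; i < os.cpus().length; i++) {
    workers.push(cluster.fork());
  }

  // Cheap deterministic hash of the client's address.
  const workerFor = (ip) => {
    let hash = 0;
    for (const ch of ip) hash += ch.charCodeAt(0);
    return workers[hash % workers.length];
  };

  // The master accepts raw TCP connections and hands each one to the
  // worker chosen by the client's IP, instead of round-robin.
  net.createServer({ pauseOnConnect: true }, (connection) => {
    workerFor(connection.remoteAddress || '').send('sticky:connection', connection);
  }).listen(3000);
} else {
  // The worker's HTTP server never listens on a port; the master
  // pushes connections into it.
  const server = http.createServer((req, res) => {
    res.end(`Handled by worker ${process.pid}`);
  });

  process.on('message', (message, connection) => {
    if (message !== 'sticky:connection') return;
    server.emit('connection', connection);
    connection.resume();
  });
}
```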

Caching
Caching goes a long way in helping us scale the APIs better. If we are trying to cache with the previously discussed cluster setup, we would need a dedicated, shared entity and use it to record the incoming and outgoing calls to that entity's API. This could be a MongoDB collection, any key-value store, or even a column-oriented database. Contrary to what it might seem like, maintaining a separate entity for our caching needs makes decomposing the app easier.

[Image: Caching the API calls]
Choosing an application-level implementation of cache invalidation instead of a fixed TTL can be more efficient.
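A minimal sketch of that idea, assuming Express and a shared Redis cache via ioredis (the route, key scheme, and DB helpers are made up for illustration):

```js
const express = require('express');
const Redis = require('ioredis');

const app = express();
const cache = new Redis('redis://127.0.0.1:6379');

// Assumed stand-ins for real persistence calls.
async function loadUserFromDb(id) { return { id, name: 'example' }; }
async function saveUserToDb(id, body) { return { id, ...body }; }

// Read path: serve from the shared cache when possible.
app.get('/users/:id', async (req, res) => {
  const key = `user:${req.params.id}`;
  const cached = await cache.get(key);
  if (cached) return res.json(JSON.parse(cached));

  const user = await loadUserFromDb(req.params.id);
  await cache.set(key, JSON.stringify(user)); // no TTL, we invalidate explicitly
  res.json(user);
});

// Write path: invalidate on change instead of waiting for a TTL to expire.
app.put('/users/:id', express.json(), async (req, res) => {
  const user = await saveUserToDb(req.params.id, req.body);
  await cache.del(`user:${req.params.id}`);
  res.json(user);
});

app.listen(3000);
```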

Services on AWS
There are also some Java services, from the Netflix OSS stack, that provide infrastructure support for the Node.js services:

Netflix Eureka is a service registry and discovery tool that enables services to find other services without needing to know where they’re running.

Netflix Zuul is a load balancer and dynamic routing service that gets a list of available instances of each service from Eureka, routes requests between services, and balances the load among server instances.
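For the discovery side, a Node.js service can register itself with Eureka through the community eureka-js-client package; a minimal sketch, assuming a local Eureka server on port 8761 (the service name, ports, and addresses are made up for illustration):

```js
const { Eureka } = require('eureka-js-client');

const client = new Eureka({
  instance: {
    app: 'node-api',
    hostName: 'localhost',
    ipAddr: '127.0.0.1',
    port: { $: 3000, '@enabled': true },
    vipAddress: 'node-api',
    dataCenterInfo: {
      '@class': 'com.netflix.appinfo.InstanceInfo$DefaultDataCenterInfo',
      name: 'MyOwn',
    },
  },
  eureka: {
    host: 'localhost',
    port: 8761,
    servicePath: '/eureka/apps/',
  },
});

// Registers the instance and starts sending heartbeats, so Zuul (via
// Eureka) can route traffic to it.
client.start();
```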

Resources:
I have found a huge treasure trove of information regarding the best practices, but very few pieces that talk about the practical challenges. In my study of the various dimensions of scalability, I found these really useful:

  1. Distributed Systems for Fun and Profit, Mikito Takada
  2. A Brief Introduction to Distributed Systems, Maarten van Steen & Andrew S. Tanenbaum
  3. Designing Data-Intensive Applications, Martin Kleppmann
