Infrastructure (queues) for a dev
As an engineer on a very small team I get to see the consequences of architectural decisions that I have influenced. I also have some ownership over the side effects my code has on infrastructure. Over the last year I have been thinking about these areas where code meets metal a lot so I want to share some of my thoughts. Message Queues are just the flavor of the day. Hopefully this helps with thinking about a system as a distributed unit, and keeping it all in sync.
In the landscape of asynchronous software and non-realtime systems (not airplanes) it can be really helpful to throttle pressure on backend systems and to make events durable until they are consumed. In other cases many services will want to hear from a single service about a particular event. Queues can do all of these things. Queues are awesome.
Imagine a website that likes to store the IP of every browser that opens a url to one the pages in its domain. Suppose it has really bursty traffic, where tens of thousands of people may visit it for just a few minutes every day. For fun let's also say we have a backend that some engineer decided would literally carve the data into granite (for the record, this would be slow). We can't scale this granite carving system. So what do we do? Throw in a queue.
With most modern queueing tech and hardware, writing messages is cheap and fast. So if we say this website is dynamically rendered and has scaled to handle all of the random small web traffic it has to serve to stay up, then it can afford to asynchronously (without blocking web responses) write, or publish, to a Queue without too much additional cost proportionally. Tens of thousands of messages are usually a minor inconvenience in terms of what can be stored in a queue, and the granite carver can take them out in order one by one as he finds time to get to them. Some queuing systems will even optionally hang on to the message until the carver reports that he is finished.
Problem solved, no components get very stressed out, the response time is nearly unaffected, and we can scale the queue cluster if we need more capacity. If the queues start to get full then we can start building other systems to drain them for more persistent collections, or rethink carving granite as a backend strategy.
So this is a case for queues where the incoming data exceeds our ability to write or process it, but we want to keep up in terms of servicing requests. We could technically persist all of the requests to a different database up front, and then poll for un-carved data, but that's much less straightforward to think about and polling is something to avoid when possible to keep things efficient. Further that database may not necessarily be designed to handle bursts of writing.
Note that there are serious downsides that come with asynchronously kicking the can like this - you introduce extra network hops, have to deal cluster network partitions (see CAP), and receiving systems have a hard time participating in responding to the original request without you implementing a RPC synchronous interface on top of queues. Here is a RabbitMQ tutorial for an example of how this might be done.
Because of the asynchronous nature of systems with lots of systems downstream from queues, these systems are often called eventually consistent because they are not consistent in a known and brief period of time, but will be consistent soon. In the case of the granite carver, the granite data store is eventually consistent because messages aren't written immediately and the time varies depending how many get queued up. If we needed to immediately read the data after writing it, then introducing heavy queuing may not be the best idea.
Another common usage of queues is for side effects that don't need to happen immediately. For example if a user is changing email settings. You would apply the write to change the settings immediately and reply success, while asynchronously passing a message via queue out to other systems which will apply the side effects of the settings change in their own sweet time.
An added plus in that scenario, is that the queues will often guarantee at least once delivery, which means that if a fatal error happens in the consuming system while processing the side effect then the system will get the message again when it recovers. This can dangerously ensure that your service stays down if that message triggers a bug that takes the system down every time it recovers. We can work around that downside in particular by expiring messages after they hit a certain age or deferring them into an overflow queue for handling on a case by case basis, if it becomes an issue. This is still cleaner than the message being lost to the nether once the error is first encountered if you care about the side effects being applied.
A key phrase just came up, at least once delivery, which while making one promise very clearly says it may deliver messages multiple times. Most queues suffer from this because when they doubt delivery due to bad network conditions they have to redeliver. Some queues manage to avoid it to a degree with cache layers on the clients, but all of them document how they provide at least once delivery if they do provide it. The way to deal with multiple delivery is to build systems which can receive the same message over and over without anything bad happening, such as sending the same email twice. Engineers love to call these systems idempotent.
There's still more to the at least once deliver story. Some systems will deliver messages out of order, which can really bite. Watch for when this happens in the documentation for a queue technology and either build your system to not care when things arrive, or build it to the strength of the queue tech you choose so messages related to each other will always be in order. Systems can also be built to watch which intended order a message has; for example if you see message #2 it would be implied there was a message #1 even if it has not showed up yet.
These down sides may sound tricky, but in practice they are very painful to work around. It is surprising sometimes how many things are fine being eventually consistent.Using queues helps keep your environment healthy and well balanced when used well, and are critical when building microservices.
Let's dive into another scenario involving an HTTP Rest Api that is hammered with requests to create cat recognition models that are expensive for us to compute. We have a choice to queue these creation requests after tagging them with an id, and simply return the id right away. Keep in mind the consumer may turn around and ask to use their cat model before it is created then, and we would need to respond that it is still in progress . That's more REST design than anything, but see how queues help us seem nice and snappy here?
They also help us distribute the labor. Our cat recognition models are premium, but what if our machines can only process one model at a time? We can configure our events to be delivered evenly across the workers and in a one at a time fashion.
If a third logging and telemetry system also wanted to hear about these messages, they could subscribe to the published create-a-cat-model events. Configuring an event to be delivered to multiple types of clients can be done with routing or what is called fanout configuration of some form.
So what are some queueing technologies? There are a great many. A popular and mature one I like is RabbitMQ, but NATS is a newer supposedly faster kid on the street which has a cool graph here which also shows lots of other queue tech. Azure and AWS also provide dedicated queuing technologies which can be very robust.
Queues can make a tremendous difference where it counts and should not be underestimated as a simple solution to what seems like an overwhelming problem. Notheless be careful when brandishing asynchronous technology. Race conditions will eventually show up to eat your lunch, if they do try looking at something like hash based routing to help avoid distributed locks which can be difficult to keep honest.