Most of the time when I encounter fascinating situations in the real world, I am struck by how well that situation might translate into a novel. For example, I constantly see situations and people that inspire character traits or plot points that someday might make their way into one of my books. However, every once in a while, I see a situation in real life that tickles my technical fancy.

There is a “deli” near my office in Manhattan (the quotes are because in New York, “deli” doesn’t mean what most people think it means: a NY deli is often a giant cafeteria-style buffet that serves 20 kinds of food and sells everything from chips to bobble-heads to gummy bears) that is a marvel of modern retail efficiency. On its busiest day, you can make your way through the throng of people, order what you want, and get out in less than 20 minutes. The really remarkable thing is the checkout line: it is unbelievably fast, and I can be six people back and still get through the line and out the front door in less than three minutes.

What makes this an interesting technical blog post is that this place employs a number of techniques that people like me have been using as large-scale application development design patterns for years. My coworkers and I often refer to the lines of people feeding up toward the cash registers as non-blocking, asynchronous processing.

You really need to see these people in action to believe it. One person handles nothing but the credit card swiping, another handles the cash register, and yet another takes your food and puts it in a bag for you, a bag that comes pre-loaded with a fork, a knife, and napkins. From the customer’s perspective, there is absolutely no wasted time. All of the things that a customer might be blocked on in a traditional purchase queue have been parallelized.

There isn’t a per-customer wait for the bag to get the common essentials stuffed into it. While the card-swiper is swiping your credit card, the cash register person is ringing up the order of the person behind you in the queue, and the bag-stuffer is working on the bag of the person behind them.
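The deli’s checkout can be sketched as a three-stage pipeline, with each stage running as its own worker and handing customers downstream through a queue. This is a minimal illustration, not a real checkout system; the stage names and customer data are made up:

```python
import queue
import threading

finished = []  # customers who have made it all the way through

def stage(name, inbox, outbox):
    """One checkout worker: pull a customer, stamp it, pass it on."""
    while True:
        customer = inbox.get()
        if customer is None:          # sentinel: close down and tell the next stage
            if outbox is not None:
                outbox.put(None)
            return
        customer.append(name)         # "process" this customer
        if outbox is not None:
            outbox.put(customer)
        else:
            finished.append(customer)

# One queue feeding each stage: ring-up -> swipe -> bag.
ring_up, swipe, bag = queue.Queue(), queue.Queue(), queue.Queue()

workers = [
    threading.Thread(target=stage, args=("rung up", ring_up, swipe)),
    threading.Thread(target=stage, args=("swiped", swipe, bag)),
    threading.Thread(target=stage, args=("bagged", bag, None)),
]
for w in workers:
    w.start()

for i in range(5):                    # five customers join the line
    ring_up.put([f"customer {i}"])
ring_up.put(None)                     # close the line

for w in workers:
    w.join()

print(finished[0])
# ['customer 0', 'rung up', 'swiped', 'bagged']
```

Each customer still passes through every stage in order (latency is unchanged), but the three workers are busy with three different customers at once.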

This is all well and good and getting your food fast is always a good thing, but what does it have to do with building scalable systems? Everything.

Imagine that a web request is a customer waiting in line for food. If, upon this request’s arrival at the web server, the web server has to do everything in serialized order, for every single request, then the time it takes to handle a single request may not appear to be all that bad, but the more requests you have, the longer the lines get. The bigger the backlog building up in the queue, the worse the perceived performance of your site will be, even if the time it takes to handle a single request is fixed and relatively short. Why is that? Because each person’s request isn’t handled immediately upon arrival at the site; requests pile up in a queue, waiting for other requests to be processed.
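You can see the queueing effect with a toy model of a fully serialized server. The numbers here are illustrative, not measured:

```python
# Toy model: every request takes the same fixed time to handle, but a
# request's total wait is that time multiplied by its queue position.
SERVICE_TIME = 0.2   # seconds per request, an assumed constant

def wait_for(position):
    """Total seconds until the request at `position` (1-based) finishes."""
    return position * SERVICE_TIME

print(wait_for(1))    # 0.2 -- first in line, feels snappy
print(wait_for(50))   # 10.0 -- 50th in line, feels broken,
                      # even though each request is cheap
```

Per-request cost is constant, yet perceived latency grows linearly with queue depth.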

Typical scaling patterns just increase the number of concurrent threads to deal with incoming requests. That’s fine, and it works for a little while, until you run out of threads. Let’s say you have 30 threads. The first 30 people to hit your site have a decent experience, and then the 31st and everyone thereafter experiences the same delays as before. People see this, and the initial knee-jerk reaction is to add more servers. So now let’s say you’ve got four servers, each with 30 threads (I’m simplifying to make the numbers easier to picture). The first 120 people to hit your site now have a decent experience, and thereafter, additional customers are subject to delays and waits.
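The “first N are fine, the rest wait” effect is easy to reproduce with a thread pool. Here a pool of 3 workers stands in for the 30 threads above, and each simulated request just sleeps; the timings are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle(i):
    time.sleep(0.1)          # pretend to do the work for request i
    return i

start = time.monotonic()
with ThreadPoolExecutor(max_workers=3) as pool:
    # Six requests, but only three workers: requests 3-5 cannot
    # start until a worker frees up.
    results = list(pool.map(handle, range(6)))
elapsed = time.monotonic() - start

print(results)     # [0, 1, 2, 3, 4, 5]
print(elapsed)     # roughly 0.2s: two "waves" of three requests each
```

Doubling the requests without adding workers doubles the waves, which is exactly the wall the post describes.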

This is a classic fix-the-symptom-not-the-problem approach. If we apply this to retail, then I’m sure you know what the knee-jerk retail reaction to the high-load and peak-volume problems is: add more cashiers. So now instead of four queues handling people’s orders, you have eight. Great, but you run into the same problems as above, plus an even worse one: your store runs out of room to accommodate the people waiting in the queues. In some circles, we also refer to this as the latency-versus-throughput problem, though it’s actually more involved than that. If you optimize for latency, you train your cashiers to be unbelievably fast at processing a single order. This appears to be good because it (ideally) means your queue drains faster, with customers moving through the line faster. To increase your throughput, you just add more highly trained, super-fast cashiers. Simple, right? NOPE.

What has this one deli done that developers building systems for scale have failed to do? It’s remarkably simple, when you think about it. Rather than taking a monolithic process (completing the entire checkout from start to finish) and scaling that one process out, they have deconstructed all of the tasks that need to be done to get a customer through the line and have applied enterprise software patterns to those tasks. First, they do some tasks before the customers even arrive, such as pre-filling the bags with forks, knives, and napkins. This removes several seconds from the per-customer pipeline, which, multiplied by the number of customers in this place (it’s huge, trust me) and the number of registers, makes a significant impact.
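The pre-filled bag is just precomputation: work that is identical for every request is done once, ahead of time, instead of once per request. A minimal sketch, with made-up bag contents standing in for any per-request setup you can hoist out of the hot path:

```python
COMMON_ITEMS = ("fork", "knife", "napkins")

# Done before any customer arrives: a stack of ready-made bags.
prefilled_bags = [list(COMMON_ITEMS) for _ in range(100)]

def checkout(order):
    """Per-customer work: grab a ready bag and add only the order."""
    bag = prefilled_bags.pop()
    bag.append(order)
    return bag

print(checkout("pastrami on rye"))
# ['fork', 'knife', 'napkins', 'pastrami on rye']
```

The per-customer path now does one append instead of four, and the savings compound with every register and every customer.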

The next thing they’ve done is identify tasks that can be done in parallel. The same cashier doesn’t need to ring up your order and swipe your credit card; these can be done by two different people. This is where throughput versus latency becomes really important. The per-customer time to finish remains the same, because the customer can’t leave until both tasks are complete (ring-up and swipe); however, throughput rises dramatically, because while the first customer is having their card swiped, the next customer is having their order rung up, which allows the pipeline to absorb more customers.
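A back-of-the-envelope calculation makes the latency/throughput split concrete. Assume, purely for illustration, that ring-up and swipe each take 10 seconds:

```python
RING_UP = 10   # seconds, an assumed illustrative figure
SWIPE = 10

def serial_total(n_customers):
    """One person does both tasks for each customer, back to back."""
    return n_customers * (RING_UP + SWIPE)

def pipelined_total(n_customers):
    """Two people, one per task: the first customer pays full latency,
    then one customer finishes per slowest-stage interval."""
    return (RING_UP + SWIPE) + (n_customers - 1) * max(RING_UP, SWIPE)

print(serial_total(10))     # 200 seconds
print(pipelined_total(10))  # 110 seconds: each customer still takes
                            # 20s end to end, but one finishes every 10s
```

Per-customer latency is unchanged at 20 seconds, yet the line drains nearly twice as fast.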

So what’s the moral of this long-winded story? Simple: Model your request processing pipelines like a super-efficient Manhattan deli 🙂