250bpm

You cannot have at-least-once broadcast

I've written down this argument several times before but I think repeating it over and over until people internalise it is worth it.

On this particular ocassion I was inspired by Tyler Treat's "You cannot have exactly-once delivery" which isn't very formal but boils down to "you can't have exactly-once delivery and side effects at the same time". It's something that would be really nice to prove formally at some point.

Anyway, my "proof" tackles a bit different problem: Ability to do at-least-once broadcast, whether with side effects or without them. It goes like this.

Imagine the simplest possible case: Alice is sending messages to Bob and Carol. As long as there is no partition both Bob and Carol receive exactly the same stream of messages.

So far so good, but what happens if Carol loses connectivity?

One possibility is that Alice stops sending messages altogether until the connectivity is reestablished. In such case Bob is not getting any messages because Carol (who he may not even know — she's a friend of Alice, but wasn't introduced to Bob) is offline.

It's not clear whether that's a good thing or a bad thing, however, if you replace Bob and Carol by 100,000 customers the issue becomes obvious: It's not acceptable to stop serving 99,999 customers just because one of them is offline. Actually, if you have 100,000 customers at least one is guaranteed to be disconnected at any given time. Which means no messages are delivered, ever.

Ruling that option out, there's an alternative solution: Send the message to Bob even if Carol is offline and store the message to Carol for later delivery.

The part that most people don't take into consideration is that storage is limited. Hard disks may be cheap nowadays but if Carol is offline indefinitely you will eventually run out of disk space. And with 100,000 customers you will run out of disk space much faster.

The only thing you can do at the point where you run out of the storage space — if you don't want to lose messages — is to block sending to *all* recepients. Which means that we are back to the previous scenario where Bob is not getting messages because Carol is offline.

You can try to solve the problem by assuming that the storage is unlimited. However, if you really want to be honest with yourself, what it really means is: "We have a 24/7 team of ops people who start paniciking when we are getting close to the disk space limit. Eventually, they will send an intern to the store next door to buy an additional hard disk."

And when you put it that way, it's clear that the argument is not valid. If it was you could have a CAP database: Just have some people to watch it and fix the problems as they happen.

As a conclusion, you cannot have these three things at once: Broadcast and at-least-once delivery and partitions.

And now it's time to play the trick that CAP theorem once did: Let's take the problem understood to be unsolvable by experts but generally expected to be solved by the software infrastrucure. Let's devise a explicit theorem (BAP theorem in our case) that proves that problem can't be solved. Finally, let's ridicule everyone that assumes the opposite until the theorem becomes general knowledge.

September 6th, 2015