Discussion about this post

Tim Marcus Moore:

This is a great overview. Thanks for sharing your wisdom in such an easily-understood article.

I'd add one major category to this list: sequencing. The article talks about “services” placing items on a queue and being read by other “services”, but services generally consist of a set of nodes acting somewhat independently. If service B is consuming messages M1, M2, and M3 from a queue, and service B consists of three server nodes B1, B2, and B3, it’s possible that all three messages will be processed concurrently, with no guarantee that the effects of processing M1 will occur before M3 is processed on another node.

You can solve this by having only a single consumer, but that hits scaling limits. The usual solution is to partition the queue into sub-queues, each with its own ordering guarantee, and ensure that each partition is only delivered to a single consumer.
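The partitioning scheme described above can be sketched in a few lines. This is an illustrative sketch, not any particular broker's implementation: messages are routed by a stable hash of a key (e.g. an order ID), so every message for the same entity lands on the same partition, and a single consumer per partition sees them in order. The names and the partition count are assumptions for the example.

```python
import hashlib

NUM_PARTITIONS = 8  # chosen up front; as noted below, hard to change later

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of the message key -> partition index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Every message keyed by "order-42" maps to the same partition, so its
# single consumer processes M1, M2, M3 in the order they were enqueued.
assert partition_for("order-42") == partition_for("order-42")
```

Note that the key choice sets the granularity of the ordering guarantee: ordering holds *within* a key, and messages for different keys may still interleave freely across partitions.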

This is a good solution, but it comes with its own challenges. For one thing, how many partitions do you use? There's a balance: too few limits your scaling; too many wastes resources on overhead. It can also be hard to change later: you can't really change the number of partitions while the system is running and still maintain your ordering guarantees, but most systems at the scale where this matters can't stop running. The best solution I've found is to create a new set of topics and consumers, and then transition the system from the old ones to the new ones. This is tricky in practice, quite costly to get right, and even more costly to get wrong. I have to admit, though: it's been a few years since I've looked at this problem. Maybe newer message broker systems have better solutions.

This also relates to the issue of message loss and the dead-letter queue. If all of the messages are independent, a dead-letter queue can be a good approach, but what if processing M3 assumes that you've already processed M1 and M2? For example, say M1 creates an order, M2 updates it, and M3 dispatches it. In the best case, processing M3 will also fail and land in the dead-letter queue (e.g. M1 was never processed, so there's no order to update or dispatch). But if you aren't careful, it might quietly succeed and do the wrong thing (e.g. M1 and M3 are processed, but not M2).

Then, there’s also the problem of ensuring that the messages are enqueued in the right order, which might also not be straightforward, depending on the design of the producing system.

I like event-driven architectures, and think they can have a lot of benefits that outweigh the cost of these challenges (in the right use cases). I just think it's important that people go into these system design decisions with full awareness of the tradeoffs they're signing up to.

Ian Pak:

In the idempotency case, how can we make sure the insert was actually done properly? Even if there's a flag to mark the event as processed, how does that guarantee the processing itself completed correctly?

