The Scalable Thread

The Scalable Thread

Share this post

The Scalable Thread
The Scalable Thread
Why Distributed Transactions can be slow?
Copy link
Facebook
Email
Notes
More

Why Distributed Transactions can be slow?

Discussing the cause of Latency for Transactions in Distributed Systems

Sid's avatar
Sid
Jun 21, 2024
7

Share this post

The Scalable Thread
The Scalable Thread
Why Distributed Transactions can be slow?
Copy link
Facebook
Email
Notes
More
1
Share

There are certain times when a particular problem requires orchestrating updates to multiple different services/resources/shards together as an atomic unit of work. If any of the updates fail, the entire work unit needs to roll back because if partially executed, the state of the system as a whole can become inconsistent. This is the scenario where Distributed Transactions can prove to be one of the viable solutions. While in a monolithic or a non-sharded single resource, Transactions might be a great choice, they can cause some serious performance impact when executed within a distributed or a sharded system.

Behind the scenes: 2-Phase Commit

Well, it’s difficult to write about every protocol used by every system available on this planet. So, for the sake of simplicity, this text will focus on one of the most popular protocols used for distributed transactions — 2-Phase Commit protocol (or 2PC).

2PC as the name suggests makes a distributed transaction happen in two steps:

Step 1: Prepare

The orchestrator of the system sends a "prepare” message along with instructions in the given transaction to all the participants of the system. Each participant makes a log of these instructions and responds with an acknowledgement if it is ready to execute them.

Step 2: Commit

Once the orchestrator receives the acknowledgement from all the participants, it sends a “commit” message to all of them, requesting each of the participants to make a change in their state as per the instructions in the given transaction. However, if any of the participants don’t send an acknowledgement for the “prepare” phase, all of the remaining participants are told to roll back.

Besides this, the locks also are held on the system resources to ensure serializability.

Fig. 2PC— Successful commit
Fig. 2PC — Rollback (Unsuccessful commit)

Slow? Where?

By now, someone familiar with distributed systems will have a fair idea about the throttling areas that can slow down distributed transactions.

Resource Failure

In 2PC, the success of the transaction depends on whether all participants can acknowledge the prepare phase. If any participant/resource crashes or loses connectivity, the orchestrator waits until timeout which increases the latency and eventually triggers a rollback in which case the transaction needs to be restarted.

Network Latency

As multiple network round trips are required for each phase of 2PC including the retry attempts to establish a connection with unresponsive participants, any increment in network latency can cause a significant delay (or maybe timeout) in the transaction. In addition to this, an increase in the number of participants in the system will also impact the latency as the orchestrator is required to wait for all the participants to acknowledge the prepare phase.

Orchestrator Failure

In case the orchestrator fails before notifying the participants about the commit, participants may keep waiting (until timeout) for the orchestrator to recover. This can block the system from proceeding by halting the release of the locks.


Subscribe to The Scalable Thread

By Sid · Launched a year ago
One well-researched system design concept simplified like you're five, in just 5 minutes, every week! Written by a FAANG software engineer
Sid's avatar
Giang Dang's avatar
Göktürk Gök's avatar
Siddharth Upadhyay's avatar
Dan Andrei Ghinea's avatar
7 Likes∙
1 Restack
7

Share this post

The Scalable Thread
The Scalable Thread
Why Distributed Transactions can be slow?
Copy link
Facebook
Email
Notes
More
1
Share

Discussion about this post

User's avatar
What is Event Sourcing?
Understanding Event Sourcing, How it works, and How to Rebuild State
Feb 14 • 
Sid
52

Share this post

The Scalable Thread
The Scalable Thread
What is Event Sourcing?
Copy link
Facebook
Email
Notes
More
What is Saga Pattern in Distributed Systems?
Understanding How Microservices Coordinate Distributed Transactions
Feb 21 • 
Sid
54

Share this post

The Scalable Thread
The Scalable Thread
What is Saga Pattern in Distributed Systems?
Copy link
Facebook
Email
Notes
More
What is Service Discovery?
Understanding How Services are Discovered in Distributed Systems
Feb 7 • 
Sid
65

Share this post

The Scalable Thread
The Scalable Thread
What is Service Discovery?
Copy link
Facebook
Email
Notes
More

Ready for more?

© 2025 Sidharth
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More

Create your profile

User's avatar

Only paid subscribers can comment on this post

Already a paid subscriber? Sign in

Check your email

For your security, we need to re-authenticate you.

Click the link we sent to , or click here to sign in.