[RIP‐70] Optimizing Lock Mechanisms

Status

Current State: Draft
Authors: HuaiYuan Wang
Shepherds: RongTong Jin
Mailing List discussion: [email protected]
Pull Request: https://github.com/apache/rocketmq/pull/8663
Released: no

Background & Motivation

What do we need to do

Design flexible lock optimization strategies and optimize message sending and receiving logic to improve message sending and processing performance.

Why should we do that

As concurrent systems grow more complex internally, effective lock management becomes key to preserving performance. The locks in RocketMQ's concurrent code leave room for optimization: they are critical for ensuring consistency and preventing race conditions, but their usage could be refined to improve overall message throughput. In practice, we have observed that adjusting the lock strategy affects the message-sending performance of RocketMQ. Merely altering the backoff strategy of SpinLock can result in a performance difference of 20% (or even more) between the best and worst cases.

Goals

What problem is this proposal designed to solve?
1. Design locking strategies that cope with different levels of concurrent pressure.
2. Optimize the message sending and receiving logic.
3. Design an adaptive locking mechanism.
To what degree should we solve the problem?
1. Under different lock contention conditions, the adaptive lock selects the appropriate locking mechanism according to the critical section size and the degree of contention.
2. Design the structure for converting between locking mechanisms so that switching does not cause deadlock or lock failure.
3. Optimize the message sending and receiving logic.
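
The selection rule in item 1 can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the class name `LockPolicy`, the three lock kinds, the meaning of `contentionRatio` (the fraction of acquisition attempts that failed on the first try), and both thresholds. The proposal does not specify any of these values.

```java
public class LockPolicy {
    /** Hypothetical lock kinds; the proposal does not enumerate them. */
    public enum Kind { SPIN, SPIN_WITH_BACKOFF, BLOCKING }

    // Assumed thresholds, chosen only to make the example concrete.
    private static final long LONG_CRITICAL_SECTION_NANOS = 10_000L;
    private static final double HIGH_CONTENTION_RATIO = 0.5;

    /**
     * Picks a lock kind from the two measurements named in the proposal:
     * the average critical section size and the degree of contention.
     */
    public static Kind choose(long avgCriticalNanos, double contentionRatio) {
        if (avgCriticalNanos > LONG_CRITICAL_SECTION_NANOS) {
            // Long critical sections: spinning wastes CPU, so block instead.
            return Kind.BLOCKING;
        }
        if (contentionRatio > HIGH_CONTENTION_RATIO) {
            // Short critical sections under heavy contention: spin, but back off.
            return Kind.SPIN_WITH_BACKOFF;
        }
        // Short critical sections with little contention: plain spinning is cheap.
        return Kind.SPIN;
    }
}
```

A real policy would probably tune these thresholds per hardware platform, which is consistent with the hardware sensitivity noted under Non-Goals.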

Non-Goals

Are there any limits of this proposal?
1. The calculations of the adaptive mechanism may be affected by hardware fluctuations.
2. In special scenarios the optimization results may not be ideal, so supporting tools need to be provided for updating the related configuration.

Changes

Architecture

No single locking mechanism can cover every contention scenario, even if it adapts dynamically. The adaptive lock therefore integrates several locking mechanisms: when the dynamic adjustment of the current mechanism reaches its limit and still cannot adapt to a scenario, the adaptive lock switches to a more suitable mechanism.

As can be seen from the figure:
1. Each thread, on entering the critical section, calculates the average critical section size and the degree of lock contention.
2. When the current locking mechanism is judged to be unsuitable, a command to change the locking mechanism is issued and status is set to 0, which prevents threads from acquiring the lock.
3. The lock state is synchronized and the calculation is reset; status is then set to 1 to restore the normal state.
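
The status-gated switch described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the implementation in the pull request: the class name `SwitchableLock`, the `inFlight` counter, and the use of `Thread.yield()` while waiting are all hypothetical. The measurement of critical section size and contention is left out; only the switch itself is shown.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;

public class SwitchableLock {
    private static final int SWITCHING = 0; // status 0: threads may not acquire the lock
    private static final int NORMAL = 1;    // status 1: normal operation

    private final AtomicInteger status = new AtomicInteger(NORMAL);
    // Number of threads currently waiting for or holding the delegate (assumed helper).
    private final AtomicInteger inFlight = new AtomicInteger(0);
    private volatile Lock delegate;

    public SwitchableLock(Lock initial) {
        this.delegate = initial;
    }

    public void lock() {
        while (true) {
            // Stay outside while a switch is in progress.
            while (status.get() == SWITCHING) {
                Thread.yield();
            }
            inFlight.incrementAndGet();
            // Re-check after registering, so a concurrent switch cannot miss this thread.
            if (status.get() == NORMAL) {
                delegate.lock();
                return;
            }
            inFlight.decrementAndGet();
        }
    }

    /** Must be called by the thread that holds the lock. */
    public void unlock() {
        // The delegate cannot change while inFlight > 0, so this is the lock we acquired.
        delegate.unlock();
        inFlight.decrementAndGet();
    }

    /**
     * Replaces the underlying lock. Must not be called by a thread that
     * currently holds this lock, or it would wait for itself forever.
     */
    public boolean switchTo(Lock next) {
        if (!status.compareAndSet(NORMAL, SWITCHING)) {
            return false; // another switch is already running
        }
        // Drain: wait until no thread is waiting for or holding the old lock.
        while (inFlight.get() != 0) {
            Thread.yield();
        }
        delegate = next;
        status.set(NORMAL);
        return true;
    }
}
```

Because the old lock is fully drained before the delegate is replaced, no thread can ever hold the old and the new lock at the same time, which is one way to avoid the lock failure that the Goals section warns about. The cost is a short pause for all threads during each switch.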

Preliminary Verification

So far, the spin lock with a backoff strategy based on the optimal spin count K has been preliminarily verified.
Single-machine, four-process stress test (message body size 2B):

CPU Arch	Flush Policy	Original QPS	k	Optimal QPS	Improvement
X86	ASYNC	176312.35	10^3	184214.98	+4.47%
X86	SYNC	177403.12	10^3	187215.47	+5.56%
ARM	ASYNC	185321.49	10^3	206431.82	+11.44%
ARM	SYNC	188312.17	10^3	212314.43	+12.85%

The table above shows the change in QPS. In addition, RocketMQ currently uses different locking mechanisms without any selection guidance, and the performance gap between them is not negligible, so we will also unify the locking mechanism as the adaptive lock.

Implementation Outline

GSoC 2024 "Optimize Lock Mechanisms" final report: https://shimo.im/docs/e1AzdMJYBjS6xmqW/
We will implement the proposed changes in 3 phases.

Phase 1

Optimize the locking logic for message delivery to commitLog
Introduce an initial version of the spin locking mechanism that backs off after the optimal spin count K
Optimize the client back-pressure mechanism
https://shimo.im/docs/0l3NMnen8GFGDaAR/
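
The idea of spinning up to K times and then backing off can be sketched in Java as below. This is an illustration only: the class name `BackoffSpinLock` is hypothetical, and using `Thread.yield()` as the backoff action is an assumption, since the proposal does not say what a thread does after K failed attempts.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BackoffSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final int spinLimit; // K: failed attempts allowed before backing off

    public BackoffSpinLock(int spinLimit) {
        if (spinLimit <= 0) {
            throw new IllegalArgumentException("spinLimit must be positive");
        }
        this.spinLimit = spinLimit;
    }

    public void lock() {
        int spins = 0;
        // Try to flip the flag from free (false) to held (true).
        while (!locked.compareAndSet(false, true)) {
            if (++spins >= spinLimit) {
                // After K failed attempts, give up the CPU once (assumed
                // backoff action), then start a fresh round of spinning.
                Thread.yield();
                spins = 0;
            }
        }
    }

    public void unlock() {
        locked.set(false);
    }
}
```

With the k value from the verification table, the lock would be created as `new BackoffSpinLock(1000)`. A small K yields the CPU sooner under contention, while a large K behaves more like a pure spin lock, which is why the best K depends on the workload and hardware.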

Phase 2

Implement an initial version of the adaptive lock and optimize the backoff strategy
Introduce other locking mechanisms
Optimize the message receiving logic
https://shimo.im/docs/XKq42a0gdJsJ4ANG/

Phase 3

Refine the adaptive lock
Provide supporting tools for adjusting the locking mechanism
Design the necessary tests

Rejected Alternatives

How do the alternatives solve the issue you proposed?
- None
Pros and Cons of alternatives?
- None
Why should we reject the above alternatives?
- None
