Hi!

I've been testing the recently added feature from #155.

Overall, the performance improvement from that feature is great. However, there seems to be an issue with scaling. We're using the default CPU EP, and we have more than 20 models (sessions) that are shared (`Arc`'d) between all threads and on which we call `run` concurrently from all worker threads (note: each thread does not run an inference request on every model, but chooses a specific one depending on certain conditions).
As the number of threads increases, I see an increase in system (kernel) CPU load. At 88 threads, our system CPU load increased from <5% to 12-15%. `strace` showed that ~90% of kernel time is spent in `futex` syscalls. Take a look at what `perf` shows:

[perf flamegraph screenshot]
I'm assuming that if we had a single shared model, then the contention would be even higher.
There are essentially no other `futex` syscalls in the whole flamegraph (unfortunately, I cannot share the raw `.svg`, sorry about that).
Then I stumbled upon the following documentation (the "Share allocator(s) between sessions" part): https://onnxruntime.ai/docs/get-started/with-c.html#features
I hypothesized that if there's a global session object and many threads are calling `run` on it, then `run` could be getting stuck on some kind of arena mutex. So I tried changing the application to have sessions per worker thread instead of shared ones. If sessions have their own local arenas, I expected to see increased memory usage but reduced contention.
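Roughly, the two setups I compared look like this. This is a minimal sketch assuming the ort 1.x `Environment`/`SessionBuilder` API; the `build_*` helpers and `model_paths` are placeholders of mine, not our actual code:

```rust
use std::sync::Arc;
use ort::{Environment, OrtResult, Session, SessionBuilder};

// Shared setup: one session per model, `Arc`'d across all worker threads.
fn build_shared(env: &Arc<Environment>, model_paths: &[&str]) -> OrtResult<Vec<Arc<Session>>> {
    model_paths
        .iter()
        .map(|&path| Ok(Arc::new(SessionBuilder::new(env)?.with_model_from_file(path)?)))
        .collect()
}

// Per-thread setup: called once in every worker, so each thread owns its own
// sessions, in the hope that each session gets its own arena and therefore
// the threads contend less.
fn build_per_thread(env: &Arc<Environment>, model_paths: &[&str]) -> OrtResult<Vec<Session>> {
    model_paths
        .iter()
        .map(|&path| SessionBuilder::new(env)?.with_model_from_file(path))
        .collect()
}
```

In both cases, each worker picks one of its sessions based on the routing condition and calls `run` on it; only the ownership of the sessions differs.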
Unfortunately, pretty much nothing changed, and the before/after flamegraphs look more or less identical.
So, I'm not familiar with ONNX Runtime's internals, but could it be that the arena allocator is shared between all sessions by default? Do you think it makes sense to make that configurable? Is it an arena mutex at all, or is my assumption simply wrong? I'm assuming it is an arena mutex because these syscalls show up in `Value::from_array`, `drop` calls, etc.
Also, somewhat related, take a look at the zoomed-in `Session::run`:

[zoomed-in `Session::run` flamegraph screenshot]

There are two `Drop::drop` calls; zooming in on them:

[zoomed-in `Drop::drop` flamegraph screenshot]
Again, I'm not familiar with ONNX Runtime's internals, but arenas have to reset their chunk pointer at some point, and when new values are written, the old memory simply gets overwritten. As such, it makes sense (at least in the other cases where I've used arenas) to avoid calling `Drop` at all. With that in mind, does it make sense to skip calling `ReleaseMemoryInfo`/`ReleaseValue` entirely when the allocator is an arena? That could be a nice optimization. The relevant call sites:

- ort/src/memory.rs, line 146 (at d1ae982)
- ort/src/value.rs, line 703 (at d1ae982)
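To illustrate the reasoning, here is a toy bump arena in plain Rust (explicitly not ONNX Runtime's actual allocator): freeing an individual allocation is a no-op, and all memory is reclaimed at once by resetting a cursor, so a per-value release call buys nothing:

```rust
/// A toy bump arena: `alloc` only moves a cursor forward and keeps no
/// per-allocation bookkeeping, so an individual "free" has nothing to do.
struct BumpArena {
    buf: Vec<u8>,
    cursor: usize,
}

impl BumpArena {
    fn with_capacity(capacity: usize) -> Self {
        Self { buf: vec![0; capacity], cursor: 0 }
    }

    /// Hand out the next `len` bytes, or `None` if the arena is exhausted.
    fn alloc(&mut self, len: usize) -> Option<&mut [u8]> {
        let start = self.cursor;
        let end = start.checked_add(len).filter(|&end| end <= self.buf.len())?;
        self.cursor = end;
        Some(&mut self.buf[start..end])
    }

    /// Reclaim everything at once; the old contents are simply overwritten
    /// by later allocations.
    fn reset(&mut self) {
        self.cursor = 0;
    }
}
```

If the runtime's arena behaves like this, a `ReleaseValue` per output would mostly pay for synchronization without actually returning memory.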
Those `futex` calls are probably from ort, as each call to an ONNX Runtime API would (needlessly) lock a `Mutex`. I removed the mutex in #160; does ort @ 04df44d help with the contention at all?
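For context on why one lock turns into `futex` load, here is a minimal, self-contained illustration of the pattern (not ort's actual code) in which every API call funnels through a single global `Mutex`, so with many threads nearly every call contends and parks in the kernel:

```rust
use std::sync::Mutex;
use std::thread;

// One global lock taken on every "API call". This mimics the pattern
// removed in #160; it is an illustration, not ort's real code.
static API_LOCK: Mutex<()> = Mutex::new(());

fn api_call() {
    // An uncontended lock stays in userspace; a contended one falls back to
    // a futex syscall, which is exactly what shows up in strace/perf.
    let _guard = API_LOCK.lock().unwrap();
    std::hint::black_box(()); // stand-in for the real FFI work
}

fn main() {
    let workers: Vec<_> = (0..88)
        .map(|_| thread::spawn(|| (0..1_000_000).for_each(|_| api_call())))
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
}
```

Profiling this under `strace -c` should show the same futex-dominated kernel time; dropping the per-call lock should make it largely disappear.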