
Use number of physical not logical cores for auto nthreads? #43692

Open
jtrakk opened this issue Jan 6, 2022 · 19 comments
Labels
multithreading Base.Threads and related functionality speculative Whether the change will be implemented is speculative

Comments

@jtrakk

jtrakk commented Jan 6, 2022

In 1.7, auto uses all logical cores, including hyperthreads. For some workloads, using only physical cores without hyperthreading may be faster. I'm not sure which workloads those are (does anybody have a reference with benchmarks?), but if Julia users would rather not use hyperthreading by default, auto should do that.

If Julia can't determine the number of physical cores without Hwloc.jl, it could use a heuristic like ceil(jl_cpu_threads/2).

@JeffBezanson JeffBezanson added the multithreading Base.Threads and related functionality label Jan 7, 2022
@JeffBezanson
Member

I agree. My understanding is that determining the number of physical cores is surprisingly difficult, which is the main reason we haven't done this yet.

@giordano
Contributor

giordano commented Jan 7, 2022

Related: JuliaLang/LinearAlgebra.jl#671. That issue is about BLAS, but the main problem is the same: determining the number of physical cores without pulling in yet another external dependency.

@oscardssmith
Member

To make matters worse, we now officially need to care about big.LITTLE designs, since Apple's M1 and 12th-gen Intel chips use them.

@vtjnash vtjnash added the speculative Whether the change will be implemented is speculative label Jan 7, 2022
@ViralBShah
Member

It may very well be the case that we should ship hwloc with Julia.

@DilumAluthge
Member

How big is Hwloc_jll?

Also, is it generally the same size on all operating systems and architectures?

@ViralBShah
Member

ViralBShah commented Jan 11, 2022

It's small: 2-4 MB.

https://github.com/JuliaBinaryWrappers/Hwloc_jll.jl/releases/tag/Hwloc-v2.7.0%2B0

@DilumAluthge
Member

That's not too bad.

@DilumAluthge
Member

DilumAluthge commented Jan 11, 2022

Are there other things that we can use hwloc for, besides just counting the number of physical cores? The more uses we can get out of hwloc, the more compelling the argument is for shipping hwloc with Julia.

@vchuravy
Member

Yeah, there are other interesting things one can do with hwloc, but I'm not sure how much they matter for Base.

One word of caution: hwloc is being used by quite a few JLLs, and we should use it as a static lib to avoid pinning the entire ecosystem to one version.

@tkf
Member

tkf commented Jan 14, 2022

besides just counting the number of physical cores

We can also query the cache hierarchy and NUMA nodes. In principle, we can have a more "intelligent" scheduler that uses knowledge like this. I'd guess the allocator/GC can do something interesting too.

@chriselrod
Contributor

We can also query the cache hierarchy

LLVM should provide this, too:
https://github.com/llvm/llvm-project/blob/0af1808f9b99b49b87b8503466110baee42c5aea/llvm/include/llvm/Analysis/TargetTransformInfo.h#L2124-L2130

@vchuravy
Member

If anyone wants to play around with hwloc I setup a minimal deps file to get a static library in https://github.com/JuliaLang/julia/tree/vc/hwloc

@tkf
Member

tkf commented Jan 15, 2022

LLVM should provide this

Oh, this is cool. I didn't know this. But I was thinking of more detailed info, like which cores share which L3 (not all CPUs do this on a per-socket basis). Also, since we've decoupled codegen and the runtime into separate libraries, I don't think we want to use LLVM in the runtime.

@DilumAluthge
Member

If anyone wants to play around with hwloc I setup a minimal deps file to get a static library in https://github.com/JuliaLang/julia/tree/vc/hwloc

Are you building hwloc from source here?

Is there a way that we can use the pre-built Ygg binaries for hwloc, but not interfere with the Hwloc_jll package? Because as you mentioned above, we don't want to force everyone to use the same version of Hwloc_jll (the way that the stdlib JLLs currently do).

Maybe that's a question for @staticfloat

@giordano
Contributor

Is there a way that we can use the pre-built Ygg binaries for hwloc, but not interfere with the Hwloc_jll package?

I'm not sure Hwloc_jll builds a static library (can't check now, away from computer).

@staticfloat
Member

Hwloc_jll does not build a static library right now. But we could, and could then download/extract it and use it just like everything else.

That being said, if there were a way to get equivalent information from LLVM, I'd definitely prefer that, as 1.7 MB (the size of the .so) is still a hefty price to pay for this functionality. On macOS at least, you can get this kind of information via a few sysctls; I'd hope that Linux/Windows don't make it too much worse.

@tkf
Member

tkf commented Jan 15, 2022

Given that there are some efforts toward parsing cgroups in libuv (libuv/libuv#2323), I don't think it's crazy to parse cgroup and proc in libuv to get the core-count information on Linux. I don't know about Windows, though. I also don't know what other kinds of magic Hwloc has.

That being said, if there were a way to get equivalent information from LLVM,

The methods @chriselrod linked query cache size, cache line size, and cache associativity. It doesn't look like there's a way to get the core count. I'd guess core count is less useful to LLVM (can it specialize for the number of CPUs?), although I wonder how much of the OpenMP stuff is in the libLLVM we ship. But I guess it's the runtime's job to count CPUs? cc @vchuravy

I also note that #42340 partially solves this for "sufficiently well-behaved" environments that set up affinity for each job allocation, e.g. HPC clusters and cloud services. The point is that we can't determine the number of CPUs we should use solely from the hardware information. We need to respect how much computing resource is assigned to a julia process by an "outer scheduler" (whatever spawns the julia process). Of course, something like Hwloc is nice to have for making it work automatically on manually managed workstations and laptops.

@gbaraldi
Member

gbaraldi commented Oct 5, 2022

In the multithreading meeting we discussed that adding the Hwloc dep might be useful. The CPU-detection code has been getting more and more gnarly as efficiency cores become more common. The apple-aarch64 code is already quite messy. I don't think LLVM exposes this information in an easy API; it cares more about what kind of core is there than about how many cores there are.

@StefanKarpinski
Member

I do think it makes sense to expose this info programmatically, but I'm not so sure we should default to physical cores. It seems that Julia's threading actually does pretty well with hyperthreading, which OpenMP and BLAS do not. What might make sense is for Julia to default to the number of logical cores and for BLAS to default to the number of physical cores.
