How could I get a reasonable upper bound on the number of threads that can effectively run concurrently for a given device? #1033

fcharras · 2023-01-09T09:50:51Z

The closest thing we can get in dpctl seems to be max_compute_units, but that does not seem to do the job: online documentation suggests that the number of threads per compute unit depends on the hardware and is not necessarily related to the sub-group size.

While an exact value seems impossible to obtain for reasons related to the hardware architecture, is an upper bound at least achievable?

I'm looking for such an upper bound to get a closer estimate of the amount of global memory cache required by some kernels that rely on caching to maximize the cache hit rate during execution.
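As a rough illustration of why the thread bound matters here, the cache footprint can be estimated as the number of concurrently running threads times the bytes each thread keeps in cache. This is a minimal sketch under an assumption not stated in the thread: that every running thread keeps a fixed-size working set (`bytes_per_thread`) cached at the same time. The helper name and the numbers are hypothetical.

```python
def estimate_cache_bytes(max_concurrent_threads: int, bytes_per_thread: int) -> int:
    """Rough upper-bound estimate of a kernel's cache footprint.

    Hypothetical model: every concurrently running thread keeps
    `bytes_per_thread` bytes of global memory in cache simultaneously.
    """
    if max_concurrent_threads < 0 or bytes_per_thread < 0:
        raise ValueError("inputs must be non-negative")
    return max_concurrent_threads * bytes_per_thread


# Hypothetical example: 4096 threads, each caching a 256-element float32 tile.
footprint = estimate_cache_bytes(4096, 256 * 4)
print(f"{footprint / 2**20:.1f} MiB")  # 4.0 MiB
```

The tighter the bound on concurrent threads, the tighter this estimate becomes, which is the motivation for the question.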


oleksandr-pavlyk · 2023-01-22T20:23:57Z

Please take a look at the Xe architecture overview in Intel's GPU optimization guide.

A reasonable bound can be (threads per compute unit) * (maximal work-group size). The latter is accessible in dpctl via dpctl.SyclDevice.max_work_group_size. You can bound the former from above by the value 8 (see the architecture summary table in the Xe architecture reference).
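To inspect the device attributes discussed in this thread on a concrete machine, they can be read through dpctl. This is a minimal sketch, assuming dpctl and a SYCL device are available. The helper name `query_device_limits` is hypothetical, and the code returns None when either assumption fails so it still runs without a GPU.

```python
try:
    import dpctl
except ImportError:  # dpctl (and a SYCL runtime) may not be installed
    dpctl = None


def query_device_limits(filter_string=None):
    """Return work-group and compute-unit limits of a SYCL device via dpctl.

    Returns None if dpctl is missing or no matching device can be created.
    """
    if dpctl is None:
        return None
    try:
        if filter_string is None:
            device = dpctl.SyclDevice()  # default device selector
        else:
            device = dpctl.SyclDevice(filter_string)  # e.g. "gpu"
    except Exception:  # e.g. no device matches the filter string
        return None
    return {
        "name": device.name,
        "max_work_group_size": device.max_work_group_size,
        "max_compute_units": device.max_compute_units,
    }


limits = query_device_limits()
print(limits if limits is not None else "dpctl or a SYCL device is unavailable")
```

Neither attribute reports the number of threads per compute unit; that value still has to come from the architecture documentation.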

fcharras · 2023-01-23T10:29:43Z

Wouldn't it be (threads per compute unit) * (number of compute units), where the former is indeed 8, as the optimization guide shows, and the latter is the number of compute units, dpctl.SyclDevice.max_compute_units? With compute units meaning XVEs on the Xe architecture? That seems to fit the thread counts given in the summary table better.
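The bound proposed above is plain arithmetic once both factors are known. A minimal sketch, assuming the value 8 threads per compute unit that the thread reads from the Xe summary table (it is hard-coded because SYCL and dpctl do not expose it); the constant and function names are hypothetical, and the compute-unit count in the example is made up.

```python
# Hypothetical constant: threads per compute unit (XVE) as read from the
# Xe architecture summary table; it is not queryable through SYCL or dpctl.
XE_THREADS_PER_COMPUTE_UNIT = 8


def max_concurrent_threads(max_compute_units: int,
                           threads_per_compute_unit: int = XE_THREADS_PER_COMPUTE_UNIT) -> int:
    """Upper bound on threads that can run at the same time on the device.

    Computes (threads per compute unit) * (number of compute units).
    """
    if max_compute_units < 0 or threads_per_compute_unit < 0:
        raise ValueError("inputs must be non-negative")
    return threads_per_compute_unit * max_compute_units


# Hypothetical device reporting 512 compute units.
print(max_concurrent_threads(512))  # 4096
```

On a real machine, the `max_compute_units` argument would come from dpctl.SyclDevice().max_compute_units; the per-compute-unit factor must be checked against the documentation for other architectures.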

Are there plans to expose either max_threads_per_compute_units or max_thread_counts in future SYCL specs and/or in dpctl?

oleksandr-pavlyk · 2023-01-23T13:04:40Z

> the latter is number of compute units is dpctl.SyclDevice.max_compute_units

I'm sorry, you are definitely right. I was all wrapped up in the notion that kernels are launched by work-groups, but that does not mean several work-groups cannot run concurrently on different cores.

diptorupd added the user label Mar 1, 2024
