Refactor selecting default tuning for select_if #3124

Merged


@bernhardmgruber bernhardmgruber commented Dec 11, 2024

It seemed that the default tunings just tried to replicate what's in DefaultPolicy, so instead of duplicating this information, let's avoid taking values from a tuning and use the default policy directly.

  • No changes in SASS in cub.test.device_select_if.lid_0
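
A minimal sketch of the idea, not the actual CUB code: every name here (`default_policy`, `tuning`, `tuned_4byte_policy`, `select_policy`) and every numeric value is hypothetical and chosen only for illustration. The generic tuning holds no copy of the default values; it only signals that no tuning exists, and the selection then falls back to the default policy itself.

```cpp
#include <type_traits>

// Hypothetical stand-in for the algorithm's default policy (not CUB's real type).
struct default_policy
{
  static constexpr int threads_per_block = 128;
  static constexpr int items_per_thread  = 10;
};

// Primary template: no tuning exists for T, signalled by `type = void`.
// It deliberately carries no copy of the default values.
template <class T, class = void>
struct tuning
{
  using type = void;
};

// Hypothetical tuned entry for 4-byte types (values are made up).
struct tuned_4byte_policy
{
  static constexpr int threads_per_block = 256;
  static constexpr int items_per_thread  = 14;
};

template <class T>
struct tuning<T, std::enable_if_t<sizeof(T) == 4>>
{
  using type = tuned_4byte_policy;
};

// Use the tuning when one exists, otherwise the default policy directly.
template <class T>
using select_policy =
  std::conditional_t<std::is_void<typename tuning<T>::type>::value,
                     default_policy,
                     typename tuning<T>::type>;

static_assert(select_policy<int>::threads_per_block == 256, "tuned type uses its tuning");
static_assert(select_policy<char>::items_per_thread == 10, "untuned type uses the default policy");
```

With this shape the default values live in one place, so changing `default_policy` cannot drift out of sync with a duplicated "default tuning".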

@bernhardmgruber bernhardmgruber changed the title from "Default select if tuning" to "Refactor selecting default tuning for select_if" Dec 11, 2024
@bernhardmgruber bernhardmgruber force-pushed the default_select_if_tuning branch from eb11547 to c113ddd December 11, 2024 14:51
@bernhardmgruber bernhardmgruber marked this pull request as ready for review December 11, 2024 14:51
@bernhardmgruber bernhardmgruber requested review from a team as code owners December 11, 2024 14:51
@NVIDIA NVIDIA deleted a comment from copy-pr-bot bot Dec 11, 2024
🟩 CI finished in 1h 25m: Pass: 100%/94 | Total: 2d 13h | Avg: 39m 16s | Max: 1h 06m | Hits: 77%/12324
  • 🟩 thrust: Pass: 100%/46 | Total: 23h 17m | Avg: 30m 23s | Max: 59m 49s | Hits: 81%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total:  1h 04m | Avg: 32m 08s | Max: 36m 29s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total: 22h 18m | Avg: 30m 25s | Max: 59m 49s | Hits:  81%/9260  
      🟩 arm64              Pass: 100%/2   | Total: 59m 03s | Avg: 29m 31s | Max: 30m 19s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  3h 19m | Avg: 28m 34s | Max: 51m 44s | Hits:  77%/1852  
      🟩 12.5               Pass: 100%/2   | Total:  1h 41m | Avg: 50m 31s | Max: 52m 40s
      🟩 12.6               Pass: 100%/37  | Total: 18h 16m | Avg: 29m 38s | Max: 59m 49s | Hits:  82%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 53m 02s | Avg: 26m 31s | Max: 26m 39s
      🟩 nvcc11.1           Pass: 100%/7   | Total:  3h 19m | Avg: 28m 34s | Max: 51m 44s | Hits:  77%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 41m | Avg: 50m 31s | Max: 52m 40s
      🟩 nvcc12.6           Pass: 100%/35  | Total: 17h 23m | Avg: 29m 49s | Max: 59m 49s | Hits:  82%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 02s | Avg: 26m 31s | Max: 26m 39s
      🟩 nvcc               Pass: 100%/44  | Total: 22h 24m | Avg: 30m 33s | Max: 59m 49s | Hits:  81%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  1h 44m | Avg: 26m 04s | Max: 31m 31s
      🟩 Clang10            Pass: 100%/1   | Total: 33m 22s | Avg: 33m 22s | Max: 33m 22s
      🟩 Clang11            Pass: 100%/1   | Total: 27m 25s | Avg: 27m 25s | Max: 27m 25s
      🟩 Clang12            Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
      🟩 Clang13            Pass: 100%/1   | Total: 31m 00s | Avg: 31m 00s | Max: 31m 00s
      🟩 Clang14            Pass: 100%/1   | Total: 28m 45s | Avg: 28m 45s | Max: 28m 45s
      🟩 Clang15            Pass: 100%/1   | Total: 28m 18s | Avg: 28m 18s | Max: 28m 18s
      🟩 Clang16            Pass: 100%/1   | Total: 28m 15s | Avg: 28m 15s | Max: 28m 15s
      🟩 Clang17            Pass: 100%/1   | Total: 28m 06s | Avg: 28m 06s | Max: 28m 06s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 41m | Avg: 23m 01s | Max: 32m 08s
      🟩 GCC6               Pass: 100%/2   | Total: 50m 04s | Avg: 25m 02s | Max: 28m 41s
      🟩 GCC7               Pass: 100%/2   | Total: 50m 53s | Avg: 25m 26s | Max: 27m 30s
      🟩 GCC8               Pass: 100%/1   | Total: 32m 04s | Avg: 32m 04s | Max: 32m 04s
      🟩 GCC9               Pass: 100%/3   | Total:  1h 24m | Avg: 28m 08s | Max: 33m 12s
      🟩 GCC10              Pass: 100%/1   | Total: 32m 49s | Avg: 32m 49s | Max: 32m 49s
      🟩 GCC11              Pass: 100%/1   | Total: 31m 28s | Avg: 31m 28s | Max: 31m 28s
      🟩 GCC12              Pass: 100%/1   | Total: 31m 02s | Avg: 31m 02s | Max: 31m 02s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 27m | Avg: 25m 52s | Max: 36m 29s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 36m 26s | Avg: 36m 26s | Max: 36m 26s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 51m 44s | Avg: 51m 44s | Max: 51m 44s | Hits:  77%/1852  
      🟩 MSVC14.29          Pass: 100%/1   | Total: 52m 18s | Avg: 52m 18s | Max: 52m 18s | Hits:  77%/1852  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 16m | Avg: 45m 39s | Max: 59m 49s | Hits:  84%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 41m | Avg: 50m 31s | Max: 52m 40s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  8h 19m | Avg: 26m 17s | Max: 33m 22s
      🟩 GCC                Pass: 100%/19  | Total:  8h 39m | Avg: 27m 21s | Max: 36m 29s
      🟩 Intel              Pass: 100%/1   | Total: 36m 26s | Avg: 36m 26s | Max: 36m 26s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 01m | Avg: 48m 12s | Max: 59m 49s | Hits:  81%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 41m | Avg: 50m 31s | Max: 52m 40s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total: 23h 17m | Avg: 30m 23s | Max: 59m 49s | Hits:  81%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total: 21h 35m | Avg: 32m 23s | Max: 59m 49s | Hits:  77%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 36m 23s | Avg: 12m 07s | Max: 21m 35s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 05m | Avg: 21m 53s | Max: 36m 29s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 20m 14s | Avg: 20m 14s | Max: 20m 14s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  1h 54m | Avg: 22m 54s | Max: 25m 47s
      🟩 14                 Pass: 100%/4   | Total:  2h 19m | Avg: 34m 51s | Max: 51m 44s | Hits:  77%/1852  
      🟩 17                 Pass: 100%/12  | Total:  7h 17m | Avg: 36m 28s | Max: 55m 34s | Hits:  77%/3704  
      🟩 20                 Pass: 100%/23  | Total: 10h 42m | Avg: 27m 54s | Max: 59m 49s | Hits:  88%/3704  
    
  • 🟩 cub: Pass: 100%/45 | Total: 1d 13h | Avg: 50m 04s | Max: 1h 06m | Hits: 64%/3064

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 11h | Avg: 49m 44s | Max:  1h 06m | Hits:  64%/3064  
      🟩 arm64              Pass: 100%/2   | Total:  1h 54m | Avg: 57m 15s | Max: 59m 01s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  5h 35m | Avg: 47m 56s | Max:  1h 02m | Hits:  64%/766   
      🟩 12.5               Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
      🟩 12.6               Pass: 100%/36  | Total:  1d 05h | Avg: 49m 52s | Max:  1h 06m | Hits:  64%/2298  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 59m | Avg: 59m 57s | Max:  1h 01m
      🟩 nvcc11.1           Pass: 100%/7   | Total:  5h 35m | Avg: 47m 56s | Max:  1h 02m | Hits:  64%/766   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
      🟩 nvcc12.6           Pass: 100%/34  | Total:  1d 03h | Avg: 49m 16s | Max:  1h 06m | Hits:  64%/2298  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 57s | Max:  1h 01m
      🟩 nvcc               Pass: 100%/43  | Total:  1d 11h | Avg: 49m 36s | Max:  1h 06m | Hits:  64%/3064  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  3h 16m | Avg: 49m 00s | Max: 51m 56s
      🟩 Clang10            Pass: 100%/1   | Total: 54m 06s | Avg: 54m 06s | Max: 54m 06s
      🟩 Clang11            Pass: 100%/1   | Total: 57m 16s | Avg: 57m 16s | Max: 57m 16s
      🟩 Clang12            Pass: 100%/1   | Total: 55m 30s | Avg: 55m 30s | Max: 55m 30s
      🟩 Clang13            Pass: 100%/1   | Total: 56m 19s | Avg: 56m 19s | Max: 56m 19s
      🟩 Clang14            Pass: 100%/1   | Total: 56m 56s | Avg: 56m 56s | Max: 56m 56s
      🟩 Clang15            Pass: 100%/1   | Total: 56m 09s | Avg: 56m 09s | Max: 56m 09s
      🟩 Clang16            Pass: 100%/1   | Total: 56m 50s | Avg: 56m 50s | Max: 56m 50s
      🟩 Clang17            Pass: 100%/1   | Total: 52m 44s | Avg: 52m 44s | Max: 52m 44s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 37m | Avg: 48m 10s | Max:  1h 01m
      🟩 GCC6               Pass: 100%/2   | Total:  1h 28m | Avg: 44m 03s | Max: 45m 17s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 49m | Avg: 54m 43s | Max: 55m 48s
      🟩 GCC8               Pass: 100%/1   | Total: 55m 19s | Avg: 55m 19s | Max: 55m 19s
      🟩 GCC9               Pass: 100%/3   | Total:  2h 22m | Avg: 47m 30s | Max: 51m 39s
      🟩 GCC10              Pass: 100%/1   | Total: 57m 53s | Avg: 57m 53s | Max: 57m 53s
      🟩 GCC11              Pass: 100%/1   | Total: 54m 28s | Avg: 54m 28s | Max: 54m 28s
      🟩 GCC12              Pass: 100%/1   | Total: 58m 03s | Avg: 58m 03s | Max: 58m 03s
      🟩 GCC13              Pass: 100%/8   | Total:  4h 31m | Avg: 33m 54s | Max: 59m 01s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 59m 14s | Avg: 59m 14s | Max: 59m 14s
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m | Hits:  64%/766   
      🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m | Hits:  64%/766   
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 04m | Hits:  64%/1532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 16h 19m | Avg: 51m 31s | Max:  1h 01m
      🟩 GCC                Pass: 100%/19  | Total: 13h 57m | Avg: 44m 03s | Max: 59m 01s
      🟩 Intel              Pass: 100%/1   | Total: 59m 14s | Avg: 59m 14s | Max: 59m 14s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 15m | Avg:  1h 03m | Max:  1h 06m | Hits:  64%/3064  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
    🟩 gpu
      🟩 v100               Pass: 100%/45  | Total:  1d 13h | Avg: 50m 04s | Max:  1h 06m | Hits:  64%/3064  
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  1d 11h | Avg: 54m 10s | Max:  1h 06m | Hits:  64%/3064  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 50s | Avg: 22m 50s | Max: 22m 50s
      🟩 GraphCapture       Pass: 100%/1   | Total: 19m 09s | Avg: 19m 09s | Max: 19m 09s
      🟩 HostLaunch         Pass: 100%/2   | Total: 44m 42s | Avg: 22m 21s | Max: 25m 16s
      🟩 TestGPU            Pass: 100%/2   | Total: 53m 53s | Avg: 26m 56s | Max: 32m 09s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 23m 39s | Avg: 23m 39s | Max: 23m 39s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  4h 00m | Avg: 48m 03s | Max: 55m 48s
      🟩 14                 Pass: 100%/4   | Total:  3h 32m | Avg: 53m 13s | Max:  1h 02m | Hits:  64%/766   
      🟩 17                 Pass: 100%/12  | Total: 11h 10m | Avg: 55m 52s | Max:  1h 06m | Hits:  64%/1532  
      🟩 20                 Pass: 100%/24  | Total: 18h 49m | Avg: 47m 03s | Max:  1h 04m | Hits:  64%/766   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 32s | Avg: 5m 16s | Max: 8m 06s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 06s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 06s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 06s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 06s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 06s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 06s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  8m 06s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 26s | Avg:  2m 26s | Max:  2m 26s
      🟩 Test               Pass: 100%/1   | Total:  8m 06s | Avg:  8m 06s | Max:  8m 06s
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 51s | Avg: 30m 51s | Max: 30m 51s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 51s | Avg: 30m 51s | Max: 30m 51s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 30m 51s | Avg: 30m 51s | Max: 30m 51s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 30m 51s | Avg: 30m 51s | Max: 30m 51s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 51s | Avg: 30m 51s | Max: 30m 51s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 51s | Avg: 30m 51s | Max: 30m 51s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 51s | Avg: 30m 51s | Max: 30m 51s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 30m 51s | Avg: 30m 51s | Max: 30m 51s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 51s | Avg: 30m 51s | Max: 30m 51s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 94)

# Runner
70 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16

@bernhardmgruber bernhardmgruber merged commit 346a618 into NVIDIA:main Dec 11, 2024
110 checks passed
@bernhardmgruber bernhardmgruber deleted the default_select_if_tuning branch December 11, 2024 16:24