Refactor scan_by_key tuning #3139

Merged: 10 commits, Dec 12, 2024

@bernhardmgruber commented on Dec 12, 2024:

  • No SASS changes in cub.test.device_scan_by_key.lid_0.types_0 except for kernel symbol names
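
A check like the one above can be approximated by diffing two SASS dumps after masking the mangled kernel names. The following is only a minimal sketch, not the procedure used in this PR: it assumes the dumps are plain text (for example, captured from `cuobjdump -sass`), that kernel symbols are Itanium-mangled names starting with `_Z`, and the helper names `normalize_sass` and `sass_diff` are hypothetical.

```python
import difflib
import re

# Assumption: kernel symbols appear as Itanium-mangled names (_Z...).
_MANGLED = re.compile(r"_Z\w+")


def normalize_sass(text):
    """Mask mangled symbol names and strip trailing whitespace per line."""
    return [_MANGLED.sub("<SYM>", line.rstrip()) for line in text.splitlines()]


def sass_diff(before, after):
    """Return unified-diff lines between two SASS dumps, ignoring symbol names."""
    return list(
        difflib.unified_diff(
            normalize_sass(before),
            normalize_sass(after),
            "before",
            "after",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical dump fragments; real dumps would be read from files.
    old = "Function : _Z3fooIiEvPT_\n  IADD R0, R1, R2 ;\n"
    new = "Function : _Z3barIiEvPT_\n  IADD R0, R1, R2 ;\n"
    diff = sass_diff(old, new)
    print("no SASS changes" if not diff else "\n".join(diff))
```

An empty result means the two dumps differ at most in symbol names; any remaining diff lines point at changed instructions or metadata.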

@bernhardmgruber changed the title from "Ref scan by key tuning" to "Refactor scan_by_key tuning" on Dec 12, 2024

🟩 CI finished in 1h 47m: Pass: 100%/94 | Total: 2d 14h | Avg: 39m 34s | Max: 1h 09m | Hits: 74%/12324
  • 🟩 thrust: Pass: 100%/46 | Total: 1d 00h | Avg: 31m 28s | Max: 1h 04m | Hits: 76%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 41m 44s | Avg: 20m 52s | Max: 29m 02s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total: 23h 04m | Avg: 31m 28s | Max:  1h 04m | Hits:  76%/9260  
      🟩 arm64              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 19s | Max: 34m 52s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  3h 32m | Avg: 30m 24s | Max: 58m 44s | Hits:  71%/1852  
      🟩 12.5               Pass: 100%/2   | Total:  1h 40m | Avg: 50m 04s | Max: 52m 07s
      🟩 12.6               Pass: 100%/37  | Total: 18h 54m | Avg: 30m 39s | Max:  1h 04m | Hits:  78%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 55m 36s | Avg: 27m 48s | Max: 29m 35s
      🟩 nvcc11.1           Pass: 100%/7   | Total:  3h 32m | Avg: 30m 24s | Max: 58m 44s | Hits:  71%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 40m | Avg: 50m 04s | Max: 52m 07s
      🟩 nvcc12.6           Pass: 100%/35  | Total: 17h 58m | Avg: 30m 49s | Max:  1h 04m | Hits:  78%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 55m 36s | Avg: 27m 48s | Max: 29m 35s
      🟩 nvcc               Pass: 100%/44  | Total: 23h 11m | Avg: 31m 38s | Max:  1h 04m | Hits:  76%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  1h 44m | Avg: 26m 13s | Max: 30m 38s
      🟩 Clang10            Pass: 100%/1   | Total: 35m 12s | Avg: 35m 12s | Max: 35m 12s
      🟩 Clang11            Pass: 100%/1   | Total: 29m 59s | Avg: 29m 59s | Max: 29m 59s
      🟩 Clang12            Pass: 100%/1   | Total: 28m 42s | Avg: 28m 42s | Max: 28m 42s
      🟩 Clang13            Pass: 100%/1   | Total: 31m 06s | Avg: 31m 06s | Max: 31m 06s
      🟩 Clang14            Pass: 100%/1   | Total: 35m 26s | Avg: 35m 26s | Max: 35m 26s
      🟩 Clang15            Pass: 100%/1   | Total: 32m 16s | Avg: 32m 16s | Max: 32m 16s
      🟩 Clang16            Pass: 100%/1   | Total: 29m 45s | Avg: 29m 45s | Max: 29m 45s
      🟩 Clang17            Pass: 100%/1   | Total: 34m 29s | Avg: 34m 29s | Max: 34m 29s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 48m | Avg: 24m 05s | Max: 32m 50s
      🟩 GCC6               Pass: 100%/2   | Total: 50m 15s | Avg: 25m 07s | Max: 27m 35s
      🟩 GCC7               Pass: 100%/2   | Total: 56m 26s | Avg: 28m 13s | Max: 32m 07s
      🟩 GCC8               Pass: 100%/1   | Total: 29m 52s | Avg: 29m 52s | Max: 29m 52s
      🟩 GCC9               Pass: 100%/3   | Total:  1h 27m | Avg: 29m 08s | Max: 33m 22s
      🟩 GCC10              Pass: 100%/1   | Total: 33m 00s | Avg: 33m 00s | Max: 33m 00s
      🟩 GCC11              Pass: 100%/1   | Total: 36m 16s | Avg: 36m 16s | Max: 36m 16s
      🟩 GCC12              Pass: 100%/1   | Total: 36m 26s | Avg: 36m 26s | Max: 36m 26s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 06m | Avg: 23m 16s | Max: 35m 57s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 58m 44s | Avg: 58m 44s | Max: 58m 44s | Hits:  71%/1852  
      🟩 MSVC14.29          Pass: 100%/1   | Total: 57m 45s | Avg: 57m 45s | Max: 57m 45s | Hits:  71%/1852  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 27m | Avg: 49m 03s | Max:  1h 04m | Hits:  80%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 40m | Avg: 50m 04s | Max: 52m 07s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  8h 50m | Avg: 27m 55s | Max: 35m 26s
      🟩 GCC                Pass: 100%/19  | Total:  8h 35m | Avg: 27m 09s | Max: 36m 26s
      🟩 Intel              Pass: 100%/1   | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 23m | Avg: 52m 43s | Max:  1h 04m | Hits:  76%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 40m | Avg: 50m 04s | Max: 52m 07s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total:  1d 00h | Avg: 31m 28s | Max:  1h 04m | Hits:  76%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total: 22h 52m | Avg: 34m 18s | Max:  1h 04m | Hits:  71%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 37m 46s | Avg: 12m 35s | Max: 22m 10s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 37m 44s | Avg: 12m 34s | Max: 13m 16s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 20m 48s | Avg: 20m 48s | Max: 20m 48s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  1h 57m | Avg: 23m 28s | Max: 24m 28s
      🟩 14                 Pass: 100%/4   | Total:  2h 29m | Avg: 37m 16s | Max: 58m 44s | Hits:  71%/1852  
      🟩 17                 Pass: 100%/12  | Total:  7h 35m | Avg: 37m 57s | Max:  1h 00m | Hits:  71%/3704  
      🟩 20                 Pass: 100%/23  | Total: 11h 23m | Avg: 29m 43s | Max:  1h 04m | Hits:  85%/3704  
    
  • 🟩 cub: Pass: 100%/45 | Total: 1d 13h | Avg: 49m 27s | Max: 1h 09m | Hits: 65%/3064

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 11h | Avg: 48m 58s | Max:  1h 09m | Hits:  65%/3064  
      🟩 arm64              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 40s | Max:  1h 04m
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  5h 42m | Avg: 48m 55s | Max: 56m 53s | Hits:  65%/766   
      🟩 12.5               Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 03m
      🟩 12.6               Pass: 100%/36  | Total:  1d 05h | Avg: 48m 48s | Max:  1h 09m | Hits:  65%/2298  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 53m | Avg: 56m 34s | Max: 57m 06s
      🟩 nvcc11.1           Pass: 100%/7   | Total:  5h 42m | Avg: 48m 55s | Max: 56m 53s | Hits:  65%/766   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 03m
      🟩 nvcc12.6           Pass: 100%/34  | Total:  1d 03h | Avg: 48m 21s | Max:  1h 09m | Hits:  65%/2298  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 53m | Avg: 56m 34s | Max: 57m 06s
      🟩 nvcc               Pass: 100%/43  | Total:  1d 11h | Avg: 49m 07s | Max:  1h 09m | Hits:  65%/3064  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  3h 25m | Avg: 51m 19s | Max: 57m 55s
      🟩 Clang10            Pass: 100%/1   | Total: 58m 35s | Avg: 58m 35s | Max: 58m 35s
      🟩 Clang11            Pass: 100%/1   | Total: 51m 28s | Avg: 51m 28s | Max: 51m 28s
      🟩 Clang12            Pass: 100%/1   | Total: 53m 55s | Avg: 53m 55s | Max: 53m 55s
      🟩 Clang13            Pass: 100%/1   | Total: 55m 48s | Avg: 55m 48s | Max: 55m 48s
      🟩 Clang14            Pass: 100%/1   | Total: 52m 48s | Avg: 52m 48s | Max: 52m 48s
      🟩 Clang15            Pass: 100%/1   | Total: 51m 46s | Avg: 51m 46s | Max: 51m 46s
      🟩 Clang16            Pass: 100%/1   | Total: 51m 02s | Avg: 51m 02s | Max: 51m 02s
      🟩 Clang17            Pass: 100%/1   | Total: 52m 46s | Avg: 52m 46s | Max: 52m 46s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 19m | Avg: 45m 42s | Max: 57m 06s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 32m | Avg: 46m 22s | Max: 49m 36s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 48m | Avg: 54m 09s | Max: 56m 19s
      🟩 GCC8               Pass: 100%/1   | Total: 49m 46s | Avg: 49m 46s | Max: 49m 46s
      🟩 GCC9               Pass: 100%/3   | Total:  2h 27m | Avg: 49m 08s | Max: 50m 49s
      🟩 GCC10              Pass: 100%/1   | Total: 52m 51s | Avg: 52m 51s | Max: 52m 51s
      🟩 GCC11              Pass: 100%/1   | Total: 53m 45s | Avg: 53m 45s | Max: 53m 45s
      🟩 GCC12              Pass: 100%/1   | Total: 54m 18s | Avg: 54m 18s | Max: 54m 18s
      🟩 GCC13              Pass: 100%/8   | Total:  4h 33m | Avg: 34m 14s | Max:  1h 04m
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 58m 59s | Avg: 58m 59s | Max: 58m 59s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 56m 53s | Avg: 56m 53s | Max: 56m 53s | Hits:  65%/766   
      🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m | Hits:  65%/766   
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m | Hits:  65%/1532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 03m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 15h 53m | Avg: 50m 10s | Max: 58m 35s
      🟩 GCC                Pass: 100%/19  | Total: 13h 53m | Avg: 43m 50s | Max:  1h 04m
      🟩 Intel              Pass: 100%/1   | Total: 58m 59s | Avg: 58m 59s | Max: 58m 59s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 14m | Avg:  1h 03m | Max:  1h 09m | Hits:  65%/3064  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 03m
    🟩 gpu
      🟩 v100               Pass: 100%/45  | Total:  1d 13h | Avg: 49m 27s | Max:  1h 09m | Hits:  65%/3064  
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  1d 11h | Avg: 53m 57s | Max:  1h 09m | Hits:  65%/3064  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 14s | Avg: 21m 14s | Max: 21m 14s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 34s | Avg: 15m 34s | Max: 15m 34s
      🟩 HostLaunch         Pass: 100%/2   | Total: 35m 06s | Avg: 17m 33s | Max: 19m 11s
      🟩 TestGPU            Pass: 100%/2   | Total: 49m 14s | Avg: 24m 37s | Max: 28m 59s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 25m 32s | Avg: 25m 32s | Max: 25m 32s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  4h 00m | Avg: 48m 02s | Max: 52m 00s
      🟩 14                 Pass: 100%/4   | Total:  3h 40m | Avg: 55m 10s | Max: 57m 55s | Hits:  65%/766   
      🟩 17                 Pass: 100%/12  | Total: 11h 13m | Avg: 56m 06s | Max:  1h 06m | Hits:  65%/1532  
      🟩 20                 Pass: 100%/24  | Total: 18h 11m | Avg: 45m 28s | Max:  1h 09m | Hits:  65%/766   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 46s | Avg: 4m 53s | Max: 7m 40s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  7m 40s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  7m 40s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  7m 40s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  7m 40s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  7m 40s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  7m 40s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  7m 40s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 06s | Avg:  2m 06s | Max:  2m 06s
      🟩 Test               Pass: 100%/1   | Total:  7m 40s | Avg:  7m 40s | Max:  7m 40s
    
  • 🟩 python: Pass: 100%/1 | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 37m 24s | Avg: 37m 24s | Max: 37m 24s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 94)

# Runner
70 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16

@bernhardmgruber merged commit 7321a51 into NVIDIA:main on Dec 12, 2024
114 checks passed
@bernhardmgruber deleted the ref_scan_by_key_tuning branch on December 12, 2024 16:23