Add _CCCL_BUILTIN_PREFETCH #3433

Merged
merged 6 commits into from
Jan 21, 2025

Conversation

fbusato
Contributor

@fbusato fbusato commented Jan 17, 2025

Description

Adds _CCCL_BUILTIN_PREFETCH, a portable wrapper around __builtin_prefetch. Ideally, it will be used by the mdspan accessor with properties.
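
For context, a minimal sketch of how a portable prefetch macro can be built. This is not the definition added by this PR: the macro name `EXAMPLE_PREFETCH` and the helper `sum_with_prefetch` are hypothetical and exist only for illustration. The sketch assumes the builtin is detected with `__has_builtin` where the compiler provides it, falls back to `__GNUC__` for older GCC, and otherwise expands to a no-op.

```cpp
#include <cstddef>
#include <vector>

// EXAMPLE_PREFETCH is a hypothetical name, not the macro introduced by this PR.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_prefetch)
#    define EXAMPLE_PREFETCH(...) __builtin_prefetch(__VA_ARGS__)
#  endif
#endif
#if !defined(EXAMPLE_PREFETCH) && defined(__GNUC__)
// Older GCC releases lack __has_builtin but still provide the builtin.
#  define EXAMPLE_PREFETCH(...) __builtin_prefetch(__VA_ARGS__)
#endif
#ifndef EXAMPLE_PREFETCH
// Fallback for compilers without the builtin: the hint is simply dropped.
#  define EXAMPLE_PREFETCH(...) ((void) 0)
#endif

// Hypothetical helper: sums a vector while hinting that the element
// `distance` positions ahead will be read soon.
inline long sum_with_prefetch(const std::vector<int>& data, std::size_t distance = 16)
{
  long total = 0;
  for (std::size_t i = 0; i < data.size(); ++i)
  {
    if (i + distance < data.size())
    {
      // Second argument 0 = prefetch for read; third argument 3 = high temporal locality.
      EXAMPLE_PREFETCH(&data[i + distance], 0, 3);
    }
    total += data[i];
  }
  return total;
}
```

Because a prefetch is only a hint, the result of the loop is the same whether the macro expands to the builtin or to nothing. That property is what makes a no-op fallback safe.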

@fbusato fbusato added the "3.0" (Targeted for 3.0 release) label Jan 17, 2025
@fbusato fbusato self-assigned this Jan 17, 2025
@fbusato fbusato requested review from a team as code owners January 17, 2025 01:00
@fbusato fbusato requested a review from wmaxey January 17, 2025 01:00
Contributor

🟩 CI finished in 2h 02m: Pass: 100%/144 | Total: 1d 13h | Avg: 15m 42s | Max: 1h 15m | Hits: 244%/25759
  • 🟩 libcudacxx: Pass: 100%/46 | Total: 11h 44m | Avg: 15m 19s | Max: 43m 59s | Hits: 382%/12477

    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total: 11h 20m | Avg: 15m 27s | Max: 43m 59s | Hits: 382%/12477 
      🟩 arm64              Pass: 100%/2   | Total: 24m 35s | Avg: 12m 17s | Max: 21m 15s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  2h 13m | Avg: 16m 38s | Max: 32m 51s | Hits: 367%/4871  
      🟩 12.5               Pass: 100%/2   | Total:  1h 10m | Avg: 35m 22s | Max: 37m 26s
      🟩 12.6               Pass: 100%/36  | Total:  8h 21m | Avg: 13m 55s | Max: 43m 59s | Hits: 391%/7606  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 06m | Avg: 16m 37s | Max: 21m 48s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  2h 13m | Avg: 16m 38s | Max: 32m 51s | Hits: 367%/4871  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 10m | Avg: 35m 22s | Max: 37m 26s
      🟩 nvcc12.6           Pass: 100%/32  | Total:  7h 14m | Avg: 13m 34s | Max: 43m 59s | Hits: 391%/7606  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 06m | Avg: 16m 37s | Max: 21m 48s
      🟩 nvcc               Pass: 100%/42  | Total: 10h 38m | Avg: 15m 12s | Max: 43m 59s | Hits: 382%/12477 
    🟩 cxx
      🟩 Clang14            Pass: 100%/6   | Total: 36m 49s | Avg:  6m 08s | Max: 14m 18s
      🟩 Clang15            Pass: 100%/1   | Total: 22m 06s | Avg: 22m 06s | Max: 22m 06s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 35s | Avg:  5m 35s | Max:  5m 35s
      🟩 Clang17            Pass: 100%/1   | Total:  4m 14s | Avg:  4m 14s | Max:  4m 14s
      🟩 Clang18            Pass: 100%/8   | Total:  2h 04m | Avg: 15m 36s | Max: 28m 53s
      🟩 GCC7               Pass: 100%/5   | Total: 54m 56s | Avg: 10m 59s | Max: 18m 28s
      🟩 GCC8               Pass: 100%/1   | Total:  4m 42s | Avg:  4m 42s | Max:  4m 42s
      🟩 GCC9               Pass: 100%/3   | Total: 21m 46s | Avg:  7m 15s | Max: 13m 54s
      🟩 GCC10              Pass: 100%/1   | Total: 20m 49s | Avg: 20m 49s | Max: 20m 49s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 59s | Avg:  3m 59s | Max:  3m 59s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 15s | Avg:  4m 15s | Max:  4m 15s
      🟩 GCC13              Pass: 100%/10  | Total:  2h 28m | Avg: 14m 53s | Max: 28m 46s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 41m | Avg: 33m 59s | Max: 36m 52s | Hits: 371%/7357  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 19m | Avg: 39m 39s | Max: 43m 59s | Hits: 397%/5120  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 10m | Avg: 35m 22s | Max: 37m 26s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  3h 13m | Avg: 11m 23s | Max: 28m 53s
      🟩 GCC                Pass: 100%/22  | Total:  4h 19m | Avg: 11m 47s | Max: 28m 46s
      🟩 MSVC               Pass: 100%/5   | Total:  3h 01m | Avg: 36m 15s | Max: 43m 59s | Hits: 382%/12477 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 22s | Max: 37m 26s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total: 11h 44m | Avg: 15m 19s | Max: 43m 59s | Hits: 382%/12477 
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  9h 02m | Avg: 13m 53s | Max: 43m 59s | Hits: 382%/12477 
      🟩 NVRTC              Pass: 100%/4   | Total:  1h 43m | Avg: 25m 48s | Max: 27m 11s
      🟩 Test               Pass: 100%/2   | Total: 57m 39s | Avg: 28m 49s | Max: 28m 53s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 04s | Avg:  2m 04s | Max:  2m 04s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 13m 23s | Avg: 13m 23s | Max: 13m 23s
      🟩 90a                Pass: 100%/2   | Total: 17m 05s | Avg:  8m 32s | Max: 13m 28s
    🟩 std
      🟩 11                 Pass: 100%/6   | Total:  1h 25m | Avg: 14m 11s | Max: 23m 33s
      🟩 14                 Pass: 100%/4   | Total:  1h 07m | Avg: 16m 46s | Max: 32m 51s | Hits: 334%/2395  
      🟩 17                 Pass: 100%/14  | Total:  3h 50m | Avg: 16m 28s | Max: 36m 52s | Hits: 391%/7448  
      🟩 20                 Pass: 100%/21  | Total:  5h 19m | Avg: 15m 14s | Max: 43m 59s | Hits: 398%/2634  
    
  • 🟩 cub: Pass: 100%/38 | Total: 13h 05m | Avg: 20m 40s | Max: 1h 15m | Hits: 38%/3540

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total: 12h 55m | Avg: 21m 32s | Max:  1h 15m | Hits:  38%/3540  
      🟩 arm64              Pass: 100%/2   | Total:  9m 45s | Avg:  4m 52s | Max:  5m 03s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 33m | Avg: 18m 40s | Max:  1h 12m | Hits:  38%/885   
      🟩 12.5               Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 13m
      🟩 12.6               Pass: 100%/31  | Total:  9h 07m | Avg: 17m 38s | Max:  1h 15m | Hits:  38%/2655  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  4m 38s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 33m | Avg: 18m 40s | Max:  1h 12m | Hits:  38%/885   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 13m
      🟩 nvcc12.6           Pass: 100%/29  | Total:  8h 58m | Avg: 18m 33s | Max:  1h 15m | Hits:  38%/2655  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  4m 38s
      🟩 nvcc               Pass: 100%/36  | Total: 12h 56m | Avg: 21m 34s | Max:  1h 15m | Hits:  38%/3540  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 51s | Avg:  5m 27s | Max:  5m 44s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 28s | Avg:  5m 28s | Max:  5m 28s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 58m | Avg: 16m 57s | Max: 50m 24s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 49s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 13s | Avg:  5m 13s | Max:  5m 13s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 50s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 34s | Avg:  5m 34s | Max:  5m 34s
      🟩 GCC11              Pass: 100%/1   | Total:  6m 01s | Avg:  6m 01s | Max:  6m 01s
      🟩 GCC12              Pass: 100%/3   | Total: 29m 36s | Avg:  9m 52s | Max: 19m 31s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 05m | Avg: 15m 44s | Max: 36m 50s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 12m | Hits:  38%/1770  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 28m | Avg:  1h 14m | Max:  1h 15m | Hits:  38%/1770  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 13m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  2h 37m | Avg: 11m 13s | Max: 50m 24s
      🟩 GCC                Pass: 100%/18  | Total:  3h 14m | Avg: 10m 48s | Max: 36m 50s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 48m | Avg:  1h 12m | Max:  1h 15m | Hits:  38%/3540  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 13m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 23m 37s | Avg: 11m 48s | Max: 19m 31s
      🟩 v100               Pass: 100%/36  | Total: 12h 41m | Avg: 21m 09s | Max:  1h 15m | Hits:  38%/3540  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  9h 27m | Avg: 18m 17s | Max:  1h 15m | Hits:  38%/3540  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 36m 50s | Avg: 36m 50s | Max: 36m 50s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 38s | Avg: 15m 38s | Max: 15m 38s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 23m | Avg: 27m 45s | Max: 43m 25s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 22m | Avg: 41m 23s | Max: 50m 24s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 23m 37s | Avg: 11m 48s | Max: 19m 31s
      🟩 90a                Pass: 100%/1   | Total:  4m 21s | Avg:  4m 21s | Max:  4m 21s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  5h 39m | Avg: 24m 13s | Max:  1h 13m | Hits:  38%/2655  
      🟩 20                 Pass: 100%/24  | Total:  7h 26m | Avg: 18m 36s | Max:  1h 15m | Hits:  38%/885   
    
  • 🟩 thrust: Pass: 100%/37 | Total: 10h 20m | Avg: 16m 45s | Max: 1h 12m | Hits: 145%/9220

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 23m 53s | Avg: 11m 56s | Max: 17m 49s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total: 10h 10m | Avg: 17m 26s | Max:  1h 12m | Hits: 145%/9220  
      🟩 arm64              Pass: 100%/2   | Total:  9m 45s | Avg:  4m 52s | Max:  5m 14s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 16m | Avg: 15m 17s | Max: 56m 14s | Hits:  80%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 12m
      🟩 12.6               Pass: 100%/30  | Total:  6h 43m | Avg: 13m 26s | Max:  1h 11m | Hits: 162%/7376  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  5m 12s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 16m | Avg: 15m 17s | Max: 56m 14s | Hits:  80%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 12m
      🟩 nvcc12.6           Pass: 100%/28  | Total:  6h 33m | Avg: 14m 02s | Max:  1h 11m | Hits: 162%/7376  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  5m 12s
      🟩 nvcc               Pass: 100%/35  | Total: 10h 10m | Avg: 17m 25s | Max:  1h 12m | Hits: 145%/9220  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 08s | Avg:  5m 17s | Max:  5m 39s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s
      🟩 Clang17            Pass: 100%/1   | Total:  6m 00s | Avg:  6m 00s | Max:  6m 00s
      🟩 Clang18            Pass: 100%/7   | Total: 45m 32s | Avg:  6m 30s | Max: 11m 19s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 06s | Avg:  5m 03s | Max:  5m 13s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 11s | Avg:  5m 11s | Max:  5m 11s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 12s | Avg:  5m 36s | Max:  5m 44s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 53s | Avg:  5m 53s | Max:  5m 53s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 58s | Avg:  5m 58s | Max:  5m 58s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 11m | Avg:  8m 58s | Max: 18m 38s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 00s | Max: 57m 46s | Hits: 101%/3688  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 45m | Avg: 55m 18s | Max:  1h 11m | Hits: 175%/5532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 12m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  1h 23m | Avg:  5m 59s | Max: 11m 19s
      🟩 GCC                Pass: 100%/16  | Total:  1h 55m | Avg:  7m 14s | Max: 18m 38s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 39m | Avg: 55m 58s | Max:  1h 11m | Hits: 145%/9220  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 12m
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total: 10h 20m | Avg: 16m 45s | Max:  1h 12m | Hits: 145%/9220  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  8h 40m | Avg: 16m 47s | Max:  1h 12m | Hits:  91%/7376  
      🟩 TestCPU            Pass: 100%/3   | Total: 52m 00s | Avg: 17m 20s | Max: 36m 36s | Hits: 365%/1844  
      🟩 TestGPU            Pass: 100%/3   | Total: 47m 46s | Avg: 15m 55s | Max: 18m 38s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 52s | Avg:  4m 52s | Max:  4m 52s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  4h 53m | Avg: 20m 57s | Max:  1h 08m | Hits:  94%/5532  
      🟩 20                 Pass: 100%/21  | Total:  5h 02m | Avg: 14m 25s | Max:  1h 12m | Hits: 222%/3688  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 55m | Avg: 5m 45s | Max: 14m 56s | Hits: 81%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 44m | Avg:  6m 32s | Max: 14m 56s | Hits:  81%/522   
      🟩 arm64              Pass: 100%/4   | Total: 10m 25s | Avg:  2m 36s | Max:  2m 43s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 12m 30s | Avg: 12m 30s | Max: 12m 30s | Hits:  81%/261   
      🟩 12.5               Pass: 100%/2   | Total: 17m 34s | Avg:  8m 47s | Max:  8m 57s
      🟩 12.6               Pass: 100%/17  | Total:  1h 25m | Avg:  5m 00s | Max: 14m 56s | Hits:  81%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 12m 30s | Avg: 12m 30s | Max: 12m 30s | Hits:  81%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 17m 34s | Avg:  8m 47s | Max:  8m 57s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 25m | Avg:  5m 00s | Max: 14m 56s | Hits:  81%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 55m | Avg:  5m 45s | Max: 14m 56s | Hits:  81%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 20s | Avg:  3m 20s | Max:  3m 20s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 21s | Avg:  3m 21s | Max:  3m 21s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 09s | Avg:  3m 09s | Max:  3m 09s
      🟩 Clang18            Pass: 100%/4   | Total: 23m 38s | Avg:  5m 54s | Max: 14m 54s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 18s | Avg:  3m 18s | Max:  3m 18s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 18s | Avg:  3m 18s | Max:  3m 18s
      🟩 GCC12              Pass: 100%/2   | Total: 18m 00s | Avg:  9m 00s | Max: 14m 56s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 38s | Avg:  2m 39s | Max:  2m 50s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 12m 30s | Avg: 12m 30s | Max: 12m 30s | Hits:  81%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 13m 05s | Avg: 13m 05s | Max: 13m 05s | Hits:  81%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 17m 34s | Avg:  8m 47s | Max:  8m 57s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 36m 44s | Avg:  4m 35s | Max: 14m 54s
      🟩 GCC                Pass: 100%/8   | Total: 35m 14s | Avg:  4m 24s | Max: 14m 56s
      🟩 MSVC               Pass: 100%/2   | Total: 25m 35s | Avg: 12m 47s | Max: 13m 05s | Hits:  81%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 17m 34s | Avg:  8m 47s | Max:  8m 57s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  1h 55m | Avg:  5m 45s | Max: 14m 56s | Hits:  81%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 25m | Avg:  4m 44s | Max: 13m 05s | Hits:  81%/522   
      🟩 Test               Pass: 100%/2   | Total: 29m 50s | Avg: 14m 55s | Max: 14m 56s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 50s | Avg:  2m 50s | Max:  2m 50s
      🟩 90a                Pass: 100%/1   | Total:  2m 42s | Avg:  2m 42s | Max:  2m 42s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 16m 41s | Avg:  4m 10s | Max:  8m 37s
      🟩 20                 Pass: 100%/16  | Total:  1h 38m | Avg:  6m 09s | Max: 14m 56s | Hits:  81%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 33s | Avg: 5m 16s | Max: 8m 24s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  8m 24s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  8m 24s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  8m 24s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  8m 24s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  8m 24s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  8m 24s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 33s | Avg:  5m 16s | Max:  8m 24s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 09s | Avg:  2m 09s | Max:  2m 09s
      🟩 Test               Pass: 100%/1   | Total:  8m 24s | Avg:  8m 24s | Max:  8m 24s
    
  • 🟩 python: Pass: 100%/1 | Total: 25m 52s | Avg: 25m 52s | Max: 25m 52s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 25m 52s | Avg: 25m 52s | Max: 25m 52s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 25m 52s | Avg: 25m 52s | Max: 25m 52s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 25m 52s | Avg: 25m 52s | Max: 25m 52s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 25m 52s | Avg: 25m 52s | Max: 25m 52s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 25m 52s | Avg: 25m 52s | Max: 25m 52s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 25m 52s | Avg: 25m 52s | Max: 25m 52s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 25m 52s | Avg: 25m 52s | Max: 25m 52s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 25m 52s | Avg: 25m 52s | Max: 25m 52s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 144)

# Runner
98 linux-amd64-cpu16
19 linux-amd64-gpu-v100-latest-1
16 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@fbusato fbusato requested a review from miscco January 21, 2025 20:40
@fbusato fbusato enabled auto-merge (squash) January 21, 2025 21:08
Contributor

🟩 CI finished in 1h 42m: Pass: 100%/135 | Total: 1d 01h | Avg: 11m 11s | Max: 1h 03m | Hits: 527%/23404
  • 🟩 cub: Pass: 100%/38 | Total: 8h 31m | Avg: 13m 27s | Max: 45m 38s | Hits: 539%/3540

    🟩 cpu
      🟩 amd64              Pass: 100%/36  | Total:  8h 22m | Avg: 13m 56s | Max: 45m 38s | Hits: 539%/3540  
      🟩 arm64              Pass: 100%/2   | Total:  9m 34s | Avg:  4m 47s | Max:  4m 56s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 49m 00s | Avg:  9m 48s | Max: 27m 17s | Hits: 539%/885   
      🟩 12.5               Pass: 100%/2   | Total:  1h 18m | Avg: 39m 24s | Max: 39m 48s
      🟩 12.6               Pass: 100%/31  | Total:  6h 23m | Avg: 12m 22s | Max: 45m 38s | Hits: 539%/2655  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 43s | Avg:  4m 21s | Max:  4m 30s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 49m 00s | Avg:  9m 48s | Max: 27m 17s | Hits: 539%/885   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 18m | Avg: 39m 24s | Max: 39m 48s
      🟩 nvcc12.6           Pass: 100%/29  | Total:  6h 15m | Avg: 12m 56s | Max: 45m 38s | Hits: 539%/2655  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 43s | Avg:  4m 21s | Max:  4m 30s
      🟩 nvcc               Pass: 100%/36  | Total:  8h 22m | Avg: 13m 58s | Max: 45m 38s | Hits: 539%/3540  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 39s | Avg:  5m 24s | Max:  5m 43s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 32s | Avg:  5m 32s | Max:  5m 32s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 17s | Avg:  5m 17s | Max:  5m 17s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 07m | Avg:  9m 41s | Max: 22m 34s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 59s | Avg:  5m 29s | Max:  5m 32s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 13s | Avg:  5m 13s | Max:  5m 13s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 33s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s
      🟩 GCC12              Pass: 100%/3   | Total: 29m 50s | Avg:  9m 56s | Max: 19m 32s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 04m | Avg: 15m 36s | Max: 32m 49s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 58m 04s | Avg: 29m 02s | Max: 30m 47s | Hits: 539%/1770  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 16m | Avg: 38m 04s | Max: 45m 38s | Hits: 539%/1770  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 18m | Avg: 39m 24s | Max: 39m 48s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  1h 45m | Avg:  7m 33s | Max: 22m 34s
      🟩 GCC                Pass: 100%/18  | Total:  3h 12m | Avg: 10m 42s | Max: 32m 49s
      🟩 MSVC               Pass: 100%/4   | Total:  2h 14m | Avg: 33m 33s | Max: 45m 38s | Hits: 539%/3540  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 18m | Avg: 39m 24s | Max: 39m 48s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 23m 59s | Avg: 11m 59s | Max: 19m 32s
      🟩 v100               Pass: 100%/36  | Total:  8h 07m | Avg: 13m 32s | Max: 45m 38s | Hits: 539%/3540  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  5h 44m | Avg: 11m 07s | Max: 45m 38s | Hits: 539%/3540  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 23m 16s | Avg: 23m 16s | Max: 23m 16s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 03s | Avg: 15m 03s | Max: 15m 03s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 57s | Max: 32m 47s
      🟩 TestGPU            Pass: 100%/2   | Total: 53m 46s | Avg: 26m 53s | Max: 32m 49s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 23m 59s | Avg: 11m 59s | Max: 19m 32s
      🟩 90a                Pass: 100%/1   | Total:  4m 19s | Avg:  4m 19s | Max:  4m 19s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  3h 02m | Avg: 13m 03s | Max: 39m 48s | Hits: 539%/2655  
      🟩 20                 Pass: 100%/24  | Total:  5h 28m | Avg: 13m 42s | Max: 45m 38s | Hits: 539%/885   
    
  • 🟩 libcudacxx: Pass: 100%/37 | Total: 6h 28m | Avg: 10m 29s | Max: 28m 49s | Hits: 682%/10162

    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total:  6h 20m | Avg: 10m 53s | Max: 28m 49s | Hits: 682%/10162 
      🟩 arm64              Pass: 100%/2   | Total:  7m 09s | Avg:  3m 34s | Max:  3m 36s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 37m 24s | Avg:  7m 28s | Max: 23m 05s | Hits: 682%/2495  
      🟩 12.5               Pass: 100%/2   | Total: 36m 50s | Avg: 18m 25s | Max: 28m 49s
      🟩 12.6               Pass: 100%/30  | Total:  5h 13m | Avg: 10m 27s | Max: 27m 16s | Hits: 682%/7667  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 07m | Avg: 16m 58s | Max: 20m 37s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 37m 24s | Avg:  7m 28s | Max: 23m 05s | Hits: 682%/2495  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 36m 50s | Avg: 18m 25s | Max: 28m 49s
      🟩 nvcc12.6           Pass: 100%/26  | Total:  4h 05m | Avg:  9m 27s | Max: 27m 16s | Hits: 682%/7667  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 07m | Avg: 16m 58s | Max: 20m 37s
      🟩 nvcc               Pass: 100%/33  | Total:  5h 20m | Avg:  9m 42s | Max: 28m 49s | Hits: 682%/10162 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 15m 49s | Avg:  3m 57s | Max:  4m 23s
      🟩 Clang15            Pass: 100%/1   | Total:  8m 24s | Avg:  8m 24s | Max:  8m 24s
      🟩 Clang16            Pass: 100%/1   | Total:  8m 28s | Avg:  8m 28s | Max:  8m 28s
      🟩 Clang17            Pass: 100%/1   | Total:  7m 38s | Avg:  7m 38s | Max:  7m 38s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 42m | Avg: 12m 46s | Max: 21m 58s
      🟩 GCC7               Pass: 100%/2   | Total:  6m 44s | Avg:  3m 22s | Max:  3m 23s
      🟩 GCC8               Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 34s | Avg:  3m 47s | Max:  4m 02s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
      🟩 GCC11              Pass: 100%/1   | Total:  4m 11s | Avg:  4m 11s | Max:  4m 11s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 07s | Avg:  4m 07s | Max:  4m 07s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 18m | Avg:  9m 49s | Max: 22m 22s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 47m 11s | Avg: 23m 35s | Max: 24m 06s | Hits: 682%/5000  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 52m 41s | Avg: 26m 20s | Max: 27m 16s | Hits: 682%/5162  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 36m 50s | Avg: 18m 25s | Max: 28m 49s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/15  | Total:  2h 22m | Avg:  9m 30s | Max: 21m 58s
      🟩 GCC                Pass: 100%/16  | Total:  1h 48m | Avg:  6m 48s | Max: 22m 22s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 39m | Avg: 24m 58s | Max: 27m 16s | Hits: 682%/10162 
      🟩 NVHPC              Pass: 100%/2   | Total: 36m 50s | Avg: 18m 25s | Max: 28m 49s
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total:  6h 28m | Avg: 10m 29s | Max: 28m 49s | Hits: 682%/10162 
    🟩 jobs
      🟩 Build              Pass: 100%/32  | Total:  5h 03m | Avg:  9m 28s | Max: 28m 49s | Hits: 682%/10162 
      🟩 NVRTC              Pass: 100%/2   | Total: 44m 33s | Avg: 22m 16s | Max: 22m 22s
      🟩 Test               Pass: 100%/2   | Total: 38m 22s | Avg: 19m 11s | Max: 21m 58s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 52s | Avg:  1m 52s | Max:  1m 52s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 14m 26s | Avg: 14m 26s | Max: 14m 26s
      🟩 90a                Pass: 100%/2   | Total: 16m 25s | Avg:  8m 12s | Max: 12m 24s
    🟩 std
      🟩 17                 Pass: 100%/15  | Total:  2h 58m | Avg: 11m 53s | Max: 28m 49s | Hits: 682%/7505  
      🟩 20                 Pass: 100%/21  | Total:  3h 27m | Avg:  9m 54s | Max: 27m 16s | Hits: 681%/2657  
    
  • 🟩 thrust: Pass: 100%/37 | Total: 7h 04m | Avg: 11m 27s | Max: 33m 01s | Hits: 360%/9180

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 38m 11s | Avg: 19m 05s | Max: 31m 58s
    🟩 cpu
      🟩 amd64              Pass: 100%/35  | Total:  6h 54m | Avg: 11m 50s | Max: 33m 01s | Hits: 360%/9180  
      🟩 arm64              Pass: 100%/2   | Total:  9m 44s | Avg:  4m 52s | Max:  5m 04s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 48m 24s | Avg:  9m 40s | Max: 28m 07s | Hits: 365%/1836  
      🟩 12.5               Pass: 100%/2   | Total: 52m 35s | Avg: 26m 17s | Max: 28m 29s
      🟩 12.6               Pass: 100%/30  | Total:  5h 23m | Avg: 10m 46s | Max: 33m 01s | Hits: 358%/7344  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 47s | Avg:  5m 23s | Max:  5m 31s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 48m 24s | Avg:  9m 40s | Max: 28m 07s | Hits: 365%/1836  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 52m 35s | Avg: 26m 17s | Max: 28m 29s
      🟩 nvcc12.6           Pass: 100%/28  | Total:  5h 12m | Avg: 11m 09s | Max: 33m 01s | Hits: 358%/7344  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 47s | Avg:  5m 23s | Max:  5m 31s
      🟩 nvcc               Pass: 100%/35  | Total:  6h 53m | Avg: 11m 48s | Max: 33m 01s | Hits: 360%/9180  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 38s | Avg:  5m 09s | Max:  5m 18s
      🟩 Clang15            Pass: 100%/1   | Total:  6m 06s | Avg:  6m 06s | Max:  6m 06s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 28s | Avg:  5m 28s | Max:  5m 28s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
      🟩 Clang18            Pass: 100%/7   | Total: 59m 41s | Avg:  8m 31s | Max: 24m 23s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 21s | Avg:  5m 10s | Max:  5m 22s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s
      🟩 GCC9               Pass: 100%/2   | Total: 10m 51s | Avg:  5m 25s | Max:  5m 40s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 44s | Avg:  5m 44s | Max:  5m 44s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 16s | Avg:  5m 16s | Max:  5m 16s
      🟩 GCC12              Pass: 100%/1   | Total:  6m 23s | Avg:  6m 23s | Max:  6m 23s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 20m | Avg: 10m 07s | Max: 31m 58s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 55m 29s | Avg: 27m 44s | Max: 28m 07s | Hits: 365%/3672  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 33m | Avg: 31m 07s | Max: 33m 01s | Hits: 356%/5508  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 52m 35s | Avg: 26m 17s | Max: 28m 29s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/14  | Total:  1h 37m | Avg:  6m 58s | Max: 24m 23s
      🟩 GCC                Pass: 100%/16  | Total:  2h 05m | Avg:  7m 48s | Max: 31m 58s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 28m | Avg: 29m 46s | Max: 33m 01s | Hits: 360%/9180  
      🟩 NVHPC              Pass: 100%/2   | Total: 52m 35s | Avg: 26m 17s | Max: 28m 29s
    🟩 gpu
      🟩 v100               Pass: 100%/37  | Total:  7h 04m | Avg: 11m 27s | Max: 33m 01s | Hits: 360%/9180  
    🟩 jobs
      🟩 Build              Pass: 100%/31  | Total:  5h 05m | Avg:  9m 51s | Max: 30m 35s | Hits: 358%/7344  
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 04s | Avg: 16m 21s | Max: 33m 01s | Hits: 365%/1836  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 09m | Avg: 23m 10s | Max: 31m 58s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 16s | Avg:  4m 16s | Max:  4m 16s
    🟩 std
      🟩 17                 Pass: 100%/14  | Total:  2h 48m | Avg: 12m 00s | Max: 29m 46s | Hits: 361%/5508  
      🟩 20                 Pass: 100%/21  | Total:  3h 37m | Avg: 10m 22s | Max: 33m 01s | Hits: 357%/3672  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 52m | Avg: 5m 37s | Max: 18m 28s | Hits: 388%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 42m | Avg:  6m 23s | Max: 18m 28s | Hits: 388%/522   
      🟩 arm64              Pass: 100%/4   | Total: 10m 22s | Avg:  2m 35s | Max:  2m 40s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s | Hits: 388%/261   
      🟩 12.5               Pass: 100%/2   | Total: 10m 17s | Avg:  5m 08s | Max:  5m 16s
      🟩 12.6               Pass: 100%/17  | Total:  1h 30m | Avg:  5m 18s | Max: 18m 28s | Hits: 388%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s | Hits: 388%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 17s | Avg:  5m 08s | Max:  5m 16s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 30m | Avg:  5m 18s | Max: 18m 28s | Hits: 388%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 52m | Avg:  5m 37s | Max: 18m 28s | Hits: 388%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 11s | Avg:  3m 11s | Max:  3m 11s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 32s | Avg:  3m 32s | Max:  3m 32s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 33s | Avg:  3m 33s | Max:  3m 33s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 06s | Avg:  3m 06s | Max:  3m 06s
      🟩 Clang18            Pass: 100%/4   | Total: 26m 50s | Avg:  6m 42s | Max: 18m 28s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 34s | Avg:  3m 34s | Max:  3m 34s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 11s | Avg:  3m 11s | Max:  3m 11s
      🟩 GCC12              Pass: 100%/2   | Total: 21m 21s | Avg: 10m 40s | Max: 17m 57s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 33s | Avg:  2m 38s | Max:  2m 49s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s | Hits: 388%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 26s | Avg: 11m 26s | Max: 11m 26s | Hits: 388%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 17s | Avg:  5m 08s | Max:  5m 16s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 40m 12s | Avg:  5m 01s | Max: 18m 28s
      🟩 GCC                Pass: 100%/8   | Total: 38m 39s | Avg:  4m 49s | Max: 17m 57s
      🟩 MSVC               Pass: 100%/2   | Total: 23m 22s | Avg: 11m 41s | Max: 11m 56s | Hits: 388%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 10m 17s | Avg:  5m 08s | Max:  5m 16s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  1h 52m | Avg:  5m 37s | Max: 18m 28s | Hits: 388%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 16m | Avg:  4m 13s | Max: 11m 56s | Hits: 388%/522   
      🟩 Test               Pass: 100%/2   | Total: 36m 25s | Avg: 18m 12s | Max: 18m 28s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 49s | Avg:  2m 49s | Max:  2m 49s
      🟩 90a                Pass: 100%/1   | Total:  2m 38s | Avg:  2m 38s | Max:  2m 38s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 13m 15s | Avg:  3m 18s | Max:  5m 16s
      🟩 20                 Pass: 100%/16  | Total:  1h 39m | Avg:  6m 12s | Max: 18m 28s | Hits: 388%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 49s | Avg: 4m 54s | Max: 7m 39s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  7m 39s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  7m 39s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  7m 39s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  7m 39s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  7m 39s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  7m 39s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  7m 39s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 10s | Avg:  2m 10s | Max:  2m 10s
      🟩 Test               Pass: 100%/1   | Total:  7m 39s | Avg:  7m 39s | Max:  7m 39s
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 03m | Avg: 1h 03m | Max: 1h 03m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
    

👃 Inspect Changes

Modifications in project?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃‍ Runner counts (total jobs: 135)

| # | Runner |
|---|---|
| 92 | linux-amd64-cpu16 |
| 17 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |

@fbusato fbusato merged commit d2857b1 into NVIDIA:main Jan 21, 2025
149 of 152 checks passed
davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 22, 2025
davebayer added a commit to davebayer/cccl that referenced this pull request Jan 22, 2025
update docs

update docs

add `memcmp`, `memmove` and `memchr` implementations

implement tests

Use cuda::std::min/max in Thrust (NVIDIA#3364)

Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (NVIDIA#3361)

* implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16`

Cleanup util_arch (NVIDIA#2773)

Deprecate thrust::null_type (NVIDIA#3367)

Deprecate cub::DeviceSpmv (NVIDIA#3320)

Fixes: NVIDIA#896

Improves `DeviceSegmentedSort` test run time for large number of items and segments (NVIDIA#3246)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* fixes spelling

* adds tests for large number of segments

* fixes narrowing conversion in tests

* addresses review comments

* fixes includes

Compile basic infra test with C++17 (NVIDIA#3377)

Adds support for large number of items and large number of segments to `DeviceSegmentedSort` (NVIDIA#3308)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* addresses review comments

* introduces segment offset type

* adds tests for large number of segments

* adds support for large number of segments

* drops segment offset type

* fixes thrust namespace

* removes about-to-be-deprecated cub iterators

* no exec specifier on defaulted ctor

* fixes gcc7 linker error

* uses local_segment_index_t throughout

* determine offset type based on type returned by segment iterator begin/end iterators

* minor style improvements

Exit with error when RAPIDS CI fails. (NVIDIA#3385)

cuda.parallel: Support structured types as algorithm inputs (NVIDIA#3218)

* Introduce gpu_struct decorator and typing

* Enable `reduce` to accept arrays of structs as inputs

* Add test for reducing arrays-of-struct

* Update documentation

* Use a numpy array rather than ctypes object

* Change zeros -> empty for output array and temp storage

* Add a TODO for typing GpuStruct

* Documentation updates

* Remove test_reduce_struct_type from test_reduce.py

* Revert to `to_cccl_value()` accepting ndarray + GpuStruct

* Bump copyrights

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Deprecate thrust::async (NVIDIA#3324)

Fixes: NVIDIA#100

Review/Deprecate CUB `util.ptx` for CCCL 2.x (NVIDIA#3342)

Fix broken `_CCCL_BUILTIN_ASSUME` macro (NVIDIA#3314)

* add compiler-specific path
* fix device code path
* add _CCCL_ASSUME

Deprecate thrust::numeric_limits (NVIDIA#3366)

Replace `typedef` with `using` in libcu++ (NVIDIA#3368)

Deprecate thrust::optional (NVIDIA#3307)

Fixes: NVIDIA#3306

Upgrade to Catch2 3.8  (NVIDIA#3310)

Fixes: NVIDIA#1724

refactor `<cuda/std/cstdint>` (NVIDIA#3325)

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

Update CODEOWNERS (NVIDIA#3331)

* Update CODEOWNERS

* Update CODEOWNERS

* Update CODEOWNERS

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Fix sign-compare warning (NVIDIA#3408)

Implement more cmath functions to be usable on host and device (NVIDIA#3382)

* Implement more cmath functions to be usable on host and device

* Implement math roots functions

* Implement exponential functions

Redefine and deprecate thrust::remove_cvref (NVIDIA#3394)

* Redefine and deprecate thrust::remove_cvref

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Fix assert definition for NVHPC due to constexpr issues (NVIDIA#3418)

NVHPC cannot decide at compile time where the code would run so _CCCL_ASSERT within a constexpr function breaks it.

Fix this by always using the host definition which should also work on device.

Fixes NVIDIA#3411

Extend CUB reduce benchmarks (NVIDIA#3401)

* Rename max.cu to custom.cu, since it uses a custom operator
* Extend types covered by min.cu to all fundamental types
* Add some notes on how to collect tuning parameters

Fixes: NVIDIA#3283

Update upload-pages-artifact to v3 (NVIDIA#3423)

* Update upload-pages-artifact to v3

* Empty commit

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Replace and deprecate thrust::cuda_cub::terminate (NVIDIA#3421)

`std::linalg` accessors and `transposed_layout` (NVIDIA#2962)

Add round up/down to multiple (NVIDIA#3234)

[FEA]: Introduce Python module with CCCL headers (NVIDIA#3201)

* Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative

* Run `copy_cccl_headers_to_aude_include()` before `setup()`

* Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path.

* Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel

* Bug fix: cuda/_include only exists after shutil.copytree() ran.

* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py

* Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions)

* Replace := operator (needs Python 3.8+)

* Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md

* Restore original README.md: `pip3 install -e` now works on first pass.

* cuda_cccl/README.md: FOR INTERNAL USE ONLY

* Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under NVIDIA#3201 (comment))

Command used: ci/update_version.sh 2 8 0

* Modernize pyproject.toml, setup.py

Trigger for this change:

* NVIDIA#3201 (comment)

* NVIDIA#3201 (comment)

* Install CCCL headers under cuda.cccl.include

Trigger for this change:

* NVIDIA#3201 (comment)

Unexpected accidental discovery: cuda.cooperative unit tests pass without CCCL headers entirely.

* Factor out cuda_cccl/cuda/cccl/include_paths.py

* Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative

* Add missing Copyright notice.

* Add missing __init__.py (cuda.cccl)

* Add `"cuda.cccl"` to `autodoc.mock_imports`

* Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.)

* Add # TODO: move this to a module-level import

* Modernize cuda_cooperative/pyproject.toml, setup.py

* Convert cuda_cooperative to use hatchling as build backend.

* Revert "Convert cuda_cooperative to use hatchling as build backend."

This reverts commit 61637d6.

* Move numpy from [build-system] requires -> [project] dependencies

* Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH

* Remove copy_license() and use license_files=["../../LICENSE"] instead.

* Further modernize cuda_cccl/setup.py to use pathlib

* Trivial simplifications in cuda_cccl/pyproject.toml

* Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code

* Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml

* Add taplo-pre-commit to .pre-commit-config.yaml

* taplo-pre-commit auto-fixes

* Use pathlib in cuda_cooperative/setup.py

* CCCL_PYTHON_PATH in cuda_cooperative/setup.py

* Modernize cuda_parallel/pyproject.toml, setup.py

* Use pathlib in cuda_parallel/setup.py

* Add `# TOML lint & format` comment.

* Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml

* Use pathlib in cuda/cccl/include_paths.py

* pre-commit autoupdate (EXCEPT clang-format, which was manually restored)

* Fixes after git merge main

* Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result'

```
=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
  /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080>

  Traceback (most recent call last):
    File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
      bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
                                                       ^^^^^^^^^^^^^^^^^
  AttributeError: '_Reduce' object has no attribute 'build_result'

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================
```

* Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class CustomBuildPy`

* Introduce cuda_cooperative/constraints.txt

* Also add cuda_parallel/constraints.txt

* Add `--constraint constraints.txt` in ci/test_python.sh

* Update Copyright dates

* Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024)

For completeness: The other repo took so long to install into the pre-commit cache that it led to timeouts in the CCCL CI.

* Remove unused cuda_parallel jinja2 dependency (noticed by chance).

* Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead.

* Make cuda_cooperative, cuda_parallel testing completely independent.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Fix sign-compare warning (NVIDIA#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]"

This reverts commit ea33a21.

Error message: NVIDIA#3201 (comment)

* Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Restore original ci/matrix.yaml [skip-rapids]

* Use for loop in test_python.sh to avoid code duplication.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]

* Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]"

This reverts commit ec206fd.

* Implement suggestion by @shwina (NVIDIA#3201 (review))

* Address feedback by @leofang

---------

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

cuda.parallel: Add optional stream argument to reduce_into() (NVIDIA#3348)

* Add optional stream argument to reduce_into()

* Add tests to check for reduce_into() stream behavior

* Move protocol related utils to separate file and rework __cuda_stream__ error messages

* Fix synchronization issue in stream test and add one more invalid stream test case

* Rename cuda stream validation function after removing leading underscore

* Unpack values from __cuda_stream__ instead of indexing

* Fix linting errors

* Handle TypeError when unpacking invalid __cuda_stream__ return

* Use stream to allocate cupy memory in new stream test

Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (NVIDIA#3434)

Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (NVIDIA#3419)

* Deprecate `cub::{min, max}` and replace internal uses with those from libcu++

Fixes NVIDIA#3404

Fix CI issues (NVIDIA#3443)

Remove deprecated `cub::min` (NVIDIA#3450)

* Remove deprecated `cuda::{min,max}`

* Drop unused `thrust::remove_cvref` file

Fix typo in builtin (NVIDIA#3451)

Moves agents to `detail::<algorithm_name>` namespace (NVIDIA#3435)

uses unsigned offset types in thrust's scan dispatch (NVIDIA#3436)

Default transform_iterator's copy ctor (NVIDIA#3395)

Fixes: NVIDIA#2393

Turn C++ dialect warning into error (NVIDIA#3453)

Uses unsigned offset types in thrust's sort algorithm calling into `DispatchMergeSort` (NVIDIA#3437)

* uses thrust's dynamic dispatch for merge_sort

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Refactor allocator handling of contiguous_storage (NVIDIA#3050)

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Drop thrust::detail::integer_traits (NVIDIA#3391)

Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Improve docs of std headers (NVIDIA#3416)

Drop C++11 and C++14 support for all of cccl (NVIDIA#3417)

* Drop C++11 and C++14 support for all of cccl

---------

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

Deprecate a few CUB macros (NVIDIA#3456)

Deprecate thrust universal iterator categories (NVIDIA#3461)

Fix launch args order (NVIDIA#3465)

Add `--extended-lambda` to the list of removed clangd flags (NVIDIA#3432)

add `_CCCL_HAS_NVFP8` macro (NVIDIA#3429)

Add `_CCCL_BUILTIN_PREFETCH` (NVIDIA#3433)

Drop universal iterator categories (NVIDIA#3474)

Ensure that headers in `<cuda/*>` can be built with a C++ only compiler (NVIDIA#3472)

Specialize __is_extended_floating_point for FP8 types (NVIDIA#3470)

Also ensure that we actually can enable FP8 due to FP16 and BF16 requirements

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Moves CUB kernel entry points to a detail namespace (NVIDIA#3468)

* moves emptykernel to detail ns

* second batch

* third batch

* fourth batch

* fixes cuda parallel

* concatenates nested namespaces

Deprecate block/warp algo specializations (NVIDIA#3455)

Fixes: NVIDIA#3409

Refactor CUB's util_debug (NVIDIA#3345)
davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 29, 2025
@fbusato fbusato deleted the builtin-prefetch branch February 11, 2025 18:24
Labels
3.0 Targeted for 3.0 release
Projects
Archived in project
4 participants