Implement cudax::shared_resource #2398

Merged
merged 9 commits into from
Sep 19, 2024

Conversation

miscco
Contributor

@miscco miscco commented Sep 10, 2024

We currently have two basic building blocks around memory resources, any_resource and resource_ref.

However, while they make owning and sharing resources much easier, we can still run into lifetime issues.

If a user wants to pass a resource into a library function that might exceed the lifetime of the resource, they would need to move it into an any_resource. any_resource requires movability and copyability, and not all resource types are movable and copyable.

However, they might also want to share that resource, e.g. a pool allocator, among multiple functions. We need a way to properly share a resource in those circumstances.

Enter shared_resource. A shared_resource<_Resource> behaves like a shared_ptr<_Resource> while also satisfying the resource concept. Users can construct their (potentially immovable) resource in a shared_resource instance, and then move the shared_resource into an any_resource. Now all copies of that any_resource instance will share ownership of the same underlying resource.
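
To make the idea concrete, here is a minimal standalone sketch of a shared-ownership wrapper built on `std::shared_ptr`. It is not the cudax implementation: `counting_resource`, `shared_resource_sketch`, and `make_sketch` are hypothetical names invented for illustration, and the `allocate` / `deallocate` signatures are only assumed to resemble the resource interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical toy resource: neither copyable nor movable.
// It counts outstanding allocations so sharing can be observed.
struct counting_resource {
  explicit counting_resource(int id) : id_(id) {}
  counting_resource(const counting_resource&) = delete;
  counting_resource& operator=(const counting_resource&) = delete;

  void* allocate(std::size_t bytes, std::size_t /*alignment*/ = alignof(std::max_align_t)) {
    ++live_;
    return ::operator new(bytes);
  }
  void deallocate(void* ptr, std::size_t /*bytes*/,
                  std::size_t /*alignment*/ = alignof(std::max_align_t)) noexcept {
    --live_;
    ::operator delete(ptr);
  }
  int id() const { return id_; }
  int live() const { return live_; }

private:
  int id_;
  int live_ = 0;
};

// Hypothetical wrapper: copying it copies a shared_ptr, never the resource.
template <class Resource>
class shared_resource_sketch {
public:
  explicit shared_resource_sketch(std::shared_ptr<Resource> impl) : impl_(std::move(impl)) {}

  // Forward the allocation interface to the shared resource.
  void* allocate(std::size_t bytes, std::size_t alignment = alignof(std::max_align_t)) {
    return impl_->allocate(bytes, alignment);
  }
  void deallocate(void* ptr, std::size_t bytes,
                  std::size_t alignment = alignof(std::max_align_t)) noexcept {
    impl_->deallocate(ptr, bytes, alignment);
  }

  // Two handles are equal when they refer to the same underlying resource.
  friend bool operator==(const shared_resource_sketch& a, const shared_resource_sketch& b) {
    return a.impl_ == b.impl_;
  }

  const Resource& get() const { return *impl_; }
  long use_count() const { return impl_.use_count(); }

private:
  std::shared_ptr<Resource> impl_;
};

// Hypothetical factory: constructs the (possibly immovable) resource in place.
template <class Resource, class... Args>
shared_resource_sketch<Resource> make_sketch(Args&&... args) {
  return shared_resource_sketch<Resource>(
      std::make_shared<Resource>(std::forward<Args>(args)...));
}
```

With this sketch, `auto a = make_sketch<counting_resource>(7); auto b = a;` yields two handles to one resource: an allocation made through `a` is visible in `b.get().live()`, and the resource is destroyed only when the last handle goes away.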

@miscco miscco requested review from a team as code owners September 10, 2024 12:24
@miscco miscco added feature request New feature or request. CUDA Next Feature intended for the Cuda Next experimental library labels Sep 10, 2024
@miscco miscco force-pushed the shared_resource branch 7 times, most recently from c624c70 to 1f76436 Compare September 10, 2024 15:13
We currently have two basic building blocks around memory resources, `any_resource` and `resource_ref`.

However, while they make owning and sharing resources much easier, we can still run into lifetime issues.

If a user wants to pass a resource into a library function that might exceed the lifetime of the resource, they would need to move it into an any_resource.

However, they might also want to share that resource, e.g. a pool allocator, among multiple functions. We need a way to properly share a resource in those circumstances.

Enter `shared_resource`. Rather than storing an `any_resource`, this holds a `shared_ptr<any_resource>`. With that we can happily copy / move them around without touching the stored resource.
@ericniebler
Contributor

Rather than storing an any_resource this holds a shared_ptr<any_resource>. With that we can happily copy / move them around and without touching the stored resource

i think this is the wrong design. any_resource requires copyability of any resource stored inside it. the way to achieve copyability is with a shared ptr. if you have a non-copyable resource R and you want to put it in an any_resource, it must be wrapped in a resource that stores shared_ptr<R>. R can't be any_resource because that's cyclic.

@miscco
Contributor Author

miscco commented Sep 10, 2024

Rather than storing an any_resource this holds a shared_ptr<any_resource>. With that we can happily copy / move them around and without touching the stored resource

i think this is the wrong design. any_resource requires copyability of any resource stored inside it. the way to achieve copyability is with a shared ptr. if you have a non-copyable resource R and you want to put it in an any_resource, it must be wrapped in a resource that stores shared_ptr<R>. R can't be any_resource because that's cyclic.

This is not about copyability but about the lifetime of the passed-in resource. Currently we can either have reference semantics with potentially dangling references, or we can have ownership but no reference semantics.

This is the combination of both.

Contributor

🟩 CI finished in 5h 12m: Pass: 100%/58 | Total: 2h 49m | Avg: 2m 55s | Max: 8m 00s | Hits: 85%/206
  • 🟩 cudax: Pass: 100%/58 | Total: 2h 49m | Avg: 2m 55s | Max: 8m 00s | Hits: 85%/206

    🟩 cpu
      🟩 amd64              Pass: 100%/54  | Total:  2h 40m | Avg:  2m 58s | Max:  8m 00s | Hits:  85%/206   
      🟩 arm64              Pass: 100%/4   | Total:  8m 53s | Avg:  2m 13s | Max:  2m 44s
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total:  1h 09m | Avg:  3m 01s | Max:  8m 00s | Hits:  85%/103   
      🟩 12.6               Pass: 100%/35  | Total:  1h 40m | Avg:  2m 51s | Max:  6m 53s | Hits:  85%/103   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total:  1h 09m | Avg:  3m 01s | Max:  8m 00s | Hits:  85%/103   
      🟩 nvcc12.6           Pass: 100%/35  | Total:  1h 40m | Avg:  2m 51s | Max:  6m 53s | Hits:  85%/103   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/58  | Total:  2h 49m | Avg:  2m 55s | Max:  8m 00s | Hits:  85%/206   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  5m 08s | Avg:  2m 34s | Max:  2m 37s
      🟩 Clang10            Pass: 100%/2   | Total:  5m 06s | Avg:  2m 33s | Max:  2m 38s
      🟩 Clang11            Pass: 100%/4   | Total:  9m 56s | Avg:  2m 29s | Max:  2m 34s
      🟩 Clang12            Pass: 100%/4   | Total: 10m 50s | Avg:  2m 42s | Max:  2m 44s
      🟩 Clang13            Pass: 100%/4   | Total: 10m 02s | Avg:  2m 30s | Max:  2m 40s
      🟩 Clang14            Pass: 100%/6   | Total: 18m 42s | Avg:  3m 07s | Max:  5m 02s
      🟩 Clang15            Pass: 100%/2   | Total:  5m 15s | Avg:  2m 37s | Max:  2m 38s
      🟩 Clang16            Pass: 100%/4   | Total: 10m 25s | Avg:  2m 36s | Max:  2m 44s
      🟩 Clang17            Pass: 100%/2   | Total:  5m 46s | Avg:  2m 53s | Max:  3m 07s
      🟩 Clang18            Pass: 100%/4   | Total: 15m 14s | Avg:  3m 48s | Max:  5m 00s
      🟩 GCC9               Pass: 100%/2   | Total:  4m 38s | Avg:  2m 19s | Max:  2m 27s
      🟩 GCC10              Pass: 100%/4   | Total:  9m 23s | Avg:  2m 20s | Max:  2m 31s
      🟩 GCC11              Pass: 100%/4   | Total:  9m 21s | Avg:  2m 20s | Max:  2m 32s
      🟩 GCC12              Pass: 100%/9   | Total: 29m 17s | Avg:  3m 15s | Max:  4m 26s
      🟩 GCC13              Pass: 100%/3   | Total:  5m 43s | Avg:  1m 54s | Max:  2m 07s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 00s | Avg:  8m 00s | Max:  8m 00s | Hits:  85%/103   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  6m 53s | Avg:  6m 53s | Max:  6m 53s | Hits:  85%/103   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/34  | Total:  1h 36m | Avg:  2m 50s | Max:  5m 02s
      🟩 GCC                Pass: 100%/22  | Total: 58m 22s | Avg:  2m 39s | Max:  4m 26s
      🟩 MSVC               Pass: 100%/2   | Total: 14m 53s | Avg:  7m 26s | Max:  8m 00s | Hits:  85%/206   
    🟩 gpu
      🟩 v100               Pass: 100%/58  | Total:  2h 49m | Avg:  2m 55s | Max:  8m 00s | Hits:  85%/206   
    🟩 jobs
      🟩 Build              Pass: 100%/50  | Total:  2h 13m | Avg:  2m 40s | Max:  8m 00s | Hits:  85%/206   
      🟩 Test               Pass: 100%/8   | Total: 36m 05s | Avg:  4m 30s | Max:  5m 02s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s
      🟩 90a                Pass: 100%/1   | Total:  2m 07s | Avg:  2m 07s | Max:  2m 07s
    🟩 std
      🟩 17                 Pass: 100%/32  | Total:  1h 26m | Avg:  2m 41s | Max:  4m 59s
      🟩 20                 Pass: 100%/26  | Total:  1h 23m | Avg:  3m 12s | Max:  8m 00s | Hits:  85%/206   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda
CUDA C Core Library

🏃‍ Runner counts (total jobs: 58)

# Runner
44 linux-amd64-cpu16
8 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

Contributor

@ericniebler ericniebler left a comment

i think #2410 is a better way forward

@ericniebler ericniebler dismissed their stale review September 18, 2024 20:49

i've integrated my own feedback

Contributor

@harrism harrism left a comment

Some questions, a couple of spelling errors (does CCCL have spell check for docs in CI?)

//! held by this \c shared_resource object is released, while the reference held by \c __other
//! is transferred to this object.
//! @param __other The \c shared_resource object to move from.
//! @post \c __other is left in a valid but unspecified state.
Contributor

What does "valid but unspecified state" mean? Is it valid to use other?

Contributor Author

That is the common phrase of move semantics.

It means don't touch it
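
As a small illustration of that phrase in the standard-library sense, the sketch below uses `std::string` rather than `shared_resource`; whether `shared_resource` gives any stronger guarantee for a moved-from object is not stated in this thread. The function name `reuse_after_move` is hypothetical. A moved-from object can still be destroyed or assigned a new value; only its current contents should not be relied on.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper showing what is safe to do with a moved-from object.
inline std::string reuse_after_move() {
  std::string source = "pool";
  std::string sink = std::move(source);  // source is now valid but unspecified

  // Reading source's contents here would give an unspecified result.
  // Assignment has no preconditions, so giving it a new value is fine.
  source = "fresh";

  return source + "/" + sink;
}
```

Calling `reuse_after_move()` returns `"fresh/pool"`: the moved-to string holds the original value, and the moved-from string is usable again once it has been assigned.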

//!
//! @endrst
template <class... _Properties>
Contributor

What happened to properties? They enable interfaces to be defined that expect to take shared ownership of an MR with specific properties. I think we need shared_resource to work like resource_ref with respect to properties.

Contributor Author

I believe we do not need that anymore, because we are using the explicit _Resource type to forward the properties

Contributor

🟩 CI finished in 4h 52m: Pass: 100%/58 | Total: 2h 55m | Avg: 3m 01s | Max: 12m 29s | Hits: 83%/212
  • 🟩 cudax: Pass: 100%/58 | Total: 2h 55m | Avg: 3m 01s | Max: 12m 29s | Hits: 83%/212

    🟩 cpu
      🟩 amd64              Pass: 100%/54  | Total:  2h 45m | Avg:  3m 03s | Max: 12m 29s | Hits:  83%/212   
      🟩 arm64              Pass: 100%/4   | Total: 10m 05s | Avg:  2m 31s | Max:  2m 34s
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total:  1h 09m | Avg:  3m 01s | Max: 10m 32s | Hits:  83%/106   
      🟩 12.6               Pass: 100%/35  | Total:  1h 45m | Avg:  3m 01s | Max: 12m 29s | Hits:  83%/106   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total:  1h 09m | Avg:  3m 01s | Max: 10m 32s | Hits:  83%/106   
      🟩 nvcc12.6           Pass: 100%/35  | Total:  1h 45m | Avg:  3m 01s | Max: 12m 29s | Hits:  83%/106   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/58  | Total:  2h 55m | Avg:  3m 01s | Max: 12m 29s | Hits:  83%/212   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 50s | Avg:  2m 25s | Max:  2m 25s
      🟩 Clang10            Pass: 100%/2   | Total:  5m 15s | Avg:  2m 37s | Max:  2m 52s
      🟩 Clang11            Pass: 100%/4   | Total: 10m 24s | Avg:  2m 36s | Max:  2m 43s
      🟩 Clang12            Pass: 100%/4   | Total:  9m 27s | Avg:  2m 21s | Max:  2m 26s
      🟩 Clang13            Pass: 100%/4   | Total:  8m 52s | Avg:  2m 13s | Max:  2m 25s
      🟩 Clang14            Pass: 100%/6   | Total: 18m 58s | Avg:  3m 09s | Max:  4m 13s
      🟩 Clang15            Pass: 100%/2   | Total:  5m 28s | Avg:  2m 44s | Max:  2m 56s
      🟩 Clang16            Pass: 100%/4   | Total: 10m 19s | Avg:  2m 34s | Max:  2m 43s
      🟩 Clang17            Pass: 100%/2   | Total:  5m 08s | Avg:  2m 34s | Max:  2m 36s
      🟩 Clang18            Pass: 100%/4   | Total: 14m 22s | Avg:  3m 35s | Max:  4m 42s
      🟩 GCC9               Pass: 100%/2   | Total:  4m 36s | Avg:  2m 18s | Max:  2m 18s
      🟩 GCC10              Pass: 100%/4   | Total:  9m 13s | Avg:  2m 18s | Max:  2m 40s
      🟩 GCC11              Pass: 100%/4   | Total:  9m 08s | Avg:  2m 17s | Max:  2m 43s
      🟩 GCC12              Pass: 100%/9   | Total: 29m 10s | Avg:  3m 14s | Max:  4m 29s
      🟩 GCC13              Pass: 100%/3   | Total:  7m 13s | Avg:  2m 24s | Max:  2m 34s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 32s | Avg: 10m 32s | Max: 10m 32s | Hits:  83%/106   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 29s | Avg: 12m 29s | Max: 12m 29s | Hits:  83%/106   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/34  | Total:  1h 33m | Avg:  2m 44s | Max:  4m 42s
      🟩 GCC                Pass: 100%/22  | Total: 59m 20s | Avg:  2m 41s | Max:  4m 29s
      🟩 MSVC               Pass: 100%/2   | Total: 23m 01s | Avg: 11m 30s | Max: 12m 29s | Hits:  83%/212   
    🟩 gpu
      🟩 v100               Pass: 100%/58  | Total:  2h 55m | Avg:  3m 01s | Max: 12m 29s | Hits:  83%/212   
    🟩 jobs
      🟩 Build              Pass: 100%/50  | Total:  2h 21m | Avg:  2m 50s | Max: 12m 29s | Hits:  83%/212   
      🟩 Test               Pass: 100%/8   | Total: 33m 25s | Avg:  4m 10s | Max:  4m 42s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 16s | Avg:  2m 16s | Max:  2m 16s
      🟩 90a                Pass: 100%/1   | Total:  2m 10s | Avg:  2m 10s | Max:  2m 10s
    🟩 std
      🟩 17                 Pass: 100%/32  | Total:  1h 25m | Avg:  2m 40s | Max:  4m 42s
      🟩 20                 Pass: 100%/26  | Total:  1h 29m | Avg:  3m 27s | Max: 12m 29s | Hits:  83%/212   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda
CUDA C Core Library

🏃‍ Runner counts (total jobs: 58)

# Runner
44 linux-amd64-cpu16
8 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

Contributor

🟩 CI finished in 1h 45m: Pass: 100%/58 | Total: 3h 04m | Avg: 3m 10s | Max: 11m 14s | Hits: 83%/212
  • 🟩 cudax: Pass: 100%/58 | Total: 3h 04m | Avg: 3m 10s | Max: 11m 14s | Hits: 83%/212

    🟩 cpu
      🟩 amd64              Pass: 100%/54  | Total:  2h 52m | Avg:  3m 11s | Max: 11m 14s | Hits:  83%/212   
      🟩 arm64              Pass: 100%/4   | Total: 11m 53s | Avg:  2m 58s | Max:  3m 20s
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total:  1h 16m | Avg:  3m 19s | Max: 10m 20s | Hits:  83%/106   
      🟩 12.6               Pass: 100%/35  | Total:  1h 47m | Avg:  3m 04s | Max: 11m 14s | Hits:  83%/106   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total:  1h 16m | Avg:  3m 19s | Max: 10m 20s | Hits:  83%/106   
      🟩 nvcc12.6           Pass: 100%/35  | Total:  1h 47m | Avg:  3m 04s | Max: 11m 14s | Hits:  83%/106   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/58  | Total:  3h 04m | Avg:  3m 10s | Max: 11m 14s | Hits:  83%/212   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  5m 30s | Avg:  2m 45s | Max:  2m 48s
      🟩 Clang10            Pass: 100%/2   | Total:  5m 17s | Avg:  2m 38s | Max:  2m 48s
      🟩 Clang11            Pass: 100%/4   | Total:  9m 39s | Avg:  2m 24s | Max:  2m 27s
      🟩 Clang12            Pass: 100%/4   | Total:  9m 58s | Avg:  2m 29s | Max:  2m 45s
      🟩 Clang13            Pass: 100%/4   | Total: 10m 18s | Avg:  2m 34s | Max:  2m 53s
      🟩 Clang14            Pass: 100%/6   | Total: 22m 13s | Avg:  3m 42s | Max:  8m 08s
      🟩 Clang15            Pass: 100%/2   | Total:  4m 56s | Avg:  2m 28s | Max:  2m 28s
      🟩 Clang16            Pass: 100%/4   | Total: 12m 00s | Avg:  3m 00s | Max:  3m 20s
      🟩 Clang17            Pass: 100%/2   | Total:  5m 17s | Avg:  2m 38s | Max:  2m 43s
      🟩 Clang18            Pass: 100%/4   | Total: 14m 03s | Avg:  3m 30s | Max:  4m 34s
      🟩 GCC9               Pass: 100%/2   | Total:  5m 15s | Avg:  2m 37s | Max:  2m 56s
      🟩 GCC10              Pass: 100%/4   | Total: 10m 29s | Avg:  2m 37s | Max:  2m 46s
      🟩 GCC11              Pass: 100%/4   | Total:  9m 41s | Avg:  2m 25s | Max:  2m 40s
      🟩 GCC12              Pass: 100%/9   | Total: 30m 26s | Avg:  3m 22s | Max:  6m 04s
      🟩 GCC13              Pass: 100%/3   | Total:  7m 43s | Avg:  2m 34s | Max:  2m 37s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 20s | Avg: 10m 20s | Max: 10m 20s | Hits:  83%/106   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 14s | Avg: 11m 14s | Max: 11m 14s | Hits:  83%/106   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/34  | Total:  1h 39m | Avg:  2m 55s | Max:  8m 08s
      🟩 GCC                Pass: 100%/22  | Total:  1h 03m | Avg:  2m 53s | Max:  6m 04s
      🟩 MSVC               Pass: 100%/2   | Total: 21m 34s | Avg: 10m 47s | Max: 11m 14s | Hits:  83%/212   
    🟩 gpu
      🟩 v100               Pass: 100%/58  | Total:  3h 04m | Avg:  3m 10s | Max: 11m 14s | Hits:  83%/212   
    🟩 jobs
      🟩 Build              Pass: 100%/50  | Total:  2h 25m | Avg:  2m 54s | Max: 11m 14s | Hits:  83%/212   
      🟩 Test               Pass: 100%/8   | Total: 38m 49s | Avg:  4m 51s | Max:  8m 08s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 03s | Avg:  2m 03s | Max:  2m 03s
      🟩 90a                Pass: 100%/1   | Total:  2m 29s | Avg:  2m 29s | Max:  2m 29s
    🟩 std
      🟩 17                 Pass: 100%/32  | Total:  1h 34m | Avg:  2m 58s | Max:  8m 08s
      🟩 20                 Pass: 100%/26  | Total:  1h 29m | Avg:  3m 26s | Max: 11m 14s | Hits:  83%/212   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda
CUDA C Core Library

🏃‍ Runner counts (total jobs: 58)

# Runner
44 linux-amd64-cpu16
8 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

Comment on lines +264 to +269
template <class _Resource, class... _Args>
auto make_shared_resource(_Args&&... __args) -> shared_resource<_Resource>
{
  static_assert(_CUDA_VMR::resource<_Resource>, "_Resource does not satisfy the cuda::mr::resource concept");
  return shared_resource<_Resource>{_CUDA_VSTD::forward<_Args>(__args)...};
}
Contributor

How is make_shared_resource<_Resource>(args...) any different than shared_resource<_Resource>(args...) apart from being five characters longer?

Contributor Author

I believe it is just symmetric with all the other stuff

@ericniebler
Contributor

@miscco are you ok with this pr as it is now?

@miscco miscco dismissed harrism’s stale review September 19, 2024 17:23

We are going forward with merging this now and fix stuff when mark is back from vacation

@miscco miscco merged commit 7bd04ad into NVIDIA:main Sep 19, 2024
76 checks passed
@miscco miscco deleted the shared_resource branch September 19, 2024 17:23
@harrism
Contributor

harrism commented Oct 9, 2024

We are going forward with merging this now and fix stuff when mark is back from vacation

There wasn't anything left to address in my review anyway, was there?

Labels
CUDA Next Feature intended for the Cuda Next experimental library feature request New feature or request.
3 participants