Should `any_resource` be copyable? #2379
I recall discussing this with @ericniebler, and the TL;DR is that, I believe, attempting to invoke the copy of
not quite. we want our containers to own their memory resources just as STL containers own their allocators. also, we want our containers to be movable and copyable, and so the memory resource must also be movable and copyable.

which brings us to the question: what does it mean to copy a memory resource? it doesn't mean somehow trying to clone its state. from the type- and value-theoretic point of view, a memory resource doesn't have any observable state.

this is a round-about way of saying that we can have copyable memory resources by simply ref-counting them. we should provide such a ref-counting resource wrapper to make this easy. that is what you would pass to the container's constructor.

but note, you don't always need to dynamically allocate and ref-count your memory resource. if you understand the lifetimes of the container and of the memory resource you'd like it to use, you can construct the container with a non-owning reference to the resource. besides, our most commonly used memory resources are actually empty, trivially-copyable, "flyweight" objects that delegate to the CUDA runtime.

all of this presupposes that the memory resource is thread-safe. if it's not, you'll have to wrap it in something that makes it thread-safe, or else be damn sure that it's not getting accessed concurrently.

EDIT: this is making me think that we should require that memory resources be thread-safe.
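For concreteness, the ref-counting wrapper idea could look roughly like this. This is a self-contained host-C++ sketch under stated assumptions: `memory_resource`, `heap_resource`, and `shared_resource` are hypothetical stand-ins, not the actual cudax types.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for a memory-resource interface.
struct memory_resource {
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* p, std::size_t bytes) = 0;
  virtual ~memory_resource() = default;
};

// Trivial concrete resource used for demonstration.
struct heap_resource : memory_resource {
  void* allocate(std::size_t bytes) override { return ::operator new(bytes); }
  void deallocate(void* p, std::size_t) override { ::operator delete(p); }
};

// Copyable by ref-counting: copies share one underlying resource
// instead of trying to clone its state.
class shared_resource {
  std::shared_ptr<memory_resource> impl_;
public:
  explicit shared_resource(std::shared_ptr<memory_resource> impl)
      : impl_(std::move(impl)) {}
  void* allocate(std::size_t bytes) { return impl_->allocate(bytes); }
  void deallocate(void* p, std::size_t bytes) { impl_->deallocate(p, bytes); }
  long use_count() const { return impl_.use_count(); }
};

// Copying the wrapper bumps the refcount; both copies hit the same resource.
inline long use_count_after_copy() {
  shared_resource a{std::make_shared<heap_resource>()};
  shared_resource b = a;
  return b.use_count();
}
```

Copying `shared_resource` is cheap and never duplicates allocator state, which is what makes the wrapped resource safely copyable.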
@ericniebler I agree with everything you said, but want to clarify this point:
This shouldn't be the use case we optimize for. The resource we want most people to use is a memory pool, which means it has state and won't be copyable.
Who is "we", and why do they want this? Maybe I don't understand what "we" mean by "own". In RMM, our containers do not own their MRs; they hold a reference to their MRs.

Agree with @jrhemstad: in RAPIDS, our most commonly used memory resources are definitely not flyweight objects.
if the containers store the memory resource by reference, that's the only option you have. if you store it by value, then you can pass a value-ish memory resource, or a reference-ish resource, or a ref-counted resource. it gives the user more control. rmm would pass a reference-ish resource.
So what actually happens when one creates an `any_resource`?
I would like to reiterate what the goals for the new CUDA APIs are.
Point 1 means that for any container we need to bind the lifetime of the underlying memory resource to the container. Otherwise users will frequently fail to account for it and will run into segfaults. That is the reason we require ownership, which means that an `any_resource` owns the resource it wraps.

Point 2 means that we give the user the ability to make the safe default as performant as possible. You can have a look at my tests for it. It's not an especially large resource, but copying it would be rather stupid. For that reason the resource is not passed by value, but through a const reference.

This gives us the best of both worlds, without artificially constraining us into a corner.
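Point 1 (lifetime binding) can be illustrated with a minimal sketch. `owning_buffer` and `counting_resource` below are hypothetical names, but they show the shape: the container takes its resource by value, so the resource cannot die before the container.

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Toy stateful resource that tracks outstanding bytes.
struct counting_resource {
  std::size_t allocated = 0;
  void* allocate(std::size_t b) { allocated += b; return ::operator new(b); }
  void deallocate(void* p, std::size_t b) { allocated -= b; ::operator delete(p); }
};

// The container owns its resource by value, binding their lifetimes:
// the resource is destroyed only when the container is.
template <class Resource>
class owning_buffer {
  Resource mr_;        // owned, not referenced
  void* data_;
  std::size_t size_;
public:
  owning_buffer(Resource mr, std::size_t n)
      : mr_(std::move(mr)), data_(mr_.allocate(n)), size_(n) {}
  ~owning_buffer() { mr_.deallocate(data_, size_); }
  owning_buffer(const owning_buffer&) = delete;
  owning_buffer& operator=(const owning_buffer&) = delete;
  const Resource& resource() const { return mr_; }
};

// The owned resource is alive and has served the allocation.
inline std::size_t bytes_owned_after_construction(std::size_t n) {
  owning_buffer<counting_resource> buf{counting_resource{}, n};
  return buf.resource().allocated;
}
```

Because the resource is a member, no caller can accidentally destroy it while the buffer is live.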
Perhaps I am not understanding something, but how does the requirement for copyable containers work with passing a resource the container does not own? Suppose I make a vector whose resource is a stateful object declared in the enclosing scope.

Now suppose that in "do something", I make a call into a third-party library, passing the vector. Is the answer that any function that is going to copy the containers also has to take the memory resource, such that the lifetime requirements are explicit? Or is it that, in this scenario, my third-party library should somehow reject vectors that do not have a ref-counted memory resource as their allocator? Or some third thing I haven't thought of?
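The hazard in this question can be reproduced as a pure-host toy (every name here is hypothetical): a container that merely references its resource can be copied past the resource's lifetime.

```cpp
// Toy stateful resource that records whether it is still alive.
struct tracked_resource {
  bool* alive;
  explicit tracked_resource(bool* flag) : alive(flag) { *alive = true; }
  ~tracked_resource() { *alive = false; }
};

// A container that only references its resource's state (non-owning).
struct toy_vector {
  bool* resource_alive;
};

// The scenario above: a third-party callee stashes a copy of the
// container, and that copy outlives the block-scoped resource.
inline bool copy_outlives_resource() {
  bool alive = false;
  toy_vector escaped{&alive};
  {
    tracked_resource mr{&alive};
    toy_vector v{mr.alive};
    escaped = v;  // "third-party library" keeps a copy
  }               // ~mr runs here, before the copy is gone
  return !*escaped.resource_alive;  // true: the resource is already dead
}
```

Any allocation or deallocation through `escaped` at this point would be a use-after-free.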
This is not safe to do and will result in a runtime error. What the user should have done is pass in the resource as is:

```cpp
{
  auto vec = cuda::std::vector<...>(some_internally_stateful_resource(), size);
  // ... do something with vec;
  // ~vec runs, ~resource runs, fine
}
```

That way, the resource would at least still be alive. On the other hand, if the user really wants to pass in a resource, they need to do something more fancy: either move the construction out of the block, or (and we should add that) use a "shared" resource that is ref-counted.
I have opened a PR for such a simple shared resource: #2398
What kind of runtime error? Use-after-free, or something concretely debuggable?
I think often the user does not even know they wanted to do this. It requires knowing, potentially, details of every third-party API call they make, along with its transitive dependencies. I agree that the ref-counted resource solves this problem, thanks! But I am concerned that unless the constructor of allocated objects always reaches for this version, the API will have many sharp edges.
I agree with this comment about sharp edges. It will be easy to forget to create a shared resource, especially when taking some global resource, wrapping it with a resource adaptor, and then passing that to functions. However, I think the common case is going to be something like we have in RMM, where the resource is owned at program scope and functions merely borrow it. (In fact, RAPIDS will want a wrapper for this.)
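A plain-C++ sketch of that common RMM-style shape: a program-lifetime default resource that functions borrow via a defaulted pointer parameter. All names here are illustrative, not the actual RMM API.

```cpp
#include <cstddef>
#include <new>

struct memory_resource {
  virtual void* allocate(std::size_t) = 0;
  virtual void deallocate(void*, std::size_t) = 0;
  virtual ~memory_resource() = default;
};

struct heap_resource : memory_resource {
  void* allocate(std::size_t b) override { return ::operator new(b); }
  void deallocate(void* p, std::size_t) override { ::operator delete(p); }
};

// Program-lifetime default, loosely analogous to a "current resource".
inline memory_resource* get_current_resource() {
  static heap_resource r;  // outlives every caller
  return &r;
}

// Common signature: a non-owning pointer with a sensible default.
inline void* make_buffer(std::size_t bytes,
                         memory_resource* mr = get_current_resource()) {
  return mr->allocate(bytes);
}

// Callers that don't care about resources never mention one.
inline bool default_resource_roundtrip() {
  void* p = make_buffer(32);  // uses the global default
  get_current_resource()->deallocate(p, 32);
  return p != nullptr;
}
```

Because the default resource outlives every caller, no ref-counting is needed in this common case.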
We are definitely in a tension field of security vs. performance. I really do not want every resource to contain a shared_ptr that does ref counting. At the same time, nothing here is new. If a user wants to share a resource, they need to make sure the resource is alive for as long as it is used. We give them one better tool to achieve that, but I do not believe there is a golden solution that will satisfy all needs.
I don't know what this phrase means. :) |
We need to balance security concerns with performance considerations. Using refcounted resources everywhere is going to have a considerable performance impact, so I want to make sure that we only use them when needed.
I am not recommending refcounted resources everywhere. However, I think shared ownership is needed whenever adaptors are used or in central resource management, especially when we get into C++/Python interop.
I believe we are all aligned on what we're trying to achieve, and I believe the current design has everything needed to make everyone happy. TL;DR:

```cpp
cudax::vector<T> buff{..., mr};
```
Good summary. Thank you! One more thing: we need a way to create a
A question: often our functions take a resource by reference. By using a reference, the function cannot extend the resource's lifetime. So does the recommendation now become to accept an owning, type-erased resource instead?
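One way to picture the trade-off being weighed here: a trivially copyable, non-owning, type-erased view (hypothetical `resource_view`, loosely in the spirit of a resource reference) is cheap to accept at API boundaries, but it never extends the resource's lifetime.

```cpp
#include <cstddef>

// Non-owning, trivially copyable, type-erased view of any resource-like
// object (one vtable slot shown for brevity).
class resource_view {
  void* obj_;
  void* (*alloc_)(void*, std::size_t);
public:
  template <class R>
  resource_view(R& r)
      : obj_(&r),
        alloc_([](void* o, std::size_t b) -> void* {
          return static_cast<R*>(o)->allocate(b);
        }) {}
  void* allocate(std::size_t bytes) { return alloc_(obj_, bytes); }
};

// Concrete resource that counts bytes handed out (returns dummy pointers).
struct counting_resource {
  std::size_t allocated = 0;
  void* allocate(std::size_t b) { allocated += b; return &allocated; }
};

// The view forwards to the caller's resource; the caller retains
// ownership and must keep the resource alive across the call.
inline std::size_t bytes_through_view() {
  counting_resource mr;
  resource_view view{mr};
  view.allocate(16);
  view.allocate(16);
  return mr.allocated;
}
```

An owning, type-erased wrapper has the opposite properties: safe to store past the call, but each copy touches a vtable and possibly a reference count.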
This is a very good point, and I think you are correct. However, I don't think the pattern you described is representative of any libcudf examples, is it? I wouldn't expect we'd ever adapt an incoming memory resource on our own and return memory that was allocated with the adapted resource.
You may be pleasantly surprised. The size of the object is no different; it just requires initializing a few extra function pointers in the vtable.
Plus atomically incrementing a reference counter if it's a shared resource.
No, but I feel like I've seen something like it in RAFT? I am probably wrong. Prefetching specifically is something we're doing in libcudf now, but I don't think we adapt the MR in the middle of a function like that. |
i think we have resolved this issue. i'm closing it "not planned". @harrism feel free to reopen if you still feel that changes are needed here. |
I don't think `any_resource` should support copy construction or copy assignment. What does it mean to copy a memory resource? A memory resource may own outstanding allocations to a LOT of memory. In some cases (pool memory resources), it may have allocated nearly all of the memory on a device, or in the system. What does it mean to copy that pool memory resource?
What if the copy is used to deallocate? Presumably the implementation of the pool doesn't know how to update the free lists of the original.
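A toy illustration of exactly this failure mode (hypothetical `toy_pool`): copying a pool copies its free list by value, so deallocating through the copy never updates the original's bookkeeping.

```cpp
#include <vector>

// Toy fixed-block pool whose only state is a free list.
struct toy_pool {
  std::vector<int> free_blocks{0, 1, 2, 3};
  int allocate_block() {
    int b = free_blocks.back();
    free_blocks.pop_back();
    return b;
  }
  void deallocate_block(int b) { free_blocks.push_back(b); }
};

// Deallocating through a copy updates only the copy's free list;
// the original still believes the block is outstanding.
inline bool copy_loses_deallocation() {
  toy_pool original;
  int b = original.allocate_block();  // original: 3 free blocks
  toy_pool copy = original;           // copies the free list by value
  copy.deallocate_block(b);           // copy: 4 free, original: still 3
  return original.free_blocks.size() == 3 && copy.free_blocks.size() == 4;
}
```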
What is the intended use of copying `any_resource`?

CC @wence-