Add support for constructing tensor/ktensor from data without copying #145
Comments
I like the idea of having all |
I can start working on a PR for that. Scanning through the code, it looks like the arguments to |
Here are the constructors with suggested behavior:
|
Because Suggestion: remove @ntjohnson1 @etphipp what do you think about this suggestion? This would break from TTB for MATLAB, but it would also be more aligned with other Python array/data packages, like |
I'm not a python expert by any stretch of the imagination, but that seems reasonable to me. I think people are used to the fact that Python and Matlab just have different semantics when it comes to copies, and so I think this would align better with what people expect (and be more efficient). |
Here are example use cases that could drive this change (and corresponding methods in other classes):
I think this would clean up the inconsistencies across the different classes and class constructors. |
I am 90% aligned with Danny's proposal in the final message. But this #145 (comment) seems to suggest a broader change to avoid the copy on construction wherever possible. Here is numpy's current behavior around copies/references:
import numpy as np
a = list(range(3))
b = np.array(a) # Copy required no matter what since lists aren't contiguous in memory
a[0] = 5
assert b[0] != 5
c = np.asarray(b) # asarray explicitly returns a reference when possible
b[0] = 7
assert c[0] == 7
d = np.array(c) # Default constructor always yields copy
c[0] = 9
assert d[0] != 9
I like the With the approach above we would have this (since we aren't doing the copy).
import numpy as np
from pyttb import tensor
a = np.ones((3,3))
b = tensor.from_data(a)
a[0,0] = 5
assert b[0,0] == 5
So something like:
K2 = K
K3 = K.copy()
T = K.to_tensor() |
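The NumPy copy/reference behavior quoted in the comment above can be checked directly with np.shares_memory, which reports whether two arrays overlap in memory. The following is a minimal sketch using plain NumPy only (it does not touch pyttb); the variable names are illustrative. It also shows the case where np.asarray has to copy because a dtype conversion is requested.

```python
import numpy as np

source = [0, 1, 2]
from_list = np.array(source)      # a list must always be copied into a new buffer
view = np.asarray(from_list)      # already an ndarray with a matching dtype: no copy
copied = np.array(from_list)      # np.array copies by default

# asarray hands back the very same object, so the memory is shared
assert view is from_list
assert np.shares_memory(view, from_list)

# the default constructor produced independent storage
assert not np.shares_memory(copied, from_list)

# asarray still copies when a conversion is needed, e.g. a dtype change
converted = np.asarray(from_list, dtype=np.float64)
assert not np.shares_memory(converted, from_list)
```

Whether np.asarray copies depends on the input it receives rather than on the call itself, which is the behavior the discussion is weighing.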
Since numpy is mainly focused on a single data class and accepts many array-like constructor inputs, this may not be a 1:1 alignment. I do not like that numpy.asarray behaves differently depending on whether the input is a numpy.ndarray or something else: whether a copy is made depends on the input, not on which method the user calls. I think we are more aligned with the scipy.sparse data classes in pyttb, but I am not sure I like the constructor patterns there either; various types of input are accepted in the first argument, and when more than a single input is required you need to wrap it up into a tuple. I would like to move towards a simpler interface that supports explicit copy requests through the constructor or via the copy class method. Also, our from_data methods could be replaced with (or aliased to) the default, non-empty constructors. Here are some examples:
K2 = K
K3 = K.copy()
T = K.to_tensor()
This would mean that we could have the following constructors:
All of the classes with a |
If we are globally going to have copies opt-out then I think we drop the copy flag entirely and just make it very clear that the default constructors will just use a reference to the data (the users can always make a copy before handing it to us). That's fine with me, but we probably want to show an example of that somewhere early in the docs to avoid subtle bugs. We will probably want to update our tests to confirm we are in fact avoiding the copies. Right now I think our from_data methods also don't do any validation; if they become the default option we probably want to add validation. This example of references vs copies caught me the other day, which is why I am more sensitive to it now, I guess.
>>> a = [[0]*2]*3
>>> a[0][0] = 1
>>> a
[[1, 0], [1, 0], [1, 0]]
NIT: For ktensors, if you are trying to drop from_factors, I think flipping the order is nicer, so that when using factor matrices only, the None can be left off: ktensor(factor_matrices=None, weights=None) |
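A test confirming that a constructor avoids the copy could look roughly like the sketch below. RefTensor is a hypothetical stand-in written for this illustration, not pyttb's tensor class; it validates its input, keeps a reference to it, and offers an explicit copy method. np.shares_memory is the NumPy function used for the check.

```python
import numpy as np


class RefTensor:
    """Hypothetical stand-in for a pyttb data class; not the real API."""

    def __init__(self, data):
        # Validate the input, but do not copy it: keep a reference to `data`.
        if not isinstance(data, np.ndarray):
            raise TypeError("data must be a numpy.ndarray")
        self.data = data

    def copy(self):
        # An explicit copy is the only way to get independent storage.
        return RefTensor(self.data.copy())


a = np.ones((3, 3))
t = RefTensor(a)          # shares storage with `a`
t_copy = t.copy()         # independent storage, taken before `a` is modified
a[0, 0] = 5

# The reference sees the change made through `a`; the copy does not.
assert np.shares_memory(t.data, a)
assert t.data[0, 0] == 5
assert not np.shares_memory(t_copy.data, a)
assert t_copy.data[0, 0] == 1
```

Checking shared memory directly, rather than only comparing values, makes the test fail if a copy is accidentally reintroduced.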
I agree that would make it cleaner by leaving out the copy flag and requiring users to explicitly call +1 for validation in constructors. Need to determine best way to support moving from The example above regarding the reference vs copy from @ntjohnson1 is definitely confusing. It is also opaque without digging into code, as here are the docs:
>>> help(list.__mul__)
Help on wrapper_descriptor:
__mul__(self, value, /)
Return self*value.
With better documentation, it might be clear that the nested lists are references to the same object. If we ever have a need for tensor comprehensions with short-cut nested reference semantics, we'll need to make sure we document the expected behavior. |
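For readers who have not hit the nested-list case before, here is a short plain-Python sketch of what happens and the usual way around it. The variable names are illustrative.

```python
# Multiplying the outer list repeats a reference to one inner list.
aliased = [[0] * 2] * 3
aliased[0][0] = 1
assert aliased[0] is aliased[1] is aliased[2]   # all rows are the same object
assert aliased == [[1, 0], [1, 0], [1, 0]]

# A comprehension builds a new inner list for every row.
independent = [[0] * 2 for _ in range(3)]
independent[0][0] = 1
assert independent[0] is not independent[1]
assert independent == [[1, 0], [0, 0], [0, 0]]
```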
I would just go with Sounds like we are in agreement and have a plan on this then |
It seems that copying should be the default. For the vast majority of people, that will be simpler and prevent problems. Those who want performance can set copying to false. |
I am also fine with copy being opt-out. I just felt copy being opt-in seemed unintuitive. |
There does not seem to be a standard at this point across classes in related packages:
So, I suggest that we go for opt-out in pyttb classes: default is |
So is this back to what I had originally proposed in PR #138, e.g., adding |
Yes, it just took us a long time to determine that what you proposed is what we wanted to do. 👍 |
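The opt-out behavior agreed on here (copy by default, with a flag to skip the copy) can be sketched as follows. CopyFlagTensor is a hypothetical class made up for this illustration and is not pyttb's implementation; only the copy=True default comes from the discussion.

```python
import numpy as np


class CopyFlagTensor:
    """Hypothetical illustration only; not pyttb's tensor class."""

    def __init__(self, data, copy=True):
        # copy=True (default): store an independent copy of the input.
        # copy=False: store a reference to the caller's array.
        self.data = data.copy() if copy else data


a = np.zeros((2, 2))
safe = CopyFlagTensor(a)               # default: isolated from later changes to `a`
fast = CopyFlagTensor(a, copy=False)   # opt-out: shares storage with `a`
a[0, 0] = 1.0

assert safe.data[0, 0] == 0.0
assert fast.data[0, 0] == 1.0
assert np.shares_memory(fast.data, a)
```

The default protects casual users from aliasing surprises, while callers integrating large arrays can pass copy=False and accept shared storage.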
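So how would you like to proceed with this? I have changes that add a copy=True argument to from_data() for ktensor and tensor. Would you like me to submit that as a PR as a starting point? We can add more changes to it. |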
I started working on this, so I think I’m good for now. I plan to have it done by early next week at the latest.
From: Eric Phipps ***@***.***>
Date: Wednesday, June 14, 2023 at 5:09 PM
To: sandialabs/pyttb ***@***.***>
Cc: Dunlavy, Daniel M ***@***.***>, Assign ***@***.***>
Subject: [EXTERNAL] Re: [sandialabs/pyttb] Add support for constructing tensor/ktensor from data without copying (Issue #145)
So how would you like to proceed with this? I have changes that add a copy=True argument to from_data() for ktensor and tensor. Would you like me to submit that as a PR as a starting point? We can add more changes to it.
|
Sounds good, thanks! |
This is an issue to discuss adding support for creating tensor and ktensor objects (and probably others) without copying the original data the object is derived from. This will improve efficiency, especially for third-party codes that are trying to integrate with pyttb (such as GenTen). I have a PR that could be submitted that implements one proposed solution by adding a copy parameter to the from_data methods of tensor and ktensor (defaulting to True to preserve existing behavior). Note also that the various functions for creating various types of tensors from data are not entirely consistent on whether the data is copied or not.
Another reasonable solution in mind, and one that I would argue is preferable, is to never explicitly copy the data in these functions, and if the user wants a copy, to call the copy method of the class instead.