-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: spec: multidimensional slices #6282
Comments
Some more discussion on gonum-dev: https://groups.google.com/d/topic/gonum-dev/WnptzWjqhmk/discussion |
Comment 2 by [email protected]: There's been some discussion about this. Main points so far: 1. The general goal is to make it easy to do Matlab/Octave like work in Go. Most engineering math today is prototyped in one of those languages, then translated to something faster for production use. 2. It should be possible to easily convert standard libraries from "Numerical Methods" (the book series, "Numerical Methods in Fortran, ... in C, in C++ etc.") to Go. That would get Go a needed set of standard numerical libraries. And it should be reasonably easy to translate Matlab code to Go code. 3. It's helpful to have only one standard way to represent matrices. Otherwise, you get math libraries where a matrix coming out of one library won't go into another. (This is one of Matlab's selling points.) 4. Performance is a major issue. These constructs should go fast, approaching or exceeding C/FORTRAN performance. Language issues: 1. In general, multidimensional array/slice access syntax would look like "arr[i,j]". 2. Multidimensional slices would be supported, derived from multidimensional arrays. 3. While there is interest in generics and overloading, that's probably too radical. However, extending "+", "-", and "*" to vectors and matrices might be reasonable. Many machines have hardware that can speed up such operations (MMX, etc.) so that's something a compiler can do that a library alone cannot. 4. There's interest in "reshaping", where a subarray or sub-slice is derived from an array. This can easily get overcomplicated (it did in FORTRAN 77) and needs to be carefully designed. Further discussion and use cases are needed for this. The machine learning community within Google might be consulted on what they want in this area. The machine learning theorists write in Matlab, and then someone has to make that work in production code. It would be helpful to get agreement on needed functionality before putting too much effort into syntax, so there are few specific syntax suggestions here yet. |
My desires are modest. I'd like to be able to write something like // LUPDecompose performs an in-place LU factorization of a square // matrix A. It returns a permutation vector P, such that PA = LU. // On completion, A[i][j] = L[i][j] if i > j, U[i][j] if i <= j. func LUPDecompose(A [][]float64) (P []float64, err error) When writing this kind of code in C, I used to define a macro #define A_(i,j) (A[(i)*rowsep + (j)]) That is precisely the kind of thing I wish the compiler would do for me. |
This document contains a collection of different proposals described in different levels of detail: https://docs.google.com/document/d/1gejfouITT25k29eHYTgdvi4cCLVt5dhYiJVvF46gp78/edit It also contains links to any proposal or information relevant to the topic that we found interesting. Feel free to edit the Document to clarify any questions relevant to implementation. I think most of the people in need of multi dimensional arrays in Go are not used to writing proposals for programming language changes. If you have any input that would make the process less awkward it would be highly appreciated. |
Comment 7 by [email protected]: There's been considerable discussion of this subject on "comp.lang.go.general". Areas of agreement: There seems to be a consensus that multidimensional arrays and slices in Go would be useful for numerical work. The array element access syntax generally proposed is arr[n,m] // 2D arr[n,m,o] // 3D etc. Existing arrays of arrays would be retained in Go. Arrays of arrays can be "ragged", (rows of different length) but multidimensional arrays are always of uniform length in each dimension, allowing FORTRAN-type optimizations. Both fixed-size multidimensional array types defined at compile time and arrays sized at run time have use cases. The former are widely used for graphics, and the latter are widely used for general numerical work. It should be possible to pass both as parameters to functions, although not necessarily the same functions. Areas of disagreement: There's much disagreement over what facilities for reslicing and reshaping should be provided. The minimum capability discussed is the ability to pass an entire array or multidimensional slice object to a function. Creating a slice which refers to a portion of an array, though, is controversial. Syntactically, this refers to expressions of the form arr[a0:a1, b0:b1] The problem is that values of b0 and b1 which are smaller than the array bounds result in the need to represent a kind of sparse array, where the memory distance between adjacent elements is not the same as the size of an element. This complicates the internal representation of slices; even the lowest dimension must have a "stride" value. There's a performance penalty for this feature when it not being used. But it is convenient to have. The other big issue is the ability to "grow" multidimensional slices. Should "append" be allowed for multidimensional arrays, and if so, how general should it be? Is growth allowed in all dimensions? It's a useful capability but complicates the implementation considerably. Those are the two biggest issues being discussed. Lesser issues include whether operators "+", "-", and "*" should be provided as built-in operations for multidimensional arrays and slices. Providing them permits easy transliteration for math libraries in other languages (C++, Matlab especially). But they're not essential; NumPy uses "A.dot(B)" for matrix multiply. There are a lot of cases to handle for "*" - vector*scalar, vector*matrix, matrix*matrix... "+" and "-" are somewhat simpler. That's roughly where the discussions are. Once the slice and growth issues are decided, the rest should fall into place. |
The discussion referred to by nagle@ is on golang-nuts ("comp.lang.go.general" is just a name made up by gmane.org): https://groups.google.com/forum/#!topic/golang-nuts/Q7lwBDPmQh4 |
My proposal, at https://docs.google.com/document/d/1xHyOK3hSxwMtyMH-pbXSXxo7Td9A_Teoj6cbr9ECeLM/edit?pli=1# has now reached a reasonably mature state. I'd appreciate if people had a look at it. It's a rather long proposal, but that's because this is not a small addition to the language and I've tried to be thorough. The proposal is purely about the basics of multidimensional slices; math operations on whole slices are not part of it. (It does not preclude later adding them, but does not set up a situation where they'd be necessary, either.) As regards comments that people have made so far on the document, please keep in mind that most of them were made against earlier versions, in which some things were not explained as well. Thus if any appear to be foolish that should not be held against the commenter, or at least not too much. (Though some comments have been marked as resolved, I haven't marked them that way just because I disagree and have improved my explanation, as that seems too dictatorial; I figure people can retire their own comments if their objections have truly been addressed.) As a personal comment, it is rather rare to have an opportunity to add a major feature to a language so cleanly. Usually one either has to break compatibility or introduce some serious ugliness or severe compromise. But here it's almost like Go was designed for this feature to just be plugged in and almost seamlessly fill a gap in its capabilities. Or so I believe; go ahead, hammer on the proposal and prove me wrong. |
Comment 10 by A.Vansteenkiste: In my opinion, instead of changing the language, it might be useful if the standard library simply provided a standard matrix format. That should guide the creation of idiomatic, compatible matrix libraries. Cfr. database/sql, which guided the creation of excellent database drivers. The image package already provides nice guidance on how to handle 2D matrices. It uses contiguous underlying storage which is needed when passing to external libraries. I modelled a matrix (storage) package after image, this is how it looks like: http://godoc.org/github.com/barnex/mat. However, many people wrote such kind of packages with slightly different, incompatible, conventions. Hence it might be nice to have on a standard convention in the standard library, once and for all. |
About a month ago, I put together a proposal for this. Putting this comment as a reference. Here is a reduced proposal for multi-dimensional arrays https://docs.google.com/document/d/1eHm7KqfKP9_s4vR1zToxq-FBazdUQ9ZYi-YhcEtdfR0/edit Golang-nuts discussion thread https://groups.google.com/forum/#!searchin/golang-nuts/tables$20in$20go/golang-nuts/osTLUEmB5Gk/3-f9_UKfE9MJ In the thread, Dan Kortschak proposed a nice syntax for range which could be an addition to the proposal (https://groups.google.com/d/msg/golang-nuts/osTLUEmB5Gk)/-A15bJpuTzsJ |
Related discussion in my go-nuts post: https://groups.google.com/forum/#!topic/golang-nuts/ScFRRxqHTkY |
@nigeltao's response to the proposal: https://groups.google.com/d/msg/golang-dev/T2oH4MK5kj8/Kwpe8NfD45IJ
|
Design document uploaded https://go-review.googlesource.com/16801 |
A somewhat different suggestion at #13253. |
@btracey no, see: https://docs.python.org/3/c-api/buffer.html wrt SIMD, even with the current proposal we might have some issues with alignment (especially when you sub-slice an n-dim slice (strided or not)). |
@btracey Answering your question, since Fortran 90 it is possible to take rows from a matrix with a similar syntax as the one used by Python, as for example in: Also, I do not think that supporting strided slices would suppose a problem for optimization, on the contrary. If they are not supported, some workaround will be needed to extract columns from a matrix, but this will disallow applying optimizations with SIMD instructions that support strided loads. |
@sbinet it has the information to pass to Lapack, but I'm pretty sure it has to copy the data before passing to Lapack if the inner stride is not 1. |
To add to the discussion, I have cleaned up a bit my old proposal for "shaped slices": It is not really a counter proposal (it is far less mature than the table proposal, and the syntax chosen or other details may not be the best options, or even possible). I just want to show how I think that everything essential to implement multidimensional slices in what I consider a useful form is a new type, a reshape operation and multi-dimensional slicing/indexing. Please, if you have any specific comment, add them directly in the document, to avoid hijacking this issue. |
I updated my proposal with the Current PR still at https://go-review.googlesource.com/#/c/25180/ |
with the new I am tempted to bring back my original suggestion (somewhere on gonum-dev): slice := make([*]int, 2, 3, 4) // a 2x3x4 n-dim slice
sub1 := slice[:,1,:] // a 2x4 n-dim slice
sub2 := reshape(slice, 6, 4)[:2,:3] // a 2x3 n-dim slice
for i, v := range slice {
for j, u := range v {
for k := range w {
fmt.Printf("slice[%d,%d,%d] = %v\n", slice[i,j,k])
}
}
}
at the type NdSliceHeader struct {
Data unsafe.Pointer
Len []int
Stride []int
} nd-slice literalsv := [*]int{1,2,3,4} // a 1x4 nd-slice
u := reshape([*]int{1,2,3,4}, 2, 2) // a 2x2 nd-slice
w := reshape([*]int{1,2,3,4}, 2, 1, 2} // a 2x1x2 nd-slice
// reshaping a slice is also allowed and creates an nd-slice
v := reshape([]int{1,2,3,4}, 2, 2) // a 2x2 nd-slice
// perhaps also this conversion could be allowed, like string/[]byte:
v := [*]int([]int{1,2,3,4}) // a 1x4 nd-slice copysrc := make([*]int, 2, 3)
dst := make([*]int, 4, 3)
// copy returns the number of elements copied in each dimension
n := copy(dst[:2,:], src[:1,1:]) // n == []int{1, 2} wrt to the non-strided proposal, you get the ability to slice in all dimensions. |
This is similar in spirit to @yiyus proposal. I don't think it actually gets you around unshape though, unless you want to forbid people from ever getting the underlying slice without using unsafe. One important usage is passing data to C, I.e. Lapack and other programs. Also note that with strided slices one needs to allocate and make a copy before lapqck (and others) since they require contiguous data. |
wrt using if/when a mechanism is devised to get (safely) the underlying array from a slice, I suppose it could be transposed to the nd-slice/slice pair. |
ah! one thing I forgot to mention in my nd-slice post: slicing a nd-slice, modifying its stride. v := reshape([*]int{1,2,3,4,5,6}, 2, 3) // a 2x3 nd-slice
// 1 2 3
// 4 5 6
u := v[:, 0:3:2] // a 2x2 nd-slice, taking one column every two
// 1 3
// 4 6
w := v[::, ::2] // == u |
looking at how tensorflow wraps around its C api is interesting and a valuable data point, /me thinks: |
This PR incorporates the larger structural changes suggested in the previous round of review. It does four things 1) Renames tables to just slices. 2) Recasts down-slicing as just a specific form of indexing. 3) Changes the behavior of range to be much simpler, and to follow from the idea of indexing. 4) Change the behavior of reshape to allow data structures with different numbers of elements 5) Add the unpack language built-in 6) Reworks the discussion of the performance of the single struct representation It also cleans up a number of minor language issues Updates golang/go#6282. Change-Id: Iabb905c089c6d41195ca3a3602157cd49af7acd1 Reviewed-on: https://go-review.googlesource.com/25180 Reviewed-by: Robert Griesemer <[email protected]>
To make progress with this proposal with the goal to come to a decision eventually, here's an abbreviated summary of the discussion so far. Primary goalsFirst of all, there are some overarching goals that this proposals attempts to achieve. To name a few (borrowing from #6282 (comment), and #6282 (comment)):
Virtually everybody also seems to be in agreement with respect to indexing notation and memory layout:
Proposed design@btracey, with input from the gonum team, has spent considerable effort coming up with a concrete design for multi-dimensional slices with the intent to address the goals of this proposal: https://github.com/golang/proposal/blob/master/design/6282-table-data.md . The above design addresses many of the desired goals of this proposal and, as a significant plus, the proposed multi-dimensional slices are in many ways a natural extension of the one-dim. slices we already have in Go. Problem areasThe single-biggest issue with the proposal is that one-dim. slices don’t have a stride exactly because multi-dim. slices gracefully degrade into Go’s existing slices. This problem has been pointed out as early as #6282 (comment), before a concrete design document was posted. The design document addresses this issue with various work-arounds. As a concrete example, given a two-dim. slice To support various "reshaping" operations for multi-dim. slices, the design proposes operations such as AlternativesSeveral alternative proposals have been floated as well:
SummaryThe proposed design of multi-dim. slices is a natural extension of Go's existing one-dim. slices. From a language point of view, the design appears backward-compatible with Go 1; and it does address many goals of the proposal. That said, the asymmetry of indexing operations requires non-obvious work-arounds when implementing numerical algorithms which runs counter one of the primary goals ( #6282 (comment) ) of this proposal. |
I agree with the summary above. Two quick comments (especially for those thinking of making a new proposal)
Thanks to @griesemer for the significant effort invested in this issue. |
Thank you @griesemer — a really good summary of the challenges and benefits. To elaborate on my 👍 to @btracey's comment above, I especially want to get behind index-operator methods: I suspect that moving index-operator syntax from rewriter-land to native Go would buy considerable power, and would allow libraries to handle more of the heavy-lifting when our implementation preferences diverge (e.g. axis-reordering operations can exist in a library, and needn't exist in native implementation). |
This proposal addresses a large portion of the originally stated goal: better Go support for numerical applications. However, as also has become quite clear, it falls short in others (asymmetry of proposed solution, potential proliferation of builtins). Judging from the feedback received on this issue, there is no clear consensus that the shortcomings can be safely ignored. Adding multi-dim. slices to the existing Go language would be a significant engineering effort. It seems unwise to make this effort with full knowledge of the proposal's inherent problems. Furthermore, exactly because the proposed solution ties so closely into the existing language, it would be nearly impossible to change or adjust the design down the road. Thus, after repeated and careful consideration, we are going to decline this proposal. That said, the discussions so far have been extremely helpful in delineating the problem domain, identifying key issues, and for identifying possible alternative approaches. We suggest that this discussion continue with the intent to come up with a new and improved design (possibly along the lines of one of the alternatives above) with the intent of having a blueprint for consideration for Go 2. Again, thanks to everybody, and particularly @btracey, for your contributions and time spent on this proposal. -gri, for @golang/proposal-review |
Time to revive this discussion? As a go and R fan, how does fortran approach the problems seen here? |
This issue is closed (and the proposal declined) for good reason. If you're looking for similar functionality, please see the gonum packages (gonum.org) |
-gri Maybe I should have mentioned that my comment was prompted by the recent announcement about working towards Go 2 (https://blog.golang.org/toward-go2). |
Consider filing an Experience Report with any specific issues you've run into with real code. We can't talk about proposals and solutions until we know what the actual underlying problems are. |
@griesemer wrote:
I only had 5 minutes to learn about this issue by perusing this thread, thus my very rushed thought is that perhaps the problem can be in theory solved with typeclasses. Essentially the 3. Struct type is the most correct solution to the issue presuming monomorphisation (inlining) and a smart enough optimizing compiler1. And btw, it’s the first solution that came to mind within 1 minute before I saw it proposed, so why did it take 4 years? So if we have typeclasses and operator overloading then the Afaics, with typeclasses the entire thing can be handled with libraries. And so we stop burdening the native with what can be in a library. Remember that typeclass bounds at the call site select the correct interface automatically based on the input data type. Go’s interfaces sort of do that also, but there’s some differences and limitations. Apologies if in my haste I missed some points that cause my post to be noise. Please courteously correct me if so. P.S. Is this implicating that Go has been without subslicing matrix capability for 5 years because of a lack of sufficient higher-level abstractions support in the language? Yet, I presume combining higher-level abstractions and maintaining low-level control performance is a difficult design challenge and maybe even insurmountable in general. 1 One of the reasons along with higher-level abstractions perhaps OCaml is favored by hedgefunds? |
It appears that this was not submitted as an Experience Report. I am wondering why not? |
Probably because this happened before that procedure. |
The text was updated successfully, but these errors were encountered: