Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotate iterate(::AbstractArray) with inbounds #40397

Closed
wants to merge 1 commit into from

Conversation

tkf
Copy link
Member

@tkf tkf commented Apr 7, 2021

This PR adds inbounds and inline annotation to iterate on AbstractArray as already done for Array.

Before this PR, the effect of the missing inbounds annotation can be observed with this benchmark:

julia> function f(xs)
           a = 0.0
           for x in xs
               a += x
           end
           a
       end
f (generic function with 1 method)

julia> function f1!(a, xs)
           for x in xs
               @inbounds a[1] += x
           end
       end
f1! (generic function with 1 method)

julia> function f2!(a, xs)
           for i in eachindex(xs)
               @inbounds a[1] += xs[i]
           end
       end
f2! (generic function with 1 method)

julia> xs = randn(1000);

julia> @btime f($xs);
  879.981 ns (0 allocations: 0 bytes)

julia> @btime f($(view(xs, 1:length(xs))));
  887.688 ns (0 allocations: 0 bytes)

julia> @btime f1!($([0.0]), $xs)
  915.919 ns (0 allocations: 0 bytes)

julia> @btime f1!($([0.0]), $(view(xs, 1:length(xs))))  # the only slow case
  3.285 μs (0 allocations: 0 bytes)

julia> @btime f2!($([0.0]), $xs)
  916.216 ns (0 allocations: 0 bytes)

julia> @btime f2!($([0.0]), $(view(xs, 1:length(xs))))
  916.189 ns (0 allocations: 0 bytes)

(Aside: Interestingly, f($(view(xs, 1:length(xs)))) showed no performance difference, even though the bound check is not eliminated when looking at the LLVM IR.)

The slow case f1!($([0.0]), $(view(xs, 1:length(xs)))) is fixed this this PR.

Since

xs::AbstractArray
for I in eachindex(xs)
    checkbounds(xs, I)
end

must hold (not throw) for any abstract array xs, I think this use of @inbounds is correct. The use of @inbounds is justified by the information locally available inside the iterate method.

@thchr
Copy link
Contributor

thchr commented Apr 8, 2021

Does this also fix #39354?

@JeffBezanson
Copy link
Member

I believe we deliberately did not do this, to guard against potentially-buggy AbstractArray subtypes.

@tkf
Copy link
Member Author

tkf commented Apr 8, 2021

I feared it might have been the case... But I personally think protecting against programming error at low-level constructs like array implementation is too much of a wide contract. There are so many other things that result in undefined behavior in Julia. Also, I'd prefer more strict definitions in the treatment of programming error in the API, rather than each API trying to read the user's mind and resulting in slightly different boundaries.

If we want a better failure mode from miss-implementation, I think we need a trait system so that the implementer can express the confidence of their implementation.

That said, an alternative easier solution may be to use @propagate_inbounds, so that the users can use @inbounds for x in xs. But @inbounds affecting syntaxes without [...] is somewhat strange.

@mbauman mbauman added the arrays [a, r, r, a, y, s] label Apr 8, 2021
@mbauman
Copy link
Member

mbauman commented Apr 8, 2021

Yeah, there's some really long issues about this... searching for "inbounds next iteration" is rather hopeless, but I found this:

#27384 (comment)

Here's the crux of the problem:

The only thing [iterate] doesn't protect ourselves from is someone manually constructing and passing an invalid state that happens to pass the check but it still out of bounds.

More concretely: when inboundsing Array iteration in #27386, @haampie also changed the state done check to ensure invalid (non-positive value) states don't fall through to indexing.

The issue in my mind isn't so much trusting buggy array implementations — that ship has long sailed — but rather trusting implementors users of iteration not to pass invalid states (which is as easy as simultaneously iterating over two arrays you thought were the same length but mixing up their states).

@oscardssmith
Copy link
Member

What if we made a function not_broken that returns whether an AbstractArray is guaranteed to have correct implementations of size and get/set index? That way, writers of Array types can guarantee they're types, and any resulting segfault becomes a bug in their code rather than in Base?

@tkf
Copy link
Member Author

tkf commented Apr 8, 2021

which is as easy as simultaneously iterating over two arrays you thought were the same length but mixing up their states

This is not really possible in general, unless you know the type of the array and how it implements the iteration state 1, right? For example, it is a legal implementation for Array{T} to use Ptr{T} as the state. In this case, it's very hard for other arrays to use this state in a meaningful way (e.g., to implement a zip). For an arbitrary abstract array, the state can contain anything. I've been interpreting the API of iterate to be that last(iterate(iter, ...)) is the only state that can be passed to the second argument of iterate.

I believe it's needless to say, but allow me to reemphasize that designing an abstract interface and programming against it is a mathematical activity. Expressing the invariance and letting the compiler rely on it is crucial for witting generic high-performance code. I think we need a principled approach, at least for a very basic control flow like a structured loop and very basic containers like array types.

Footnotes

  1. AbstractString seems to be the only case in Base that iterate state is exposed as a public API

@mbauman
Copy link
Member

mbauman commented Apr 8, 2021

Right, I'd be 100% on board with this if it were only about folks ginning up phony states or bad array implementations. But it's not. It's quite likely that if you're manually iterating over two arrays at the same time, that they'd be the same type. For example:

julia> function f(x, y)
           i1 = iterate(x)
           i2 = iterate(y)
           r = 0
           while i1 !== nothing && i2 !== nothing
               v1, s1 = i1
               v2, s2 = i2
               r = v1 + v2
               i1 = iterate(y, s1) # oops
               i2 = iterate(x, s2)
           end
           r
       end
f (generic function with 2 methods)

julia> f((rand(2))', (rand(3))')
ERROR: BoundsError: attempt to access 2-element Vector{Float64} at index [3]

It seems not great that the above typo could be a segfault (ok, not likely to segfault on a 1-past access, but offsets could make this more explodey).

@mbauman
Copy link
Member

mbauman commented Apr 8, 2021

I suppose, though, that this isn't terribly unlike the mistake of changing the length of a vector while iterating it or changing indices of a view behind its back.

This is just why we've not done it yet.

@haampie
Copy link
Contributor

haampie commented Apr 8, 2021

My conclusion from 3 years ago was: there's always julia --check-bounds=no ^^

@tkf
Copy link
Member Author

tkf commented Apr 8, 2021

if you're manually iterating over two arrays at the same time, that they'd be the same type

Thanks, this makes sense. I wasn't thinking about this kind of programming error.

this isn't terribly unlike the mistake of changing the length of a vector while iterating it

I was writing exactly this comment before simplifying my previous comment :)

And yes, this is my point. If you don't satisfy the precondition before calling an API, you'll get undefined behavior. Checking every precondition for every non-unsafe_ API is not really possible for a high-performance language (or rather, we need a much more advanced type system for this, I guess).


Of course, if we want to preserve user-friendliness of iterate, we can introduce extra indirection for performance. I can think of a couple of ways to do this in a backward compatible manner:

(1) Add a function iterator(_) that returns a fast/unsafe iterator. Lower a for loop to:

it = iterator(xs)  # default to `iterator(xs) = xs`
y = iterate(it)
while y !== nothing
    x, s = y
    $loop_body(x)
    y = iterate(it, s)
end

As a bonus, wrapping iterable in a custom type is easier. I think I saw StefanKarpinski mentioning it somewhere.

(2) unsafe_iterate called from iterate. An AbstractType can opt-in this approach by defining

function iterate(iter::AbstractType, state)
    checkstate(iter, state)
    unsafe_iterate(iter, state)
end

unsafe_iterate(iter::AbstractType, state) = throw(MethodError(unsafe_iterate, state))

For for loop to use unsafe_iterate instead of iterate, we need to define unsafe_iterate(::Any, ...) in terms of iterate(::Any, ...). That's why an AbstractType must throw MethodError.

(3) use foldl :trollface:

@tkf
Copy link
Member Author

tkf commented Apr 9, 2021

if you're manually iterating over two arrays at the same time, that they'd be the same type

I just realized that it's possible to protect against this kind of mistake by putting the array itself in the state

function iterate(X::AbstractArray, state=(X, eachindex(X),))
    @_inline_meta
    A = first(state)  # state includes A
    y = iterate(tail(state)...)
    y === nothing && return nothing
    @inbounds A[y[1]], (A, state[2], tail(y)...)
end

@vtjnash
Copy link
Member

vtjnash commented Jul 19, 2021

This is too unsafe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] iteration Involves iteration or the iteration protocol performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants