Annotate iterate(::AbstractArray) with inbounds #40397

tkf · 2021-04-07T23:06:40Z

This PR adds inbounds and inline annotation to iterate on AbstractArray as already done for Array.

Before this PR, the effect of the missing inbounds annotation can be observed with this benchmark:

julia> function f(xs)
           a = 0.0
           for x in xs
               a += x
           end
           a
       end
f (generic function with 1 method)

julia> function f1!(a, xs)
           for x in xs
               @inbounds a[1] += x
           end
       end
f1! (generic function with 1 method)

julia> function f2!(a, xs)
           for i in eachindex(xs)
               @inbounds a[1] += xs[i]
           end
       end
f2! (generic function with 1 method)

julia> xs = randn(1000);

julia> @btime f($xs);
  879.981 ns (0 allocations: 0 bytes)

julia> @btime f($(view(xs, 1:length(xs))));
  887.688 ns (0 allocations: 0 bytes)

julia> @btime f1!($([0.0]), $xs)
  915.919 ns (0 allocations: 0 bytes)

julia> @btime f1!($([0.0]), $(view(xs, 1:length(xs))))  # the only slow case
  3.285 μs (0 allocations: 0 bytes)

julia> @btime f2!($([0.0]), $xs)
  916.216 ns (0 allocations: 0 bytes)

julia> @btime f2!($([0.0]), $(view(xs, 1:length(xs))))
  916.189 ns (0 allocations: 0 bytes)

(Aside: Interestingly, f($(view(xs, 1:length(xs)))) showed no performance difference, even though the bound check is not eliminated when looking at the LLVM IR.)

The slow case f1!($([0.0]), $(view(xs, 1:length(xs)))) is fixed this this PR.

Since

xs::AbstractArray
for I in eachindex(xs)
    checkbounds(xs, I)
end

must hold (not throw) for any abstract array xs, I think this use of @inbounds is correct. The use of @inbounds is justified by the information locally available inside the iterate method.

thchr · 2021-04-08T02:10:01Z

Does this also fix #39354?

JeffBezanson · 2021-04-08T03:04:15Z

I believe we deliberately did not do this, to guard against potentially-buggy AbstractArray subtypes.

tkf · 2021-04-08T03:29:14Z

I feared it might have been the case... But I personally think protecting against programming error at low-level constructs like array implementation is too much of a wide contract. There are so many other things that result in undefined behavior in Julia. Also, I'd prefer more strict definitions in the treatment of programming error in the API, rather than each API trying to read the user's mind and resulting in slightly different boundaries.

If we want a better failure mode from miss-implementation, I think we need a trait system so that the implementer can express the confidence of their implementation.

That said, an alternative easier solution may be to use @propagate_inbounds, so that the users can use @inbounds for x in xs. But @inbounds affecting syntaxes without [...] is somewhat strange.

mbauman · 2021-04-08T04:49:02Z

Yeah, there's some really long issues about this... searching for "inbounds next iteration" is rather hopeless, but I found this:

#27384 (comment)

Here's the crux of the problem:

The only thing [iterate] doesn't protect ourselves from is someone manually constructing and passing an invalid state that happens to pass the check but it still out of bounds.

More concretely: when inboundsing Array iteration in #27386, @haampie also changed the state done check to ensure invalid (non-positive value) states don't fall through to indexing.

The issue in my mind isn't so much trusting buggy array implementations — that ship has long sailed — but rather trusting ~~implementors~~ users of iteration not to pass invalid states (which is as easy as simultaneously iterating over two arrays you thought were the same length but mixing up their states).

oscardssmith · 2021-04-08T06:06:08Z

What if we made a function not_broken that returns whether an AbstractArray is guaranteed to have correct implementations of size and get/set index? That way, writers of Array types can guarantee they're types, and any resulting segfault becomes a bug in their code rather than in Base?

tkf · 2021-04-08T06:33:43Z

which is as easy as simultaneously iterating over two arrays you thought were the same length but mixing up their states

This is not really possible in general, unless you know the type of the array and how it implements the iteration state ¹, right? For example, it is a legal implementation for Array{T} to use Ptr{T} as the state. In this case, it's very hard for other arrays to use this state in a meaningful way (e.g., to implement a zip). For an arbitrary abstract array, the state can contain anything. I've been interpreting the API of iterate to be that last(iterate(iter, ...)) is the only state that can be passed to the second argument of iterate.

I believe it's needless to say, but allow me to reemphasize that designing an abstract interface and programming against it is a mathematical activity. Expressing the invariance and letting the compiler rely on it is crucial for witting generic high-performance code. I think we need a principled approach, at least for a very basic control flow like a structured loop and very basic containers like array types.

AbstractString seems to be the only case in Base that iterate state is exposed as a public API ↩

mbauman · 2021-04-08T06:51:00Z

Right, I'd be 100% on board with this if it were only about folks ginning up phony states or bad array implementations. But it's not. It's quite likely that if you're manually iterating over two arrays at the same time, that they'd be the same type. For example:

julia> function f(x, y)
           i1 = iterate(x)
           i2 = iterate(y)
           r = 0
           while i1 !== nothing && i2 !== nothing
               v1, s1 = i1
               v2, s2 = i2
               r = v1 + v2
               i1 = iterate(y, s1) # oops
               i2 = iterate(x, s2)
           end
           r
       end
f (generic function with 2 methods)

julia> f((rand(2))', (rand(3))')
ERROR: BoundsError: attempt to access 2-element Vector{Float64} at index [3]

It seems not great that the above typo could be a segfault (ok, not likely to segfault on a 1-past access, but offsets could make this more explodey).

mbauman · 2021-04-08T07:09:09Z

I suppose, though, that this isn't terribly unlike the mistake of changing the length of a vector while iterating it or changing indices of a view behind its back.

This is just why we've not done it yet.

haampie · 2021-04-08T07:40:41Z

My conclusion from 3 years ago was: there's always julia --check-bounds=no ^^

tkf · 2021-04-08T08:35:49Z

if you're manually iterating over two arrays at the same time, that they'd be the same type

Thanks, this makes sense. I wasn't thinking about this kind of programming error.

this isn't terribly unlike the mistake of changing the length of a vector while iterating it

I was writing exactly this comment before simplifying my previous comment :)

And yes, this is my point. If you don't satisfy the precondition before calling an API, you'll get undefined behavior. Checking every precondition for every non-unsafe_ API is not really possible for a high-performance language (or rather, we need a much more advanced type system for this, I guess).

Of course, if we want to preserve user-friendliness of iterate, we can introduce extra indirection for performance. I can think of a couple of ways to do this in a backward compatible manner:

(1) Add a function iterator(_) that returns a fast/unsafe iterator. Lower a for loop to:

it = iterator(xs)  # default to `iterator(xs) = xs`
y = iterate(it)
while y !== nothing
    x, s = y
    $loop_body(x)
    y = iterate(it, s)
end

As a bonus, wrapping iterable in a custom type is easier. I think I saw StefanKarpinski mentioning it somewhere.

(2) unsafe_iterate called from iterate. An AbstractType can opt-in this approach by defining

function iterate(iter::AbstractType, state)
    checkstate(iter, state)
    unsafe_iterate(iter, state)
end

unsafe_iterate(iter::AbstractType, state) = throw(MethodError(unsafe_iterate, state))

For for loop to use unsafe_iterate instead of iterate, we need to define unsafe_iterate(::Any, ...) in terms of iterate(::Any, ...). That's why an AbstractType must throw MethodError.

(3) use foldl

tkf · 2021-04-09T03:09:38Z

if you're manually iterating over two arrays at the same time, that they'd be the same type

I just realized that it's possible to protect against this kind of mistake by putting the array itself in the state

function iterate(X::AbstractArray, state=(X, eachindex(X),))
    @_inline_meta
    A = first(state)  # state includes A
    y = iterate(tail(state)...)
    y === nothing && return nothing
    @inbounds A[y[1]], (A, state[2], tail(y)...)
end

vtjnash · 2021-07-19T22:40:28Z

This is too unsafe

Annotate iterate(::AbstractArray) with inbounds

b58e1df

mbauman added the arrays [a, r, r, a, y, s] label Apr 8, 2021

vtjnash closed this Jul 19, 2021

timholy mentioned this pull request Aug 27, 2021

Add inbounds annotation to safe iteration #42030

Open

2 tasks

tkf mentioned this pull request Sep 28, 2021

Bidirectional and other iterator API design issues #42402

Open

KristofferC mentioned this pull request Oct 3, 2021

vcat(::AbstractRange...) can do an out of bounds access inside an @inbounds block #42483

Open

stev47 mentioned this pull request Dec 8, 2021

iterate on SubArray uses slow default fallback #43295

Open

matthias314 mentioned this pull request Feb 19, 2023

faster iteration over subarrays (views) #48720

Merged

jishnub mentioned this pull request Jan 28, 2025

add @inbounds to iterate method for AbstractArray: 3x speedup #57112

Closed

nsajko added performance Must go faster iteration Involves iteration or the iteration protocol labels Jan 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Annotate iterate(::AbstractArray) with inbounds #40397

Annotate iterate(::AbstractArray) with inbounds #40397

tkf commented Apr 7, 2021

thchr commented Apr 8, 2021

JeffBezanson commented Apr 8, 2021

tkf commented Apr 8, 2021

mbauman commented Apr 8, 2021 •

edited

Loading

oscardssmith commented Apr 8, 2021

tkf commented Apr 8, 2021

mbauman commented Apr 8, 2021 •

edited

Loading

mbauman commented Apr 8, 2021

haampie commented Apr 8, 2021

tkf commented Apr 8, 2021 •

edited

Loading

tkf commented Apr 9, 2021

vtjnash commented Jul 19, 2021

Annotate iterate(::AbstractArray) with inbounds #40397

Annotate iterate(::AbstractArray) with inbounds #40397

Conversation

tkf commented Apr 7, 2021

thchr commented Apr 8, 2021

JeffBezanson commented Apr 8, 2021

tkf commented Apr 8, 2021

mbauman commented Apr 8, 2021 • edited Loading

oscardssmith commented Apr 8, 2021

tkf commented Apr 8, 2021

Footnotes

mbauman commented Apr 8, 2021 • edited Loading

mbauman commented Apr 8, 2021

haampie commented Apr 8, 2021

tkf commented Apr 8, 2021 • edited Loading

tkf commented Apr 9, 2021

vtjnash commented Jul 19, 2021

mbauman commented Apr 8, 2021 •

edited

Loading

mbauman commented Apr 8, 2021 •

edited

Loading

tkf commented Apr 8, 2021 •

edited

Loading