WIP: Refactor IntSets to use BitVectors #10065
Very nice. The thorough performance comparison is pretty impressive!
Remember to add
Interesting thought from #1032: If we can somehow manage to remove support for 0 in
I'm not sure how practical that would be, since IntSet doesn't implement indexing (and cannot do so efficiently). And it wouldn't satisfy the current meaning of
Maybe we just create a new type
* Use BitVectors instead of a mishmash of bitvector.c and ad-hoc Julia code.
* Invert the complement IntSet semantics. Instead of filling ones outside the range of the IntSet, simply reverse what a set bit means.
* For mathematical functions (intersect, union, etc.), use BitVector functions to work 64 bits at a time instead of working bit-by-bit.
* Increase test coverage to 100%.
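A minimal sketch of the first three bullets, assuming a hypothetical `ToyIntSet` that holds only positive integers (the PR's real type also allowed 0 and may use different fields). An `inverse` flag flips what a stored bit means, so a complement set needs no ones filled in past the end of storage, and `union` combines two BitVectors with `map(|, ...)`, for which Base has a chunk-wise BitArray specialization.

```julia
# Hypothetical sketch, not the PR's code: a set of positive Ints backed by a
# BitVector. When `inverse` is true, the set contains every positive Int whose
# bit is NOT set, including everything beyond the end of `bits`.
mutable struct ToyIntSet
    bits::BitVector
    inverse::Bool
end

ToyIntSet() = ToyIntSet(falses(0), false)

function Base.in(i::Int, s::ToyIntSet)
    if 1 <= i <= length(s.bits)
        return s.bits[i] != s.inverse   # a set bit means "member" unless inverted
    else
        return i >= 1 && s.inverse      # past storage: members only of complements
    end
end

function Base.push!(s::ToyIntSet, i::Int)
    i >= 1 || throw(ArgumentError("only positive integers are supported"))
    if i > length(s.bits)
        s.inverse && return s           # already a member of the complement set
        old = length(s.bits)
        resize!(s.bits, i)
        s.bits[old+1:end] .= false      # explicitly clear the newly added bits
    end
    s.bits[i] = !s.inverse
    return s
end

# Union of two non-complement sets: pad to a common length, then combine with
# `map(|, ...)`, which Base specializes to work on 64-bit chunks of a BitVector.
function Base.union(a::ToyIntSet, b::ToyIntSet)
    (a.inverse || b.inverse) && error("complement case omitted in this sketch")
    n = max(length(a.bits), length(b.bits))
    pad(v) = vcat(v, falses(n - length(v)))
    return ToyIntSet(map(|, pad(a.bits), pad(b.bits)), false)
end
```

Building a complement is then just a copy of the bits with `inverse = true`, e.g. `ToyIntSet(copy(s.bits), true)`.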
Bump! With inference decoupled from Base, this became very easy to debug! All tests pass now. The only remaining issue is the performance gap on
Bump... I spent time yesterday writing tests for
Shucks. Sorry for the duplicated effort. Don't let this long-languishing work in progress hold up your testing work! Feel free to push your own tests and/or grab tests from here. If you do grab tests from here, you'll run into some complement IntSet bugs. I have a hunch that nobody is using complement IntSets very thoroughly. Maybe we should deprecate the complement functionality?
This is a core piece of functionality that's required for inference (otherwise I don't think it'd be in Base). It could make sense to simplify
+1 for removing the complement feature. It was written back in the days when we were bored and just looking for neat stuff to do.
Agree with removing the complement stuff. I also suspect this type should be renamed to
My effort is at #12247. The complement functionality is neat, but I can see that it doesn't really belong in Base.
I'd agree: if this only supports 0 and above and is really only used for indices, it should just be added as
Perhaps we entirely deprecate
SGTM 👍
I'm entirely convinced that we should remove the ability to store zero...
Wonderful. Let's do it. It's actually not as bad as I had been thinking, since we already branch for a bounds check. We can check for zero within that branch with no extra overhead to the other branch.
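One common way to make the zero check essentially free, sketched here with hypothetical helpers `in_positive_range` and `toy_in` (this is a swapped-in trick, not necessarily what the PR does), is to subtract one and compare as unsigned. Zero and negative indices wrap around to huge `UInt` values, so a single comparison rejects them together with past-the-end indices.

```julia
# Hypothetical helper: true exactly when 1 <= idx <= len (for len >= 0).
# `(idx - 1) % UInt` reinterprets the bits as unsigned, so idx = 0 becomes
# typemax(UInt) and negatives become very large, failing the `< len` test.
in_positive_range(idx::Int, len::Int) = (idx - 1) % UInt < len % UInt

# Membership in a set of positive Ints stored as a BitVector.
function toy_in(bits::BitVector, idx::Int)
    if in_positive_range(idx, length(bits))
        return @inbounds bits[idx]   # the hot path: one comparison, one bit read
    else
        return false                 # zero, negative, or beyond the stored bits
    end
end
```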
```diff
     else
-        (s.bits[n>>5 + 1] & (UInt32(1)<<(n&31))) != 0
+        ifelse((idx <= 0) | (idx > typemax(Int)), false, s.inverse)
     end
 end
```
Aren't you missing the return statements here?
That would probably be clearer, yes, but this works just fine as it is.
The result of the last expression in a function body is used as the return value (if not explicitly specified). In this case, the last expression is the `if` block.
The result of an `if` block is similarly determined by the last expression of the traversed branch (or `nothing` if no branches were traversed).
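A small standalone illustration of that rule, using hypothetical functions `sign_word` and `maybe_word`:

```julia
# The function returns the value of its last expression, which here is the
# whole `if` block; the block's value is the last expression of the branch taken.
function sign_word(x::Int)
    if x > 0
        "positive"
    elseif x < 0
        "negative"
    else
        "zero"
    end
end

# With no `else`, an `if` whose condition is false evaluates to `nothing`.
function maybe_word(x::Int)
    if x > 0
        "positive"
    end
end
```

So `sign_word(-2)` returns `"negative"`, and `maybe_word(-1)` returns `nothing`.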
Ok, I knew that the return value of the last expression was used, but I thought that an `if` block would have returned `nothing`.
Sorry for the noise.
Closing in favor of JuliaCollections/DataStructures.jl#114 (moving IntSet to DataStructures) and #12270 (deprecating complement and stored zeros, on the path to renaming IntSet → IndexSet).
* Complete deprecation of stored zeros; IntSets now only support integers in the range `1:typemax(Int)`.
* Complete deprecation of `complement`; removes all support for inverted IntSets.
* Refactor internals to rely on a BitVector, allowing the use of highly optimized `map` methods. `IntSet` is now immutable. This significantly improves performance across varying [densities](http://imgur.com/a/uqv8A) and [sizes](http://imgur.com/a/iEgcr). These are compared against a modified Base with deprecation warnings removed, for a fairer comparison. Testing code is [available here](https://github.com/mbauman/IntSets.jl/tree/b50a7c97abbe9786e33221f723e107e266f31fe4/test).
* Add more tests and organize them into testsets.
* Improve hashing; `hash(IntSet([1]))` is now distinct from `hash(IntSet([65]))`.

This is a continuation of #10065. Now that complements are fully removed, making IntSet immutable solves the performance issue. I am keeping the name the same within this PR because it vastly simplifies comparisons between the two implementations; the name can later be changed to `IndexSet` if still desired. The naming story is now a bit more complicated since we support offset indices, but a future change could perhaps allow wrapping any `AbstractVector{Bool}` and base the supported `Int`s on those indices. Very few methods depend upon BitArray internals.
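A rough sketch of the layout described above, assuming a hypothetical `ImmutableToySet` rather than the PR's actual code: an immutable `struct` wraps a BitVector and accepts only integers in `1:typemax(Int)`. The wrapper's field cannot be rebound, but the BitVector it holds can still grow, so `push!` works in place. `intersect` trims both BitVectors to a common length and combines them with the chunk-wise `map(&, ...)`.

```julia
# Hypothetical sketch: immutable wrapper around a (mutable) BitVector.
struct ImmutableToySet
    bits::BitVector
end

ImmutableToySet() = ImmutableToySet(falses(0))

function Base.push!(s::ImmutableToySet, n::Int)
    n > 0 || throw(ArgumentError("only integers in 1:typemax(Int) are supported"))
    if n > length(s.bits)
        old = length(s.bits)
        resize!(s.bits, n)            # grows the same BitVector object in place
        s.bits[old+1:n] .= false      # explicitly clear the newly added bits
    end
    s.bits[n] = true
    return s
end

# Zero and negative values are simply never members.
Base.in(n::Integer, s::ImmutableToySet) = 0 < n <= length(s.bits) && s.bits[n]

# Elements past the shorter BitVector cannot be in both sets, so trim, then AND.
function Base.intersect(a::ImmutableToySet, b::ImmutableToySet)
    n = min(length(a.bits), length(b.bits))
    return ImmutableToySet(map(&, a.bits[1:n], b.bits[1:n]))
end
```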
This WIP is to merge my IntSets.jl project into base. Here's what this entails:

* `0:typemax(Int)-1`. Before, this wasn't consistently defined, particularly when it came to complement IntSets.

There are still two outstanding issues here that I could use some help with:

* Julia fails to launch with a segfault when the sysimg.{dll/dylib/so} is unavailable. It happens during the compilation of an inference function that makes heavy use of IntSets, but I can't tell why with my rudimentary LLDB/LLVM knowledge. My hunch is that I've missed some functions that are needed but load after inference.jl in the sysimg, but it successfully bootstraps, so I'm not sure. The new inference bootstrap separation made this a cinch to debug. The culprits were some redundant `Base.`-qualified names (since Base doesn't exist yet).
* The only thing left is a performance regression in `in`. See these albums for a comparison against the current master with varying density and size. Some of this is due to the addition of an unnecessary GC frame (#9974, "propagate effect_free information out of functions"), but I think most of it is from the extra dereference and perhaps some more-frequent cache misses. Particularly confusing to me is `in`: there are fewer branches and the LLVM/native code looks much better (especially once I manually inline the `getindex` call to eliminate the GC frame), but it still lags in performance. That said, it doesn't seem to affect overall compilation times, judging from the linalg test suite.