Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codegen gcroot optimization #14543

Merged
merged 20 commits into from
Feb 12, 2016
Merged

codegen gcroot optimization #14543

merged 20 commits into from
Feb 12, 2016

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Jan 3, 2016

this rewrites the codegen gcroot allocation algorithm to exist entirely as a llvm pass that can be run as a separate optimization pass (instead of updating argDepth along the way). it computes liveness intervals to perform register coloring at the basic-block level (fine-grained tracking for the instruction level has not been implemented yet).

at this stage, i believe this should yield results comparable to the current algorithm. however, the expected benefits are:

  • the algorithm can handle an SSA-form Julia AST without degradation
  • flexibility for writing new / more advanced algorithms is a possibility
  • may make it easier to try new gc strategies (e.g. stack maps) and insert the correct gc-state transitions
  • simpler codegen (gets us closer to merging emit_expr + emit_unboxed + boxed + get_gcrooted)

TODO:

  • fix LLVM 3.7 compatibility
  • cleanup the needsgcroot computation in a few places
  • finish implementation and usage of mark_gc_use

@yuyichao yuyichao added the compiler:codegen Generation of LLVM IR and native code label Jan 3, 2016
@Keno
Copy link
Member

Keno commented Jan 3, 2016

👍 I think this is a great direction for our codegen in general.

@vtjnash vtjnash force-pushed the jn/codegen_rewrite2 branch 2 times, most recently from 4f2e812 to 55fd11a Compare January 6, 2016 22:20
@vtjnash vtjnash force-pushed the jn/codegen_rewrite2 branch 2 times, most recently from af35da1 to a5f25fc Compare January 20, 2016 07:13
@vtjnash vtjnash force-pushed the jn/codegen_rewrite2 branch 2 times, most recently from f6eb4d3 to 36a58c7 Compare February 11, 2016 23:26
… wrong method early in bootstrapping

later in bootstrapping, the new image is sufficiently similar, the error is no longer important
but early in bootstrapping, it may try to call the wrong method and not have enough method defined to succeed
the emitted roots are essentially placeholders for later simplification by a root optimization pass.
the first step here is to help the codegen system distinguish between roots used for function arguments and roots used for locals
…runtime computation matches the result of is_stable_expr
…so that it is easier to elid the store when finalizing the gc frame
…inalize_gc_frame

this computes frame variable liveness at a basicblock level, then uses
that to build an optimized jlcall frame
detects simple `store -> load -> store other` or `store, store other` patterns and removes the redundant root
and detects dead roots, and removes them
this moves it a bit closer to becoming an llvm pass
and makes it easier to pass around state to the helper functions
this pass looks for gcroots that are trivially unnecessary and elides them

rewriting this functionality this way made the allocate_frame liveness computation
much more straight-forward while making both passes more powerful
also drop the jltype argument from boxed; the uses of this argument were invalid anyways
@vtjnash vtjnash force-pushed the jn/codegen_rewrite2 branch from 36a58c7 to 00dd5d5 Compare February 12, 2016 03:28
@vtjnash vtjnash changed the title WIP: codegen gcroot optimization codegen gcroot optimization Feb 12, 2016
@vtjnash
Copy link
Member Author

vtjnash commented Feb 12, 2016

i removed the WIP label, since, while I know there are more optimizations possible, i believe this to be fairly competitive with master now

jl_cgval_t arg2 = emit_expr(args[3],ctx);
ifelse_result = builder.CreateSelect(isfalse,
boxed(arg2, ctx),
boxed(arg1, ctx));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this just be on one line?

vtjnash added a commit that referenced this pull request Feb 12, 2016
@vtjnash vtjnash merged commit 0b2027e into master Feb 12, 2016
@vtjnash vtjnash deleted the jn/codegen_rewrite2 branch February 12, 2016 20:14
@tkelman
Copy link
Contributor

tkelman commented Feb 12, 2016

I believe this has started causing a bootstrap segfault on win64: https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.13330

@tkelman
Copy link
Contributor

tkelman commented Feb 13, 2016

Confirmed by bisect to have been introduced here. skip was a build failure, good means for i in seq 20; do make clean && make -j8 || break; done succeeded all 20 times, bad means at least one in 20-in-a-row rebuilds and bootstraps caused a segfault or an assertion failure.

$ git bisect bad
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
0b2758f4754f910992c7a6af0b5d865f6ae29720
623ed94fe4c8530744c10e443cdaa4c752768f89
cdd6a41fa5bdec029d2c7555e51124eddc0daa3d
4a513090f627673ecbb655eda7cfb2f6fb2699b4
a0c4b2554b76d191ab0779fe2ef3085e2b9f90cd
15bd22f8c43f157d9d8229267f5c0f1d4b241dcd
2c202be7a313046211ea139a67c7dfa02f7c2250
269008015364c3d31d9c5d7d533951e9314e74d4
b2869f9794804aafe2298f8e9bd3c099f6a42293
0c5218c30a5b7b54f2daeaabffa284ee9f9c65fb
8b1d5a73a03bc3dfe1dfa34d02aaf0016031fc76
97ed64a7722e6b1de9e7b758f56b6380afe5022e
fcd595ab9d441299d5de901fb2f2c44378fa9ece
33ae5da49ba9c34f2c55b57d1b3de3485af8e194
35954a41277e284da540dc340b52b09dd8d8532b
44459f000d2e2a360be95dd4ae9ad11f5b3a2f26
We cannot bisect more!

Tony@LAPTOP-O230JCFF ~/julia
$ git bisect log
git bisect start
# bad: [e304c1b4454d3a00d1c7f2b8fcc9cbb3d2051b0b] Merge pull request #15037 from tkelman/tk/xcompile
git bisect bad e304c1b4454d3a00d1c7f2b8fcc9cbb3d2051b0b
# good: [f080ae35ce472d38b9accbacbad206b3cd330755] Merge pull request #15011 from JuliaLang/yyc/async-unwind
git bisect good f080ae35ce472d38b9accbacbad206b3cd330755
# bad: [00dd5d599ca7c379378b234f05c1b9be35e3d507] add remaining mark_gc_use needed for ccall, llvmcall, emit_unbox, and a few direct consumers
git bisect bad 00dd5d599ca7c379378b234f05c1b9be35e3d507
# skip: [269008015364c3d31d9c5d7d533951e9314e74d4] convert gcroot / type inspection of variable assignments to on-the-fly version
git bisect skip 269008015364c3d31d9c5d7d533951e9314e74d4
# skip: [33ae5da49ba9c34f2c55b57d1b3de3485af8e194] generate really poor, but really simple gc frames
git bisect skip 33ae5da49ba9c34f2c55b57d1b3de3485af8e194
# bad: [d2a8147e2976bbb98c841ef5f1049e975597c79e] extract collapseRedundantRoots into a separate function
git bisect bad d2a8147e2976bbb98c841ef5f1049e975597c79e
# skip: [0c5218c30a5b7b54f2daeaabffa284ee9f9c65fb] increase usage of jl_cgval_t info, rather than expression inspection
git bisect skip 0c5218c30a5b7b54f2daeaabffa284ee9f9c65fb
# good: [e824393a3e0994861f2ce09f257df146811c73d9] Type -> convert overwrite warning may result in Inference calling the wrong method early in bootstrapping
git bisect good e824393a3e0994861f2ce09f257df146811c73d9
# skip: [0b2758f4754f910992c7a6af0b5d865f6ae29720] add a simple stats counter for gcframe size (disabled)
git bisect skip 0b2758f4754f910992c7a6af0b5d865f6ae29720
# skip: [b2869f9794804aafe2298f8e9bd3c099f6a42293] optimize several common gcroot patterns
git bisect skip b2869f9794804aafe2298f8e9bd3c099f6a42293
# skip: [623ed94fe4c8530744c10e443cdaa4c752768f89] add codectx as an argument to mark_julia_type
git bisect skip 623ed94fe4c8530744c10e443cdaa4c752768f89
# skip: [8b1d5a73a03bc3dfe1dfa34d02aaf0016031fc76] minor additional optimizations for jlcall frame creation
git bisect skip 8b1d5a73a03bc3dfe1dfa34d02aaf0016031fc76
# skip: [4a513090f627673ecbb655eda7cfb2f6fb2699b4] start work of marking lifetime regions of all gc-rooted objects in a function
git bisect skip 4a513090f627673ecbb655eda7cfb2f6fb2699b4
# skip: [2c202be7a313046211ea139a67c7dfa02f7c2250] move gc frame allocation for variables to finalize_gc_frame
git bisect skip 2c202be7a313046211ea139a67c7dfa02f7c2250
# skip: [cdd6a41fa5bdec029d2c7555e51124eddc0daa3d] move entire gc frame allocation and optimization for jlcall args to finalize_gc_frame
git bisect skip cdd6a41fa5bdec029d2c7555e51124eddc0daa3d
# skip: [35954a41277e284da540dc340b52b09dd8d8532b] refactor the gcroot algorithm into a class
git bisect skip 35954a41277e284da540dc340b52b09dd8d8532b
# skip: [15bd22f8c43f157d9d8229267f5c0f1d4b241dcd] fix windows compile handling of gcroot_func declaration
git bisect skip 15bd22f8c43f157d9d8229267f5c0f1d4b241dcd
# skip: [97ed64a7722e6b1de9e7b758f56b6380afe5022e] delete gcframe if all of the locals and tempslots get removed
git bisect skip 97ed64a7722e6b1de9e7b758f56b6380afe5022e
# skip: [a0c4b2554b76d191ab0779fe2ef3085e2b9f90cd] use the locals frame as the gc root of a SA value in emit_assignment so that it is easier to elid the store when finalizing the gc frame
git bisect skip a0c4b2554b76d191ab0779fe2ef3085e2b9f90cd
# skip: [fcd595ab9d441299d5de901fb2f2c44378fa9ece] correct isimmutable info and add needsgcroot annotations so that the runtime computation matches the result of is_stable_expr
git bisect skip fcd595ab9d441299d5de901fb2f2c44378fa9ece
# bad: [44459f000d2e2a360be95dd4ae9ad11f5b3a2f26] move the gcroot algorithm as late as possible -- almost inside the FPM
git bisect bad 44459f000d2e2a360be95dd4ae9ad11f5b3a2f26
# only skipped commits left to test
# possible first bad commit: [44459f000d2e2a360be95dd4ae9ad11f5b3a2f26] move the gcroot algorithm as late as possible -- almost inside the FPM
# possible first bad commit: [35954a41277e284da540dc340b52b09dd8d8532b] refactor the gcroot algorithm into a class
# possible first bad commit: [97ed64a7722e6b1de9e7b758f56b6380afe5022e] delete gcframe if all of the locals and tempslots get removed
# possible first bad commit: [8b1d5a73a03bc3dfe1dfa34d02aaf0016031fc76] minor additional optimizations for jlcall frame creation
# possible first bad commit: [b2869f9794804aafe2298f8e9bd3c099f6a42293] optimize several common gcroot patterns
# possible first bad commit: [15bd22f8c43f157d9d8229267f5c0f1d4b241dcd] fix windows compile handling of gcroot_func declaration
# possible first bad commit: [4a513090f627673ecbb655eda7cfb2f6fb2699b4] start work of marking lifetime regions of all gc-rooted objects in a function
# possible first bad commit: [623ed94fe4c8530744c10e443cdaa4c752768f89] add codectx as an argument to mark_julia_type
# possible first bad commit: [0b2758f4754f910992c7a6af0b5d865f6ae29720] add a simple stats counter for gcframe size (disabled)
# possible first bad commit: [cdd6a41fa5bdec029d2c7555e51124eddc0daa3d] move entire gc frame allocation and optimization for jlcall args to finalize_gc_frame
# possible first bad commit: [a0c4b2554b76d191ab0779fe2ef3085e2b9f90cd] use the locals frame as the gc root of a SA value in emit_assignment so that it is easier to elid the store when finalizing the gc frame
# possible first bad commit: [2c202be7a313046211ea139a67c7dfa02f7c2250] move gc frame allocation for variables to finalize_gc_frame
# possible first bad commit: [269008015364c3d31d9c5d7d533951e9314e74d4] convert gcroot / type inspection of variable assignments to on-the-fly version
# possible first bad commit: [0c5218c30a5b7b54f2daeaabffa284ee9f9c65fb] increase usage of jl_cgval_t info, rather than expression inspection
# possible first bad commit: [fcd595ab9d441299d5de901fb2f2c44378fa9ece] correct isimmutable info and add needsgcroot annotations so that the runtime computation matches the result of is_stable_expr
# possible first bad commit: [33ae5da49ba9c34f2c55b57d1b3de3485af8e194] generate really poor, but really simple gc frames

@yuyichao
Copy link
Contributor

I've also got intermittent TypeError during boot strap.

error during bootstrap:
TypeError(func=:abstract_call_gf, context="typeassert", expected=Core.Inference.CallStack, got=Core.Inference.EmptyCallStack())
rec_backtrace at /home/alarm/projects/julia/aarch64-test/src/task.c:667
[inline] at /home/alarm/projects/julia/aarch64-test/src/task.c:712
record_backtrace at /home/alarm/projects/julia/aarch64-test/src/task.c:853
unknown function (ip: 0x7fb7d39cb4)
unknown function (ip: 0x7db25a9e18)
jl_apply_generic at /home/alarm/projects/julia/aarch64-test/src/gf.c:1919
unknown function (ip: 0x7db25ef864)

So far seems more reproducible on aarch64...............

@yuyichao
Copy link
Contributor

Actually, on aarch64, the issue is a stack overflow in the serializer during a super deep type inference. The StackOverflowError() thrown then seems to confuse type inference and causing it to fail on type assertion.

@tkelman is stack overflow detection working on win64? In any case, it might be good to to check if there was any stackoverflow detected.

@yuyichao
Copy link
Contributor

Increase the stack size during compilation works around the issue (I increased it from 8M to 80M just to be safe but I think we probably don't actually need that much) so this is something that worth checking/use as temporary workaround.

@vtjnash
Copy link
Member Author

vtjnash commented Feb 13, 2016

i guess openblas has already been doing this for us on x64, causing me to miss this entirely in testing :(. Moreover, the automated gc-rooting (4a51309) significantly increased the size of the code (proxy for measuring the average stackframe, with all else unchanged). The follow on commits slowly brought that back down via optimizations, but I suspect some common optimizable patterns may be still lacking. Also, fwiw, the win64 stackframe is often much larger than the platform ABI standard one, and type inference already often runs fairly close to the value we've been using so it wouldn't be unreasonable to double it again.

@yuyichao
Copy link
Contributor

OpenBlas only does that for the git version and I think the issue isn't that bad on x64 (I patched it out in openblas-git and still don't really see the issue (only once with threading)).

some common optimizable patterns may be still lacking

I think trying to optimize it on master is totally fine and I guess the way you propose to flatten the type inference should also help this a lot?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants