reduce copies of generic functions to improve compile times #319
Conversation
Overall, looks good to me.
I'll take a look at how their project compares, but in my testing (done in the Discord chat) with `cargo llvm-lines --test binread_impls`:

- Make count_with more consistent for builtin types. #318 (the status quo in this regard) had 24233 lines total
- Remove the 'static requirement from the BinRead Vec implementation by delegating to a new trait method. #317 had 10122 lines total

which looked to be a similar ratio of improvement to what they're doing here.
If there is a regression in #317, that can likely be fixed by delegating to this same generic function instead of duplicating the implementation for the specializations.
I did a quick test with the different versions. The build times are fairly consistent for each version. This is a slightly different commit of my CLI tool, so the exact times or line counts aren't directly comparable to my initial post. The baseline is just a simple for loop.

- 0.14.1: 1m 04s

I'm happy to make any changes that need to be made depending on the merge order of the PRs.
On my machine, running
What commit are you testing with? I suspect the reason #317 is so similar in compile time to the for loop is that it ends up delegating directly to the unoptimized implementation in most cases, as part of the default impl, instead of having to check that the type isn't one of 10 or so types first. If that's the reason, though, it surprises me, since it would imply that this downcasting trick is slow at compile time. It might be the result of some other implementation decision, though.
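For context, the "check that the type isn't one of 10 or so types" pattern being discussed can be sketched roughly like this. This is a hypothetical illustration using `std`'s `TypeId`/`Any` downcasting, not binrw's actual implementation, and `read_vec` is an invented name:

```rust
use core::any::{Any, TypeId};

// Hypothetical sketch of downcasting specialization: a generic read path
// that first checks whether T is a "fast" type (here just u8) before
// falling back to the generic per-element implementation. Every
// monomorphization of this function still carries the type check, which
// is the compile-time cost speculated about above.
fn read_vec<T: 'static + Default + Clone>(count: usize) -> Vec<T> {
    if TypeId::of::<T>() == TypeId::of::<u8>() {
        // Fast path: bulk "read" of raw bytes (stubbed as a zeroed buffer).
        let bytes = vec![0u8; count];
        // Safe downcast of the concrete Vec<u8> back to Vec<T>,
        // valid because we just proved T is u8.
        let boxed: Box<dyn Any> = Box::new(bytes);
        return *boxed.downcast::<Vec<T>>().expect("T is u8 on this branch");
    }
    // Slow path: generic per-element construction.
    vec![T::default(); count]
}
```

The branch is resolved at compile time in practice (LLVM folds the `TypeId` comparison), but the code for both paths is still generated and then pruned per instantiation.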
I added a basic benchmark to verify there are no runtime regressions and found that the swap bytes performance regressed, so I modified the
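The "swap bytes" operation under benchmark refers to endianness conversion of parsed integers. As a minimal stand-in (not binrw's actual code path), the hot loop looks something like:

```rust
// Byte-swap every element in place; this is the kind of endianness
// conversion loop whose throughput a swap-bytes benchmark would measure.
fn swap_all(values: &mut [u32]) {
    for v in values.iter_mut() {
        *v = v.swap_bytes(); // e.g. 0x1234_5678 -> 0x7856_3412
    }
}
```

Loops like this are sensitive to whether the compiler can still vectorize them after refactoring, which is one way a code-size optimization can regress runtime performance.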
I split out the fast int optimization into a generic function to reduce generated code, as tested with `cargo llvm-lines`. This cuts compile times in release mode in half for building just the CLI on one of my projects. I was also able to reduce the copies of `binrw::__private::restore_position`, but that seems to make a measurable but not really noticeable difference.
CLI code for reference:
https://github.com/ScanMountGoat/xc3_lib/blob/88a2a635336d86062ce4db24022fa0cf956885d3/xc3_test