Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce copies of generic functions to improve compile times #319

Closed
wants to merge 4 commits into from

Conversation

ScanMountGoat
Copy link
Contributor

I split out the fast int optimization into a generic function to reduce generated code as tested with cargo llvm-lines. This cuts compile times in release mode for building just the CLI on one of my projects in half. I was also able to reduce the copies of binrw::__private::restore_position, but that seems to make a measurable but not really noticeable difference.

0.14.1

  Lines                  Copies               Function name
  -----                  ------               -------------
  2363737                27552                (TOTAL)
   701044 (29.7%, 29.7%)   416 (1.5%,  1.5%)  binrw::helpers::count_with::{{closure}}

fork

  Lines                  Copies               Function name
  -----                  ------               -------------
  1915524                26572                (TOTAL)
   332052 (17.3%, 17.3%)   416 (1.6%,  1.6%)  binrw::helpers::count_with::{{closure}}

CLI code for reference:
https://github.com/ScanMountGoat/xc3_lib/blob/88a2a635336d86062ce4db24022fa0cf956885d3/xc3_test

@csnover
Copy link
Collaborator

csnover commented Mar 4, 2025

Nice! Are you able to quickly try the branch at #317 and see if it offers similar compile time improvement? I would not want to regress your use case by landing this and then inadvertently reintroduce the same problem with #317.

Copy link
Collaborator

@kitlith kitlith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good to me.

I'll take a look at how their project compares, but in my testing done in the discord chat with cargo llvm-lines --test binread_impls:

which looked to be a similar ratio of improvement as to what they're doing here.

If there is a regression in #317, that can likely be fixed by delegating to this same generic function instead of duplicating the implementation for the specializations.

@ScanMountGoat
Copy link
Contributor Author

ScanMountGoat commented Mar 4, 2025

I did a quick test with the different versions. The build times are fairly consistent for each version. This is a slightly different commit of my CLI tool, so the exact times or line counts aren't directly comparable to my initial post. The baseline is just a simple for loop.

0.14.1: 1m 04s
#317: 33.5s
#319: 42.35s
for loop: 33.6s

I'm happy to make any changes that need to be made depending on the merge order of the PRs.

@kitlith
Copy link
Collaborator

kitlith commented Mar 4, 2025

On my machine, running cargo llvm-lines on xc3-test at commit 2b82863 produces:

What commit are you testing with?

I suspect the reason why #317 is so similar in compile time to the for loop is because it ends up delegating directly to the unoptimized implementation in most cases, as part of the default impl, instead of having to check that it isn't one of 10 or so types first. If that's the reason, though, that surprises me, as that would imply that this trick of downcasting is slow at compile time. It might be the result of some other implementation decision, though.

@ScanMountGoat
Copy link
Contributor Author

Compile times for 2b82863 on my MacBook Air M1. I build once to compile all the dependencies, add an empty line, and build just the xc3_test CLI again all in release mode.

master: 29.1s
#317: 12.10s
#318: 27.63
#319: 15.96s

csnover added a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover added a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover added a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover added a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover added a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover added a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover added a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover added a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover added a commit to csnover/binrw that referenced this pull request Mar 10, 2025
csnover pushed a commit to csnover/binrw that referenced this pull request Mar 10, 2025
@csnover csnover closed this in 53e4800 Mar 10, 2025
@csnover
Copy link
Collaborator

csnover commented Mar 10, 2025

I added a basic benchmark to verify no runtime regressions and found that the swap bytes performance regressed, so I modified the SwapBytes trait implementations to force inlining, which fixed that. Otherwise, I landed this as-is with the fixup commits merged. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants