reduce copies of generic functions to improve compile times #319
Conversation
Overall, looks good to me.
I'll take a look at how their project compares, but in my testing (done in the Discord chat) with `cargo llvm-lines --test binread_impls`:

- Make count_with more consistent for builtin types. #318 (the status quo in this regard) had 24233 lines total
- Remove the 'static requirement from the BinRead Vec implementation by delegating to a new trait method. #317 had 10122 lines total

which looked to be a similar ratio of improvement to what they're doing here.
If there is a regression in #317, that can likely be fixed by delegating to this same generic function instead of duplicating the implementation for the specializations.
I did a quick test with the different versions. The build times are fairly consistent for each version. This is a slightly different commit of my CLI tool, so the exact times or line counts aren't directly comparable to my initial post. The baseline is just a simple for loop.

- 0.14.1: 1m 04s

I'm happy to make any changes that need to be made depending on the merge order of the PRs.
On my machine, running
What commit are you testing with? I suspect the reason #317 is so similar in compile time to the for loop is that it ends up delegating directly to the unoptimized implementation in most cases, as part of the default impl, instead of having to check that the type isn't one of 10 or so types first. If that's the reason, though, it surprises me, since it would imply that this downcasting trick is slow at compile time. It might be the result of some other implementation decision, though.
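For context, the "check that the type isn't one of 10 or so types" pattern being discussed can be sketched roughly like this. This is a hypothetical illustration using `std`'s `TypeId`/`Any` downcasting, not binrw's actual implementation, and `read_vec` is an invented name:

```rust
use core::any::{Any, TypeId};

// Hypothetical sketch of downcasting specialization: a generic read path
// that first checks whether T is a "fast" type (here just u8) before
// falling back to the generic per-element implementation. Every
// monomorphization of this function still carries the type check, which
// is the compile-time cost speculated about above.
fn read_vec<T: 'static + Default + Clone>(count: usize) -> Vec<T> {
    if TypeId::of::<T>() == TypeId::of::<u8>() {
        // Fast path: bulk "read" of raw bytes (stubbed as a zeroed buffer).
        let bytes = vec![0u8; count];
        // Safe downcast of the concrete Vec<u8> back to Vec<T>,
        // valid because we just proved T is u8.
        let boxed: Box<dyn Any> = Box::new(bytes);
        return *boxed.downcast::<Vec<T>>().expect("T is u8 on this branch");
    }
    // Slow path: generic per-element construction.
    vec![T::default(); count]
}
```

The branch is resolved at compile time in practice (LLVM folds the `TypeId` comparison), but the code for both paths is still generated and then pruned per instantiation.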
I added a basic benchmark to verify there are no runtime regressions and found that the swap bytes performance regressed, so I modified the
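The "swap bytes" operation under benchmark refers to endianness conversion of parsed integers. As a minimal stand-in (not binrw's actual code path), the hot loop looks something like:

```rust
// Byte-swap every element in place; this is the kind of endianness
// conversion loop whose throughput a swap-bytes benchmark would measure.
fn swap_all(values: &mut [u32]) {
    for v in values.iter_mut() {
        *v = v.swap_bytes(); // e.g. 0x1234_5678 -> 0x7856_3412
    }
}
```

Loops like this are sensitive to whether the compiler can still vectorize them after refactoring, which is one way a code-size optimization can regress runtime performance.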
I split out the fast int optimization into a generic function to reduce generated code, as tested with `cargo llvm-lines`. This cuts compile times in release mode in half for building just the CLI on one of my projects. I was also able to reduce the copies of `binrw::__private::restore_position`, but that seems to make a measurable but not really noticeable difference.
CLI code for reference:
https://github.com/ScanMountGoat/xc3_lib/blob/88a2a635336d86062ce4db24022fa0cf956885d3/xc3_test