Implement a fast path for `#attributes_to_hash` #424

byroot · 2025-03-08T12:44:09Z

This uses a number of performance tricks.

Rather than calling `transform_key` for every attribute of every instance, we do it once statically, and then use `transform_values` to fetch the values. This removes a big overhead, but it is made a bit tricky by the number of global configurations that can invalidate that cache. I made it work, but some of what that required, like using `ObjectSpace.each_object`, isn't reasonable.
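A minimal sketch of the "transform keys once, then only compute values" idea, assuming a single global key-transformation setting. `KeyTransformer`, its generation counter and `CachedKeysResource` are hypothetical names, not Alba's API. The generation counter is just one possible way for a cached key map to notice that a global configuration change made it stale; the PR notes that doing this properly in Alba is the tricky part.

```ruby
# Hypothetical global setting: changing it bumps a generation counter so
# per-class caches can tell that they are stale.
module KeyTransformer
  @mode = :none
  @generation = 0

  class << self
    attr_reader :mode, :generation

    def mode=(value)
      @mode = value
      @generation += 1
    end

    # Transform a single attribute name into an output key.
    def transform(attribute)
      name = attribute.to_s
      return name unless @mode == :camel

      name.gsub(/_([a-z])/) { Regexp.last_match(1).upcase }
    end
  end
end

class CachedKeysResource
  ATTRIBUTES = %i[id first_name last_name].freeze

  # { output_key => attribute_name }, rebuilt only when the global
  # configuration generation differs from the one it was built with.
  def self.key_map
    unless @key_map && @key_map_generation == KeyTransformer.generation
      @key_map = ATTRIBUTES.to_h { |attr| [KeyTransformer.transform(attr), attr] }
      @key_map_generation = KeyTransformer.generation
    end
    @key_map
  end

  def initialize(object)
    @object = object
  end

  # Per object, only the values are computed; keys come from the cached map.
  def to_h
    self.class.key_map.transform_values { |attr| @object.public_send(attr) }
  end
end

Author = Struct.new(:id, :first_name, :last_name)
author = Author.new(1, "Ada", "Lovelace")
CachedKeysResource.new(author).to_h # keys: "id", "first_name", "last_name"
KeyTransformer.mode = :camel         # bumps the generation, invalidating the cache
CachedKeysResource.new(author).to_h # keys: "id", "firstName", "lastName"
```

Key transformation now runs once per class and configuration change instead of once per attribute per object.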

`Hash#transform_values` is also good because it allows Ruby to directly allocate a Hash of the right size and skip re-hashing the keys.
This doesn't make a big difference on the existing benchmark because there aren't many attributes, but I expect the gains to grow with the number of attributes.
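The two ways of building the output Hash below produce the same result; the difference the PR relies on is internal to Ruby. This sketch, using a hypothetical 50-attribute mapping to mimic a wide resource, only contrasts the two shapes of code. The claim that `transform_values` can size the Hash up front and reuse the existing key hashes comes from the PR description; the code itself does not measure it.

```ruby
# Hypothetical 50-attribute mapping: { "attr1" => :attr1, ..., "attr50" => :attr50 }.
KEY_MAP = (1..50).to_h { |i| ["attr#{i}", :"attr#{i}"] }

Row = Struct.new(*KEY_MAP.values)
ROW = Row.new(*(1..50).to_a)

# Generic approach: start from an empty Hash and insert pair by pair, so the
# Hash grows as it goes and each key is hashed again on insertion.
def build_by_insertion(key_map, object)
  key_map.each_with_object({}) { |(key, attr), hash| hash[key] = object.public_send(attr) }
end

# transform_values: the result has the same keys, in the same order, as
# key_map; only the values are replaced.
def build_by_transform(key_map, object)
  key_map.transform_values { |attr| object.public_send(attr) }
end

build_by_transform(KEY_MAP, ROW)["attr7"] # returns 7
```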

Make `@_resource_methods` a Hash, so we can check in O(1) whether we need to call the method on the resource itself or on `obj`.
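A minimal sketch of the lookup, assuming a resource that can define methods shadowing the wrapped object's attributes. `PersonResource` and `RESOURCE_METHODS` are hypothetical stand-ins for `@_resource_methods`; the only part taken from the PR is storing the method names as Hash keys rather than in an Array.

```ruby
class PersonResource
  def initialize(object)
    @object = object
  end

  # Hypothetical resource-level method that takes precedence over the object's.
  def full_name
    "#{@object.first_name} #{@object.last_name}"
  end

  # Computed after the resource methods above are defined. Hash#key? is an
  # O(1) average lookup, whereas Array#include? scans the list on every call.
  RESOURCE_METHODS = instance_methods(false).to_h { |name| [name, true] }.freeze

  def fetch(attribute)
    if RESOURCE_METHODS.key?(attribute)
      __send__(attribute)          # defined on the resource itself
    else
      @object.__send__(attribute)  # delegate to the wrapped object
    end
  end
end

Person = Struct.new(:first_name, :last_name)
resource = PersonResource.new(Person.new("Ada", "Lovelace"))
resource.fetch(:full_name)  # returns "Ada Lovelace"
resource.fetch(:first_name) # returns "Ada"
```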

Inline the `Symbol` case for fetching attributes, as it's the most common one.
More cases could be inlined.
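A sketch of what inlining the common case can look like, assuming attributes are either a Symbol (method name) or a Proc taking the object. `AttributeFetcher` and this set of attribute kinds are hypothetical; Alba supports more kinds than shown here.

```ruby
module AttributeFetcher
  module_function

  # Generic path: handles every supported attribute kind (hypothetical subset).
  def fetch_generic(object, attribute)
    case attribute
    when Symbol then object.__send__(attribute)
    when Proc then attribute.call(object)
    else raise ArgumentError, "unsupported attribute: #{attribute.inspect}"
    end
  end

  # Fast path: the common Symbol case is handled inline, so it skips the extra
  # method call and the case dispatch in fetch_generic.
  def values_for(object, key_map)
    key_map.transform_values do |attribute|
      if attribute.is_a?(Symbol)
        object.__send__(attribute)
      else
        fetch_generic(object, attribute)
      end
    end
  end
end

Point = Struct.new(:x, :y)
AttributeFetcher.values_for(Point.new(1, 2), { "x" => :x, "sum" => ->(p) { p.x + p.y } })
# returns { "x" => 1, "sum" => 3 }
```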

Specialize a fast path for when `@_on_error` isn't set, because a `rescue` clause causes overhead (it needs to call `setjmp(3)`).
So it's best not to rescue when we don't have to.
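A minimal sketch of the split, assuming the error handler is a callable receiving `(error, object, attribute)`. `RescueAwareSerializer` and that handler signature are hypothetical and are not Alba's `on_error` API; the point is only that the `rescue` clause exists solely on the branch that needs it.

```ruby
class RescueAwareSerializer
  # on_error: hypothetical callable (error, object, attribute) -> value,
  # standing in for @_on_error.
  def initialize(on_error = nil)
    @on_error = on_error
  end

  def serialize(object, key_map)
    if @on_error.nil?
      # Fast path: no rescue clause, so the overhead the PR attributes to
      # rescue is not paid for every value.
      key_map.transform_values { |attr| object.__send__(attr) }
    else
      # Slow path: only rescue when an error handler is configured.
      key_map.transform_values do |attr|
        object.__send__(attr)
      rescue StandardError => e
        @on_error.call(e, object, attr)
      end
    end
  end
end

Widget = Struct.new(:name) do
  def broken
    raise "not available"
  end
end

keys = { "name" => :name, "broken" => :broken }
RescueAwareSerializer.new(->(_error, _object, _attr) { nil }).serialize(Widget.new("w"), keys)
# returns { "name" => "w", "broken" => nil }
```

Without a handler, the error from `broken` simply propagates to the caller, which is the behavior the fast path has to preserve.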

Before:

```
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                alba    82.000 i/100ms
Calculating -------------------------------------
                alba    830.409 (± 0.8%) i/s    (1.20 ms/i) -      4.182k in   5.036433s
Calculating -------------------------------------
                alba   818.241k memsize (     0.000  retained)
                         9.807k objects (     0.000  retained)
                         6.000  strings (     0.000  retained)
```

After (roughly 29% more iterations per second, with identical memory and object counts):

```
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
                alba   107.000 i/100ms
Calculating -------------------------------------
                alba      1.074k (± 0.7%) i/s  (930.85 μs/i) -      5.457k in   5.079952s
Calculating -------------------------------------
                alba   818.241k memsize (     0.000  retained)
                         9.807k objects (     0.000  retained)
                         6.000  strings (     0.000  retained)
```

Draft

Opening this as a draft because, as noted above and in code comments, a number of Alba features and flexibility points conflict a bit with this approach. Some extra work would be needed to avoid breaking compatibility in some corner cases.

I mostly wanted to see how far we can theoretically go.

Next Step?

After that, judging by the profile, the only significant remaining gain I can think of would be to generate specialized code for `attributes_to_hash`, so that instead of relying on `__send__` we would have regular method calls, which helps YJIT and the VM cache those calls.
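This step is only proposed in the PR, not implemented. A minimal sketch of what generated code could look like, using `class_eval` with a heredoc so that each attribute becomes a literal method call in the generated source. `SerializerCompiler`, `serialize` and the method-name check are hypothetical; validating names before interpolating them into source code is a design choice here to avoid code injection.

```ruby
module SerializerCompiler
  METHOD_NAME = /\A[a-z_][a-zA-Z0-9_]*[?!]?\z/

  # Builds a class whose #serialize calls each attribute method directly,
  # e.g. { "title" => object.title, ... }, instead of object.__send__(attr).
  def self.compile(key_map)
    entries = key_map.map do |key, attr|
      # Only interpolate plain method names into generated code.
      raise ArgumentError, "unsafe attribute: #{attr.inspect}" unless attr.to_s.match?(METHOD_NAME)

      "#{key.to_s.inspect} => object.#{attr}"
    end

    Class.new do
      class_eval(<<~RUBY, __FILE__, __LINE__ + 1)
        def serialize(object)
          { #{entries.join(', ')} }
        end
      RUBY
    end
  end
end

Book = Struct.new(:title, :pages)
BookSerializer = SerializerCompiler.compile({ "title" => :title, "pageCount" => :pages })
BookSerializer.new.serialize(Book.new("Ruby", 300))
# returns { "title" => "Ruby", "pageCount" => 300 }
```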

I couldn't find a reason for this pattern of calling a method to get a similar proc every time. It incurs an extra method call and a Proc object allocation. In addition, in the case of `collection_converter`, the relatively complex code can be replaced by a simpler and faster `map`. `map` is preferable to pushing objects into a new Array, because it allows Ruby to directly allocate an Array of the right size rather than potentially having to resize it multiple times.

This isn't a big gain, but I think it makes the code easier to read anyway.

Before:

```
Calculating -------------------------------------
                alba    801.321 (± 0.6%) i/s    (1.25 ms/i) -      4.080k in   5.091760s
Calculating -------------------------------------
                alba   826.321k memsize (     0.000  retained)
                         9.908k objects (     0.000  retained)
                         6.000  strings (     0.000  retained)
```

After:

```
Calculating -------------------------------------
                alba    830.039 (± 0.8%) i/s    (1.20 ms/i) -      4.182k in   5.038669s
Calculating -------------------------------------
                alba   818.241k memsize (     0.000  retained)
                         9.807k objects (     0.000  retained)
                         6.000  strings (     0.000  retained)
```
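A simplified sketch of the change described in this commit. `ListResource`, `convert` and the lambda body are hypothetical; the commit message only states that `collection_converter` built a proc on each call and that its job can be done with `map`.

```ruby
class ListResource
  def initialize(objects)
    @objects = objects
  end

  # Before (simplified, hypothetical shape of the pattern): every call builds
  # a new lambda, and results are pushed one by one into a growing Array.
  def collection_converter
    lambda do |object, array|
      array << convert(object)
    end
  end

  def serialize_before
    @objects.each_with_object([], &collection_converter)
  end

  # After: no extra method call or Proc allocation, and map knows the final
  # size of the result up front.
  def serialize_after
    @objects.map { |object| convert(object) }
  end

  private

  def convert(object)
    { "value" => object }
  end
end

list = ListResource.new([1, 2, 3])
list.serialize_after # returns [{ "value" => 1 }, { "value" => 2 }, { "value" => 3 }]
```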


codecov · 2025-03-08T12:47:39Z

Codecov Report

Attention: Patch coverage is 30.98592% with 49 lines in your changes missing coverage. Please review.

Project coverage is 46.22%. Comparing base (ad700cd) to head (ceee5da).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/alba/resource.rb | 30.00% | 49 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (ad700cd) and HEAD (ceee5da).

HEAD has 21 fewer uploads than BASE.

| Flag | BASE (ad700cd) | HEAD (ceee5da) |
|---|---|---|
|  | 63 | 42 |

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main     #424       +/-   ##
===========================================
- Coverage   99.66%   46.22%   -53.45%     
===========================================
  Files          14       13        -1     
  Lines         603      649       +46     
  Branches      156      174       +18     
===========================================
- Hits          601      300      -301     
- Misses          2      349      +347
```


byroot added 2 commits March 7, 2025 14:27

byroot mentioned this pull request Mar 8, 2025

Inline lambdas #423

Merged
