Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not store JSON Yul ASTs and Yul CFG in CompilerStack #15451

Merged
merged 2 commits into from
Sep 27, 2024

Conversation

cameel
Copy link
Member

@cameel cameel commented Sep 24, 2024

Depends on #15457. Merged.

This fixes excessive memory usage due to JSON representation of Yul ASTs and Yul CFG being always calculated from IR, even if not requested as an output.

The PR fixes the problem by changing CompilerStack so that these artifacts are only prepared when requested by CLI or StandardCompiler. This requires reparsing the IR from scratch each time (and the PR does not go out of its way to ensure it's parsed only once if multiple ASTs are requested), but Yul parsing is generally fast and these artifacts are not needed in the typical Solidity development workflow anyway.

Effects

This has a huge impact on memory usage and also non-negligible impact on running time.

For example when compiling Uniswap using our external benchmarks, disabling generation of this data makes it go from 143 s and 5 GB RAM down to 115 s and 1.5 GB RAM.

I've been noticing the aggressive RAM usage for a while now, to the point that I had to be careful when running the external benchmarks - compiling Eigenlayer or Sablier sometimes required more than 30 GB RAM and would crash on my machine.

@cameel cameel requested review from r0qs and clonker September 24, 2024 15:15
@cameel cameel self-assigned this Sep 24, 2024
@cameel
Copy link
Member Author

cameel commented Sep 24, 2024

Hmm... tests seem to be failing due to different native locations in the optimized AST:

-            "nativeSrc": "58:298:0",
+            "nativeSrc": "58:315:0",
             "nodeType": "YulBlock",
-                    "nativeSrc": "68:282:0",
+                    "nativeSrc": "68:299:0",
                     "nodeType": "YulBlock",
-                            "nativeSrc": "111:27:0",
+                            "nativeSrc": "128:27:0",
                             "nodeType": "YulVariableDeclaration",

@cameel cameel changed the title Compiler stack do not store serialized json Do not store JSON Yul ASTs and Yul CFG in CompilerStack Sep 24, 2024
@cameel
Copy link
Member Author

cameel commented Sep 24, 2024

Benchmarks

Using the improved benchmarking scripts from #15450.

IR pipeline

File Time (0.8.27) Time (this PR) Improvement Exit code
openzeppelin 40 s 32 s -20% 0
uniswap-v4 157 s 116 s -26% 0
eigenlayer 716 s 531 s -26% 0
seaport 13 s 12 s -8% 1
sablier-v2 935 s 702 s -25% 1
File Memory (0.8.27) Memory (this PR) Improvement Exit code
openzeppelin 1220 MiB 495 MiB -59% 0
uniswap-v4 4805 MiB 1494 MiB -69% 0
eigenlayer 20346 MiB 4470 MiB -78% 0
seaport 1042 MiB 950 MiB -9% 1
sablier-v2 25535 MiB 6268 MiB -75% 1

Note: For the last two the numbers do not represent the whole project, because in each there is a contract that fails to compile due to a StackTooDeep error.

Legacy pipeline

There should be zero difference here since these artifacts are only generated when IR is requested.
They're still potentially useful to see what the variance of these numbers is (they come from a single run).

File Time (0.8.27) Time (this PR) Improvement Exit code
openzeppelin 10 s 10 s 0% 0
uniswap-v4 24 s 20 s -17% 0
eigenlayer 86 s 72 s -16% 0
seaport 183 s 159 s -13% 0
sablier-v2 155 s 149 s -4% 0
File Memory (0.8.27) Memory (this PR) Improvement Exit code
openzeppelin 519 MiB 517 MiB 0% 0
uniswap-v4 1013 MiB 1011 MiB 0% 0
eigenlayer 2833 MiB 2835 MiB 0% 0
seaport 4476 MiB 4476 MiB 0% 0
sablier-v2 4882 MiB 5213 MiB +7% 0

@ekpyron ekpyron added this to the 0.8.28 milestone Sep 24, 2024
@cameel
Copy link
Member Author

cameel commented Sep 24, 2024

An interesting thing is that this overhead of JSON processing did not show up in profiling I did recently: #15179 (comment). I'd expect to see YulStack::astJson() there, bit I can't see any JSON-related functions at all on any of the diagrams. I wonder if it might be something along the lines of the overhead coming from system memory management and the profiler does not counting that?

One thing I do see is YulStack::parseAndAnalyze() taking 6% of execution for Uniswap on develop, which might indicate that all that extra reparsing may be starting to add up. Or not - this is the only diagram where I see it so it could also be a fluke.

@clonker
Copy link
Member

clonker commented Sep 25, 2024

Hmm... tests seem to be failing due to different native locations in the optimized AST:

Not sure if it is the problem leading to it, but it may be a problem regardless:
I did a little experiment and compared the output of

contract(_contractName).yulIROptimized

with the output of

loadGeneratedIR(loadGeneratedIR(loadGeneratedIR(contract(_contractName).yulIROptimized).print()).print()).print()

just to see if it stays consistent under repeated loading and printing operations. It doesn't:

@@ -1,10 +1,10 @@
 /// @use-src 0:"test/cmdlineTests/ast_ir/input.sol"
 object "C_2" {
     code {
         {
-            /// @src 0:60:73  "contract C {}"
+            /// @src 0:60:73
             let _1 := memoryguard(0x80)
             mstore(64, _1)
             if callvalue() { revert(0, 0) }
             let _2 := datasize("C_2_deployed")
             codecopy(_1, dataoffset("C_2_deployed"), _2)
@@ -13,11 +13,11 @@
     }
     /// @use-src 0:"test/cmdlineTests/ast_ir/input.sol"
     object "C_2_deployed" {
         code {
             {
-                /// @src 0:60:73  "contract C {}"
+                /// @src 0:60:73
                 revert(0, 0)
             }
         }
         data ".metadata" hex"a2646970667358221220b0b8ededf61719cebc357d865ee54b27e1d49d69b9c45f8107b7210d698521d864736f6c63782c302e382e32382d646576656c6f702e323032342e392e32352b636f6d6d69742e34663630383164662e6d6f64005d"
     }

@clonker
Copy link
Member

clonker commented Sep 25, 2024

Also the difference is multiples of 17, which is consistent with the length of "contract C {}" - so I recon that it is the problem after all

@cameel cameel added the 🟡 PR review label label Sep 25, 2024
@cameel cameel force-pushed the compiler-stack-do-not-store-serialized-json branch from 4f6081d to 7ff5fbd Compare September 25, 2024 12:44
@cameel
Copy link
Member Author

cameel commented Sep 25, 2024

loadGeneratedIR(loadGeneratedIR(loadGeneratedIR(contract(_contractName).yulIROptimized).print()).print()).print()

There probably is something to it, the exact 17 char difference is too specific, but at least in the snippet above the cause is just the way you print it.

yulIROptimized was generated with print(*this), where this is the CompilerStack. The parameter is an object that can provide access to sources, which are needed if you want to print debug snippets. Plain print() without an argument does not have access to it and won't print those snippets.

I'm trying to debug it but, oddly, after rebasing this I cannot reproduce this difference locally. The test passes for me somehow. I haven't tried it before the rebase, so I'm not sure if it's just not reproducible on my machine in general or if it's the rebase that changed something...

EDIT: It still fails in CI so it's not the rebase.

@cameel cameel force-pushed the compiler-stack-do-not-store-serialized-json branch from 7ff5fbd to c23cac8 Compare September 25, 2024 17:30
@cameel cameel changed the base branch from develop to fix-code-snippets-shifting-yul-ast-native-src-locations September 25, 2024 17:45
@cameel cameel added the has dependencies The PR depends on other PRs that must be merged first label Sep 25, 2024
@cameel
Copy link
Member Author

cameel commented Sep 25, 2024

I think I figured it out. Looks like my previous fix for native locations (#15457) did not properly account for code snippets in debug info and I did not notice that. TBH it's quite hard to notice when these offsets are wrong in test output and verifying them all manually is not feasible - which is the reason why we had the original issue in the first place.

New fix: #15457.

@cameel cameel force-pushed the compiler-stack-do-not-store-serialized-json branch from c23cac8 to 2360c54 Compare September 25, 2024 18:08
@cameel cameel force-pushed the fix-code-snippets-shifting-yul-ast-native-src-locations branch from 995da44 to 614f312 Compare September 25, 2024 18:08
@cameel cameel force-pushed the fix-code-snippets-shifting-yul-ast-native-src-locations branch from 614f312 to 4216aa7 Compare September 26, 2024 15:16
@cameel cameel force-pushed the compiler-stack-do-not-store-serialized-json branch from 2360c54 to c78a8d2 Compare September 26, 2024 15:28
Base automatically changed from fix-code-snippets-shifting-yul-ast-native-src-locations to develop September 26, 2024 15:51
@cameel cameel marked this pull request as ready for review September 26, 2024 16:46
- The JSON artifacts take a lot of memory so always generating them is very wasteful.
- Even when requested, they are only relevant while the current contract is being copmiled. We only really want to store the serialized JSON that will be returned to the user.
- This actually also speeds up compilation quite a bit.
@cameel cameel force-pushed the compiler-stack-do-not-store-serialized-json branch from c78a8d2 to 9ccc44a Compare September 26, 2024 16:46
@cameel cameel removed the has dependencies The PR depends on other PRs that must be merged first label Sep 26, 2024
@cameel cameel merged commit 26d5b3f into develop Sep 27, 2024
72 checks passed
@cameel cameel deleted the compiler-stack-do-not-store-serialized-json branch September 27, 2024 13:17
@cameel
Copy link
Member Author

cameel commented Sep 30, 2024

I did some more benchmarking and profiling on top of this.

First I ran our profiling (PROFILE_OPTIMIZER_STEPS CMake option) to see how long each step takes. See #15465 (comment) for detailed results.

I also checked how long each part of the pipeline takes now:

OpenZeppelin Uniswap Eigenlayer
Compilation (optimized) 32 s 117 s 514 s
Optimizer steps 18 s 61 s 261 s
Compilation (unoptimized) 15 s 49 s 212 s
Compilation (only IR generation) 4 s 10 s 37 s
Compilation (only analysis) 1 s 2 s 2 s

The numbers are from running full compilation with only specific outputs selected, except for "Optimizer steps", which is the time I got from profiling optimization steps in #15465 (comment).

The same in relative terms:

OpenZeppelin Uniswap Eigenlayer
Compilation (optimized) 100% 100% 100%
Optimizer steps 56% 52% 51%
Compilation (unoptimized) 47% 42% 41%
Compilation (only IR generation) 13% 9% 7%
Compilation (only analysis) 3% 2% 0%

Conclusions

  • Optimization time is now down to 50-60% of the total.
  • IR generation and analysis are only ~10% of the total. The rest is optimization and bytecode generation.
  • CSE is the single slowest step, taking 20% of the optimization time.
  • Results are pretty consistent between projects. Top 10 steps are (almost) the same and relative running times are very similar too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants