Redesign the frame layout to avoid the redundant computation of the stackOffsets #99278
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details: Redesign the frame layout to avoid the redundant computation of the stackOffsets. First I push the LoongArch64's optimization; if needed, I can also implement the ARM64's optimization.
(force-pushed from 7d09326 to 46e29e5)
What is the optimization/benefit here? It does not seem like a good idea to make LA64 different from the other backends (e.g. I see it introduces more ifdefs in shared code).
I want to optimize the computation of the LclVarDsc's StackOffset.
This is the first PR; if it is OK, I can implement the same change for the other CPUs, at least the RISC CPUs.
I'm not that familiar with the stack layout stuff, but are you saying that ...
At least for RISC-ISA CPUs like LA64/ARM64/RISCV64. For LA64, I have tested it and it works.
I don't understand why LA needs to be different here. It doesn't seem like there is any reason to "optimize" the setting of LclVarDsc StackOffset.
I will give a detailed explanation later.
Hi, @BruceForstall
In fact, the two steps can be merged together by an optimization like this PR.
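To make the "two steps" concrete, here is a minimal standalone C++ sketch of the idea being discussed, not the actual RyuJIT code: assign provisional (virtual) offsets in one pass, then shift them all by a delta in a fixup pass, versus producing the final offsets in a single pass when the adjustment is already known. The struct, helper names, and sizes are invented for illustration only.

```cpp
// Illustrative model only: today the JIT assigns "virtual" frame offsets in one
// pass and then shifts every local by a delta in a second pass; the two passes
// can be merged when the adjustment is known up front.
#include <cstdio>
#include <vector>

struct LocalVar {
    int size;        // slot size in bytes (invented for this sketch)
    int stackOffset; // offset from the frame pointer
};

// Pass 1: assign provisional (virtual) offsets, growing downward from 0.
static void assignVirtualOffsets(std::vector<LocalVar>& locals) {
    int offset = 0;
    for (LocalVar& lcl : locals) {
        offset -= lcl.size;
        lcl.stackOffset = offset;
    }
}

// Pass 2: once the callee-saved area size is known, shift every offset.
static void fixVirtualOffsets(std::vector<LocalVar>& locals, int delta) {
    for (LocalVar& lcl : locals) {
        lcl.stackOffset += delta;
    }
}

// Merged form: if the callee-saved area size is known up front, the final
// FP-relative offsets can be produced in a single pass.
static void assignFinalOffsets(std::vector<LocalVar>& locals, int calleeSavedSize) {
    int offset = -calleeSavedSize;
    for (LocalVar& lcl : locals) {
        offset -= lcl.size;
        lcl.stackOffset = offset;
    }
}

int main() {
    std::vector<LocalVar> twoPass = {{8, 0}, {16, 0}, {8, 0}};
    std::vector<LocalVar> onePass = twoPass;

    assignVirtualOffsets(twoPass);
    fixVirtualOffsets(twoPass, /*delta*/ -32); // e.g. 32 bytes of callee-saved registers below FP
    assignFinalOffsets(onePass, /*calleeSavedSize*/ 32);

    for (size_t i = 0; i < twoPass.size(); i++) {
        std::printf("local %zu: two-pass %d, one-pass %d\n", i, twoPass[i].stackOffset, onePass[i].stackOffset);
    }
    return 0;
}
```

Both paths print identical offsets, which is the sense in which the two steps are "merged" here.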
(force-pushed from 46e29e5 to 1a4347a)
Hi, @BruceForstall
@BruceForstall @jakobbotsch
Can you give us a little more time to think about this before spending more time on it?
OK.
(force-pushed from 3841fdb to 68831a7, then from dfb6b99 to 9deea5f)
Just modified the ... Besides, I should confirm whether the ...
(force-pushed from 9deea5f to 6ca44dd, then from 6d0fc84 to 1d3d085)
Hi, @jakobbotsch
The failures look related to this PR. For example:

Assert failure(PID 23 [0x00000017], Thread: 23 [0x0017]): Assertion failed 'frameSize < -2040' in 'Runtime_66089:TestEntryPoint()' during 'Generate code' (IL size 441; hash 0x76f50dc2; FullOpts)
File: /__w/1/s/src/coreclr/jit/codegenarm64.cpp:1227
Image: /root/helix/work/correlation/corerun

Regardless, as we mentioned above, we are still not sure that we want to accept this PR. From what I see this PR makes it harder for us to implement future optimizations like #35274. Making some backends work differently to other backends also makes maintenance harder for us in the future. It is important for us to keep the backends as uniform as possible so that we can make changes that work the same way for all targets, without having to keep several different models in mind. What exactly is the justification for this change?
I can fix this.
(1) This PR is also an optimization.
This PR includes LoongArch64, RISCV64 and ARM64 and unifies some places; I think this PR shares more code than before.
As described in this PR's description above, especially for ARM64 the CodeGen::genPushCalleeSavedRegisters will be significantly simplified. Of course, I can update the docs about the frame layout, and if needed I can add some possible next steps for future optimization.
So this PR is primarily meant as a throughput optimization? Do you have numbers on how much this improves throughput?
This PR optimizes the current two steps into one step. I didn't rewrite this part, and the optimization is very noticeable.
For the case that does not use a frame pointer: this PR is the first step and just optimizes the two steps into one, because today ARM64/LoongArch64/RISCV64 always use FP. Optimizing the no-frame-pointer case for ARM64/LoongArch64/RISCV64 is the next step; it does not conflict with this PR and can also be implemented on top of this PR in a separate PR.
We have ASM diffs and throughput measurements automatically run for arm64 in CI. You can see them by looking at "runtime-coreclr superpmi-diffs", clicking "View more details on Azure Pipelines" and then opening the "Extensions" tab.
As you say arm64 always uses FP today. #35274 is about optimizing it so that leaf methods do not use frame pointers. There are currently issues around EE suspension, but I know that @filipnavara has been looking into this. I also would expect LA64 and RISCV64 to be able to use the same optimization if we manage to solve the EE suspension issue. x64/x86 already implement this optimization. I like the LOC diff of this change -- +678/-1913 -- but it is hard to say anything about that given that it regresses arm64 by a lot.
If this PR were just for ARM64, I agree that it would increase the prolog/epilog size, because it doesn't use an approach like the existing one. If you can accept this PR only for RISCV64 and LoongArch64 but not ARM64, I can revert the ARM64 changes.
I will have to discuss with @BruceForstall what we think. But yes, it seems likely we won't be able to change arm64 as part of this PR, but again that makes LA64/RISCV64 different from ARM64 and introduces long term maintenance burden for us.
OK.
I think this does not conflict with future work; we can optimize step by step and unify these in the future, and I believe we can.
(force-pushed from 07586f8 to b812c5f)
Hi, @jakobbotsch
This PR is the first one, and I will push a series of PRs in the future to amend the related code for ARMARCH/XARCH/LA64/RISCV64; some ideas or PRs are not included in this PR.
Avoid the redundant computation of the stackOffsets. After the `lvaAssignVirtualFrameOffsetsToLocals`, there is no need to recompute the LclVarDsc's StackOffset within the `lvaFixVirtualFrameOffsets`.
(force-pushed from b812c5f to d5ec274)
Hi, @jakobbotsch
@shushanhf I need to establish with Bruce whether we want to go in this direction (Bruce and Kunal are generally the owners of the JIT backends, not me, even though I often review the LA64/RISCV64 PRs). Mainly we have to discuss the points highlighted above, where we're concerned about uniformity in the backends.
OK, thanks. I want to delay this PR's optimization to the future. I want to merge this PR in a way you can accept, because I have some other PRs to push after this one.
How about I keep this PR and create a new PR in which I just modify the frame layout, without modifying the redundant computation of the stackOffsets?
The title of this PR implies it is purely an optimization. But are you saying that actually this PR is fundamentally changing stack layout, and that's the change you want to make?

There are constraints on the exact expected frame layout from several sources I know of: unwinding, ETW on Windows, EnC support and pinvokes. I'm sure there are constraints from other sources I'm not familiar with. I would not be very comfortable taking fundamental changes to frame layout without doing a lot of due diligence.

For example: https://learn.microsoft.com/en-us/cpp/build/arm64-exception-handling?view=msvc-170#arm64-stack-frame-layout
OK.
We don't have time to work on this for .NET 9, so we will review it in .NET 10. |
I will close this PR. |
Redesign the frame layout to avoid the redundant computation of the stackOffsets.

After the `lvaAssignVirtualFrameOffsetsToLocals`, there is no need to recompute the LclVarDsc's StackOffset within the `lvaFixVirtualFrameOffsets`. First I push the LoongArch64's optimization, then I will implement the ARM64's optimization (of course including the RISCV64).

After this PR's optimization, the frame layout is divided into two parts by the `FP` slot: the `stackOffsets` are positive above the `FP` slot, while the `stackOffsets` are negative below the `FP` slot. With this PR, the `stackOffsets` are finalized within `lvaAssignVirtualFrameOffsetsToLocals`, so there is no need to recompute the LclVarDsc's StackOffset within `lvaFixVirtualFrameOffsets`.

Besides, the OSR's stackOffset within `Compiler::generatePatchpointInfo()` can also be omitted, as the `offsetAdjust` is zero.

Especially for ARM64, `CodeGen::genPushCalleeSavedRegisters` will be significantly simplified. At the same time, there is also no need to use `SetSaveFpLrWithAllCalleeSavedRegisters`.

`TARGET_XARCH` does not always use the frame pointer, so I will amend `Compiler::lvaAssignVirtualFrameOffsetsToLocals` depending on whether the frame layout is based on the frame pointer or SP, like this: the stackOffset will be its final value within `Compiler::lvaAssignVirtualFrameOffsetsToLocals`, and then the final computation of the stackOffset within `Compiler::lvaFixVirtualFrameOffsets` will be deleted.
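As an illustration of the layout described above, here is a hedged C++ sketch, not taken from the PR itself: the frame is anchored at the `FP` slot, with incoming stack arguments above it at positive offsets and callee-saved registers and locals below it at negative offsets. The slot sizes, the "+2 skips saved FP/return address" assumption, and the helper names are invented for illustration only.

```cpp
// Illustrative model of the FP-anchored layout described above (sizes and
// names are invented; this is not the JIT's actual data layout).
//
//   higher addresses
//   +--------------------+
//   | incoming args      |  <- positive offsets from FP
//   +--------------------+
//   | saved FP / RA      |  <- FP points here (offset 0)
//   +--------------------+
//   | callee-saved regs  |  <- negative offsets from FP
//   | locals / spills    |
//   +--------------------+
//   lower addresses
#include <cstdio>

// FP-relative offset of the i-th incoming stack argument (positive side of FP).
// slotSize would be 8 on a 64-bit target; "+2" assumes the saved FP and the
// return-address slot sit between FP and the incoming arguments.
static int incomingArgOffset(int index, int slotSize = 8) {
    return (index + 2) * slotSize;
}

// FP-relative offset of a local placed below the callee-saved area
// (negative side of FP).
static int localOffset(int calleeSavedSize, int localBytesSoFar) {
    return -(calleeSavedSize + localBytesSoFar);
}

int main() {
    std::printf("incoming arg #0 at FP%+d\n", incomingArgOffset(0));
    std::printf("first 8-byte local at FP%+d\n",
                localOffset(/*calleeSavedSize*/ 32, /*localBytesSoFar*/ 8));
    return 0;
}
```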
Next plans:
- As ARM32 and XARCH don't always use the frame pointer, these should be done by a separate PR.
- This PR is the first one, and I will push a series of PRs in the future to amend the related code for ARMARCH/XARCH/LA64/RISCV64.
- Don't worry: I will not increase the ARM64 prolog/epilog asm size and will not disturb the runtime team's work on optimizing leaf functions' FP usage (#35274); I can wait to modify ARM64 until something like #35274 is finished.
- I will unify most of this for ARMARCH/XARCH/LA64/RISCV64 to ease the long-term burden of maintaining different implementations.