-
-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Frame/LV handling changes break backwards compatibility #508
Comments
The pass I made was specifically for making injectors more consistent given the incredibly inconsistent ways that frames can turn up in the bytecode. All this state-machine nonsense is only necessary because the frames (especially when generated by ASM) don't tell the whole story, and every iteration on Locals is attempting to get closer to what the user expects. Locals are by nature a brittle thing and every change is attempting to reduce that brittleness. Can I just say that while I appreciate the issue, your timing sucks as I've kept these changes in snapshots trying to get people to find regressions for ages because I knew the changes would result in some different results (that's the intention) but there shouldn't have been any broken ones. I literally promoted this to release on friday and your issue was opened the next day! Anyway. I note that the examples you gave are both fabric/quilt mods. The changes made were actually made specifically to address some situations in fabric where locals that should still have been in scope from the user's point of view were being trimmed too early, and where ASM's frame calculations aren't reliable enough. The expected result is that some things previously cut from the frame but intuitively in-scope should now be included, at the expense of some things that were "leaking" previously potentially being removed. Unfortunately testing with fabric is a massive pain at the moment, as the hacky way to get it to let me use my development version of mixin with a fabric mod causes it to not load a bunch of their mixins and therefore the game doesn't run. I've introduced a new method of obtaining the transformer in 0.8.3 so I'm hoping this will get simpler soon. In the mean time if you have any MCP versions of the same regressions that would help in troubleshooting this.
Either I'm reading this wrong or there's some part of this puzzle that I'm missing. The target method looks like this: protected void applyDamage(DamageSource source, float amount) {
if (!this.isInvulnerableTo(source)) {
amount = this.applyArmorToDamage(source, amount);
amount = this.applyEnchantmentsToDamage(source, amount);
float f = amount;
amount = Math.max(amount - this.getAbsorptionAmount(), 0.0F);
this.setAbsorptionAmount(this.getAbsorptionAmount() - (f - amount));
float g = f - amount;
if (g > 0.0F && g < 3.4028235E37F) {
this.increaseStat(Stats.DAMAGE_ABSORBED, Math.round(g * 10.0F));
}
if (amount != 0.0F) {
this.addExhaustion(source.getExhaustion());
float h = this.getHealth();
this.setHealth(this.getHealth() - amount);
this.getDamageTracker().onDamage(source, h, amount);
if (amount < 3.4028235E37F) {
this.increaseStat(Stats.DAMAGE_TAKEN, Math.round(amount * 10.0F));
}
}
}
} and I think it's intuitive to say that the floats The choice of implicit discriminator in a method which definitely contains other floats could have been prevented by the mixin author specifying
Probably, though the stretchiness of the frame is embedded in the heuristic
Again, I think we're looking at different bytecode because the one I see (in my available Fabric 1.16.5 workspace) is F_SAME1 but the point still stands. This is a better example of something which is probably a regression (I'd argue that the previous example is actually an improvement since it allows capturing of variables which are intuitively in scope). This kind of situation where something is lopped out of the frame by the stack map itself but still intuitively in-scope is what the update was meant to improve, and neatly demonstrates what
Mostly this is only an issue if the variable hasn't been initialised, as you say COMPUTE_FRAMES does handle the rest quite gracefully. The intention, as mentioned, is that "resurrected" LVs are the ones which look like they're in scope but the VM has cropped them from the frame because they're not used any more, rather than them being actually out of scope.
I don't really know what to tell you, as I said I left this in SNAPSHOT for absolutely ages on purpose so that people could test for regressions. Perhaps encourage fabric/quilt to have a snapshot/bleeding branch which pulls my snapshots more frequently so that people who want to test things out can report them upstream. On the one hand there's always pressure to move more quickly, and on the other I want this to be a stable library people can rely on and with something as brittle as locals are it's just impossible to find a balance that will suit everyone. Locals have improved a lot in recent versions and I think people would rather I make moves toward greater consistency overall than just throw up my hands and say "it's broken and as good as it's gonna get" and leave it at that. This gets more relevant in 0.10 because that will support local capture in all injectors.
I think I've answered this above but the goals for locals are:
The contract of locals is always going to be "best effort" because without going full "analytical decompiler" it's just impossible to come up with something which is both reasonably efficient and reasonably consistent. Note "reasonably" because I don't have any delusions that it's ever gonna be perfect. If fabric/quilt want to ignore my upstream changes or at least commute them to major version bumps (when people expect to have to update anyway) then they can do that. The external contract of I don't have all the answers and I've never claimed to, I just try to make this stuff work the best it can. The fabric peeps all know where I am on discord if they want to discuss these things, the sponge channel is platform-agnostic and just for discussing mixin in general.
I don't have a good answer either. About the only thing I can think is adding some kind of versioning to the locals detection to allow mixin authors to choose (or at least pin support for) a specific version of locals capture (or maybe just a general purpose selection of internal APIs) so that it's possible to have improvements as they come for those that choose it, and pinned stability for the others. Though the general ambivalence towards the versioning mechanisms that I do provide makes me think nobody would use that anyway! Local variables were a mistake, we should never have moved out of the caves, life was simpler when being eaten by a bear was your no1 worry. |
Thanks for the reply, I added some comments inline.
Sorry, we were (and somewhat still are) caught up in some major mod loading and build changes, your release happened to match the time when we got it sufficiently tied together again..
I have an initial adoption patch at FabricMC/fabric-loader#468 - always looking for input to make testing this easier.
You are right, sorry about the confusion. I've been misreading bytecode viewer's output which shows the same as F_FULL, labeling it F_NEW instead.. Textifier shows more directly that it never added the extra locals to the frames (F_SAME everywhere), this doesn't change the my behavior analysis though (applies to both examples)
There are at least the following problems:
The last Mixin change appears to tackle 1., but I don't see how the approach could possibly cope with the fallout of 2.-4. I believe there is no way around doing the same kind of basic block analysis and merging as asm's frame computation pass to solve 2.+3., repeatedly until quiescence for 4. The original frames would be ignored. This should yield results as perfect as possible, bugs and recompilation changes aside. The asm implementation doesn't prune as far as I'm aware, so it should be suitable in that regard and looks like a strict superset of the needed functionality...
The current idea for Fabric is to record the Mixin version the mod was built (=hopefully tested) against, which should allow selecting the appropriate behavior with some nasty patches, upstream support would make this much better of course. Fabric Loader isn't bound to a specific Minecraft version, there is no natural breakage point. Users are supposed to always use the latest build for whichever MC release they want to play, mods that don't rely on internals keep working.
I'm not aware of any mechanism that's meant to make a fixed host Mixin version more compliant with a specific mod Mixin use. I heard some controversy about the Java compat stuff on the sidelines, however to the extent of my knowledge I'm afraid to say that this mechanism is redundant with Fabric Loader's Java version dependencies and provides inferior UX in the failure case. Mods can't supply their own Mixin library, so constraints towards the mod loader shipping it should cover all needs. They also benefit of all the dependency and known failure case handling infrastructure. |
Guess I should have checked the username before replying.
C'est la vie. It would have been nice to detect the regression earlier but maybe that can be addressed going forward somehow.
What I really want is an easy way for fabric/quilt users to quickly drop-in upstream for testing and reproduction of bugs. From what I understand there's an API part which uses fabric-specific extensions of mixin but the main loader part doesn't, so it should be possible to test this kind of stuff. Don't want to go off-topic for this issue but one thing I've mentioned in discord as well is that instead of patching injection points for different behaviour, the behaviour desired can be achieved by subclassing and registering the subclasses since
This is where I disagree though. I think the first case is a legitimate improvement and breaks a badly-written (or at very least not defensively written) injection on the part of the mod. The second case I agree is a regression and as I said I'll look into that now I have a good case to study. The frames as emitted by ASM do not have sufficient detail to provide reliable local capture, thus the whole state machine that is
You're right, the changes are the answer to 1. but the answer to 2-4 is that the developer exists in the loop. Injectors don't exist in a vacuum and the number of code paths which can potentially lead to disaster are limited to ones where a variable which was previously uninitialised lives beyond a join point with a jump from before it was initialised, which the verifier catches early anyway. Having better (at least more intuitive) work-around for aggressively pruned locals seemed a good trade for the benefit of being able to more effectively capture more things, especially as it's not a pure widening but a heuristic one. The locals do still drop out of scope in the state machine, and the contract of things like CHOP is propagated forward just the locals are allowed to linger in the known locals table for as long as they can reasonably be assumed to be valid. Because of the main consumer of locals being The "leakiness" was never intended, but what I've tried to do is make it more deterministic where it makes sense. I should think that knowing my slavish attempts at backward compatibility you'd realise that even when I "fix" something I want to preserve backward compat as much as possible. Locals is just one of those places where there's a delicate balance to be struck and you're never going to keep everyone happy.
But this is just hand-wavey. The problem with choosing the minimum viable answer is that it's unintuitive because something which looks like it should be in scope, and in fact can be captured, gets culled by this approach. Locals is not stupid, it's taking data points from:
The main situations I've seen it explode are just when attempting to capture something that the VM knows to be uninitialised, and that's still inhibited with the current approach, at least in ways that would show up unexpectedly vs. immediately failing at dev time.
As I mentioned, the only thing I can think of is gating each algorithm behind some kind of (possibly internal rather than user-facing) version switch that downstream could then enable or disable. I don't see any other way to handle it sensibly.
Then like it or not you've created an implicit contract of "things will break sometimes" unless you put additional scaffolding in place to specifically address cases where behaviour changes. I've already agreed that one change I consider a bug and will fix, but the other case where what I consider a bug is fixed leads to a poorly-written injector breaking is part of the nature of using locals. If fabric wants a more basic locals computation gated behind a global switch or something then you still have the same problem, because then it's not even possible to fix bugs (and there have been plenty with Locals) without breaking that contract.
It wasn't controversy so much as people just misunderstanding the nature of the compatibility level provisions in Mixin and what they were for. I put compatibility level selection in mixin precisely to provide a moving window of known-supported language features, see my detailed explanation in the release notes of 0.8.3 in order to provide some guarantees that a given Runtime+ASM+Mixin combo can support a required feature set. In your case it seems like you just already have an analogue in terms of targetting loader versions, but that means those guarantees can be enforced elsewhere.
My point was that I provide checks and balances in Mixin anyway but people just want to select the highest level and not think about it any further, which defeats the purpose of those checks. So adding more version semaphores just seems like a fool's errand. Not necessarily relevant in fabric land but more a commentary on people's willingness to engage with tunables that I do already support. |
I think we're talking about different aspects.. I haven't said anything against exposing extra locals and very little about leaking too much, only that the last change isn't backwards compatible (example 1 breaks) and suggested scope widening myself. We definitely wouldn't use a more restrictive locals selection in general, but attempt to selectively reapply the restrictive computation (or any other version - buggy or not) for the mods that presumably depend on it. This is very much in line with your suggestion on how to handle it. The numbered points reflect on shortcomings of both the current and previous LV computation implementations. For example the linear scan addressed by 2. will miss locals that were introduced before a control flow back edge but aren't referenced anymore. From my experience these back edges might be too common to ignore and be particularly annoying with the recompilation weaknesses in 3. It looks like you are answering to the use of ClassReader's frame expansion, which is not what I meant. I'm talking about the algorithm and implementation ASM is using in ClassWriter to recreate frames from scratch and that isn't exposed. ClassWriter's computation should yield the maximum set of accessible locals contrary to mere frame expansion. If it doesn't there is some pruning step I've missed when skipping over its code. It also implements the repetition until quiescence scheme, so this isn't anything absurd or what you aren't doing already by using ClassWriter with COMPUTE_FRAMES. For a quick investigation comparing the results of |
I think we're potentially discussing at cross purposes yes but I understand the core point you're making.
It isn't backward compatible but only in that I consider the previous behaviour a bug, and could have been defensively coded against by using
Yeah, this requires separating out the differences in logic but despite it taking forever to actually engineer, the code changes weren't that great so it's likely this could be done with minimal pain, either in your fork or as a general contract of behaviour for upstream, I need to think about this.
I'm not sure this is as big a problem as you think it is though, I need some empirical examples to really make this idea stick. At present I think the balance is struck at about the right place in terms of widening.
No I'm fully aware of what you meant by COMPUTE_FRAMES. The fact that this is being run is exactly why it's even possible to extend the lifespan of something beyond where the original frames are. My points in turn were about computation of frames before injection and reliably extending lifespan in a way that passes validation precisely because we can trust that COMPUTE_FRAMES will then work out the bounds on the trailing edge of the pipeline.
Right sorry. I thought you were advocating for doing that pass up front when computing what transformation was possible, which is where the confusion arose. A lot of changes for 0.8 were to make the Locals state machine more reliable when compressed frames were in use, and consistent results regardless of whether EXPAND_FRAMES was used on the reader (because ModLauncher doesn't). This had the side-effect of just making locals computation much better anyway, which is why 0.8 had much more stable locals than the previous version.
I'm not sure it'd help much, I trust COMPUTE_FRAMES way more than I trust EXPAND_FRAMES because they do very different jobs and a lot of the pain points from previous iterations of this over the last 5~6 years have been to do with ASM doing a crap job of EXPAND_FRAMES. Hence the inclusion of the comparison against the compressed frames when possible.
Right but that's always operating on my post-transformed code, where working out what can realistically be considered in scope is the job of |
I'm considering this fixed as much as I think it's possible to fix as of the 0.8.4 release. While this still represents a small regression, this is - I believe - within the scope of acceptable changes for a subsytem which by nature can have unreliable results, and the move towards reliability is an acceptable trade-off in the greater scheme of things. Something, something broken eggs making omelette. Considering this resolved unless there are further objections. |
I agree, thanks! |
I'm running into some issues where (as far as I can tell) 6f5dce1#diff-4bd581bd3412aebda2607dc2b9421fe52b2788cc301ca7049a3bcfeb1b1d035cL203 breaks some LVT sensitive uses such as https://github.com/MoriyaShiine/bewitchment/blob/1.16/src/main/java/moriyashiine/bewitchment/mixin/transformation/PlayerEntityMixin.java#L261 .
This example target method features a float in the descriptor, two float variables before the mixin at-location and a F_NEW frame between the variables and the at-location:
FSTORE FSTORE F_NEW <at-loc>
. The F_NEW frame moves those variables out of (JVM) scope by resetting the locals to only the method parameters. Older Mixin versions use the variables defined in the frame, where the only float is the method parameter, latest Mixin appears to include the two floats that went out of scope.The example case breaks implicit LV selection (1 float -> 3 floats = ambiguous = InvalidImplicitDiscriminatorException), explicit specification with ordinal could easily start referencing the wrong LV slot in other cases. From my reading of the Mixin code it would treat F_CHOP differently from F_NEW even if both resulted in the same frame state.
Now since this is almost too easy, another case at https://github.com/Juuxel/Adorn/blob/1.16.5/src/main/java/juuxel/adorn/mixin/BubbleColumnBlockMixin.java#L25 does apparently quite the opposite. The target method features two returns, one with the requests LV in scope, the other without (F_NEW clears all locals before 2nd IRETURN). Old Mixin doesn't care, new Mixin can't find the out of scope local.
Since Mixin uses COMPUTE_FRAMES, it can and does resurrect out of scope LVs (at least some? uninit likely breaks) in some cases.
For a mod loader the problem is quite difficult to tackle, the mods can't be updated reasonably, even less so with retro packs that are otherwise perfectly compatible with the latest loader version. The status quo looks to remain inconsistent as the above examples exhibit opposite behaviors, indicating another incompatible change is in order.
Are LV references supposed to work off the frames, the entirety of initialized LV slots (considering all branches, IMO the best option) or all LV slots including uninitialized (probably invalid)? How should one deal with these incompatibilities (I don't have a good answer there)?
The text was updated successfully, but these errors were encountered: