Rework cached compilation; remove invalidation generator #445
Found the issue: The cache that CUDA provides is itself keyed on the context, which ensures that after a [...]. We could probably do something better, but I'm not a fan of adding yet another interface like [...].
Surprisingly, this gets rid of some of the remaining allocations in [...]
Before: [...]
After: [...]
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #445      +/-   ##
==========================================
- Coverage   85.71%   78.63%   -7.08%
==========================================
  Files          24       23       -1
  Lines        2962     2926      -36
==========================================
- Hits         2539     2301     -238
- Misses        423      625     +202
Yes, Base does not store that Tuple for the reason you found. It uses an iterated lookup instead, with the first level keyed by mi, and then a linear scan (usually just one entry) of all of the possibilities for that [...]
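To make the iterated lookup described above concrete, here is a minimal toy model in Python. It is not Base's or GPUCompiler's actual data structure: the names `Entry`, `IteratedCache`, `insert`, and `lookup` are hypothetical, and matching on an inclusive `[min_world, max_world]` range during the scan is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, List, Optional


@dataclass
class Entry:
    """One cached result, valid for an inclusive range of world ages (assumed)."""
    min_world: int
    max_world: int
    value: Any


class IteratedCache:
    """Toy two-level cache: a dict keyed by method instance, then a linear scan."""

    def __init__(self) -> None:
        # First level: method instance -> list of candidate entries.
        self._by_mi: Dict[Hashable, List[Entry]] = {}

    def insert(self, mi: Hashable, entry: Entry) -> None:
        self._by_mi.setdefault(mi, []).append(entry)

    def lookup(self, mi: Hashable, world: int) -> Optional[Any]:
        # Second level: scan the (usually very short) list for an entry
        # whose world range contains the requested world.
        for entry in self._by_mi.get(mi, ()):
            if entry.min_world <= world <= entry.max_world:
                return entry.value
        return None
```

The point of the shape is that no composite tuple key is ever built: the hash lookup uses only the method instance, and the remaining criteria are checked while scanning a list that typically holds a single entry.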
So that the client can wipe the cache.
This PR changes how we do cached compilation. Before, we looked into a small cache indexed by the codegen world age we got from a hacky generator. As @vtjnash said, that world age isn't valid and shouldn't leak into runtime code, so I redesigned the cached compilation here to more closely resemble what Base does. We now store compiled and linked objects 'next' to the CodeInstances (just like how Base stores pointers inside the CI). At run time, we still use a small cache, but it's indexed by the current TLS world age. When the world changes, that may have happened due to an unrelated method redefinition, so we query the CI cache (intersecting world ages) and look up the GPU object that's stored next to it. Only if no valid entry is found there do we have to compile again.
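The lookup order described above can be sketched as a small Python model. This is only an illustration of the idea, not the code in this PR: `CodeInstance`, `CachedCompiler`, `get`, `invalidate`, and `compile_fn` are hypothetical names, world ages are plain integers, and invalidation is modeled by capping an entry's `max_world`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, List, Optional, Tuple

MAX_WORLD = 2**63 - 1  # stand-in for "valid until invalidated" (assumption)


@dataclass
class CodeInstance:
    """Toy stand-in for a cached code instance with a validity range."""
    min_world: int
    max_world: int = MAX_WORLD
    gpu_object: Optional[Any] = None  # compiled object stored 'next' to the CI


class CachedCompiler:
    def __init__(self, compile_fn: Callable[[Hashable, int], Any]) -> None:
        self.compile_fn = compile_fn
        self.ci_cache: Dict[Hashable, List[CodeInstance]] = {}
        # Small runtime cache, keyed by (method instance, current world age).
        self.fast: Dict[Tuple[Hashable, int], Any] = {}

    def invalidate(self, mi: Hashable, world: int) -> None:
        # A redefinition at `world` ends the validity of still-live entries.
        for ci in self.ci_cache.get(mi, []):
            if ci.max_world >= world:
                ci.max_world = world - 1

    def get(self, mi: Hashable, world: int) -> Any:
        key = (mi, world)
        if key in self.fast:  # fast path: same world as a previous call
            return self.fast[key]
        # Slow path: the world changed, possibly for an unrelated reason,
        # so look for a code instance whose range still covers this world.
        cis = self.ci_cache.setdefault(mi, [])
        ci = next((c for c in cis if c.min_world <= world <= c.max_world), None)
        if ci is None:
            ci = CodeInstance(min_world=world)
            cis.append(ci)
        if ci.gpu_object is None:  # compile only when nothing valid is stored
            ci.gpu_object = self.compile_fn(mi, world)
        self.fast[key] = ci.gpu_object
        return ci.gpu_object
```

In this model, advancing the world without touching the kernel's methods misses the fast cache but reuses the stored object, whereas calling `invalidate` for that method instance forces a fresh compile at the next `get`.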
@vchuravy This broke LazyCodegen. I'm not sure why it even relied on the codegen world age, as it doesn't do any invalidation-related tests.
@wsmoses I know Enzyme relies on this, sorry. Feel free to copy the old code there.
Fixes #435, #440, #146