GH-91719: Make MSVC generate somewhat faster switch code #91718
Apparently a switch on an 8-bit quantity where all cases are present generates a more efficient jump (doing only one indexed memory load instead of two). See faster-cpython/ideas#321 (comment)
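To illustrate the idea, here is a minimal sketch and not the actual change in this PR. The opcode names, `run_bytecode`, and the `UNUSED_*` helper macros are all hypothetical. It assumes the switch operand is a `uint8_t` and gives every value from 0 to 255 its own `case` label, so the compiler can in principle skip the range check and the extra table lookup. Whether a given compiler and version actually emits the single-load jump has to be confirmed by inspecting the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only; not CPython's real opcodes. */
enum { OP_NOP = 0, OP_INC = 1, OP_DEC = 2, OP_DOUBLE = 3 };

/* Helper macros (an assumption of this sketch) that expand to runs of
   consecutive case labels, so unused values can be listed explicitly. */
#define UNUSED_4(n)  case (n): case (n) + 1: case (n) + 2: case (n) + 3:
#define UNUSED_16(n) UNUSED_4(n) UNUSED_4((n) + 4) \
                     UNUSED_4((n) + 8) UNUSED_4((n) + 12)
#define UNUSED_64(n) UNUSED_16(n) UNUSED_16((n) + 16) \
                     UNUSED_16((n) + 32) UNUSED_16((n) + 48)

/* Runs a tiny accumulator machine. Returns 0 and stores the accumulator
   in *result on success, or -1 if an unused opcode is encountered. */
int run_bytecode(const uint8_t *code, size_t len, int *result)
{
    int acc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t opcode = code[i];   /* 8-bit switch operand */
        switch (opcode) {
        case OP_NOP:    break;
        case OP_INC:    acc++;    break;
        case OP_DEC:    acc--;    break;
        case OP_DOUBLE: acc *= 2; break;
        /* Values 4..255: every remaining 8-bit value gets a label,
           so no default branch is needed. */
        UNUSED_4(4) UNUSED_4(8) UNUSED_4(12)
        UNUSED_16(16) UNUSED_16(32) UNUSED_16(48)
        UNUSED_64(64) UNUSED_64(128) UNUSED_64(192)
            return -1;
        }
    }
    *result = acc;
    return 0;
}
```

With this layout, a program such as `{OP_INC, OP_INC, OP_DOUBLE, OP_DEC}` leaves 3 in the accumulator, and any byte outside the four defined opcodes makes `run_bytecode` return -1.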
Would it make more sense to redefine We should probably make
Yeah, I had considered that, it makes sense. I'll confirm that it has the same effect.
I don't see why -- it's not used in a similar switch AFAICT, and it's not cramped for space in its struct. I assume for most other operations the cost of loading an int and loading a byte is effectively the same, since the CPU has to load a whole cache line (32 or 64 bytes) anyway.
I don't believe this needs a news blurb.
The dispatch sequence includes
Okay, I'll make that change.
@markshannon, please re-review. I confirmed that the switch still uses a single indirection (
Looks good to me
Oh wow, that's a simple and clever optimization! Great that it helps MSVC to optimize Python on Windows!
Follow-up to clean the public API: #91906