Altivec bitshuffle #98
This is complete and ready for merging (bar some suggestions or comments). @kif can you review? |
I have removed a duplicated |
I am running the benchmark ... |
Yes, I did notice the drop in performance of the VSX version of bitunshuffle for typesize > 1. However, as Blosc2 typically shuffles/unshuffles blocks of at most 1 MB, I don't think the drop in performance in this region is too bad. But if, for some reason, one absolutely wants better performance for blocks > 1 MB, another possibility is to find a direct replacement for __mm_store_pi and come up with an algorithm similar to the one in master, but using VSX. Finally, if getting rid of SSE2 is not deemed absolutely necessary, one may want to go back to the original bshuf_trans_byte_bitrow_altivec. |
@kif Something went wrong in pasting the plot for bitshuffle1_altivec versus bshuf_trans_byte_bitrow_altivec. |
I noticed, but my notebook kernel crashed and I had to relaunch it to get the last image. |
That's pretty cool. I'd say that we want to use the OTOH, I am not sure why |
If I remember correctly, the code does not work in all conditions (there is
an additional constraint on the block size, which has to be a multiple of
16 (vector size) * 8 (bits/item)), which makes it not that easy to use.
|
I see. Well, a possibility is to use the accelerated path just when the condition |
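For illustration, here is a minimal sketch of that kind of dispatch, assuming the condition is the block-size constraint mentioned above (a multiple of 16 * 8 = 128). The names `choose_path`, `PATH_ACCELERATED` and `PATH_GENERIC` are hypothetical and are not taken from the PR, and the thread does not say whether the size is counted in bytes or in elements, so `size` is left unit-agnostic here.

```c
#include <assert.h>
#include <stddef.h>

/* Values quoted in the thread: 16 (vector size) * 8 (bits/item). */
enum { VECTOR_SIZE = 16, BITS_PER_ITEM = 8 };
enum { ACCEL_MULTIPLE = VECTOR_SIZE * BITS_PER_ITEM };
static_assert(ACCEL_MULTIPLE == 128, "16 * 8 should be 128");

/* Hypothetical labels for the two code paths. */
typedef enum { PATH_GENERIC = 0, PATH_ACCELERATED = 1 } shuffle_path;

/* Pick the accelerated path only when the block size satisfies the
 * constraint; everything else (including an empty block) falls back
 * to the generic path. The unit of `size` is an assumption. */
static shuffle_path choose_path(size_t size) {
    if (size != 0 && size % ACCEL_MULTIPLE == 0) {
        return PATH_ACCELERATED;
    }
    return PATH_GENERIC;
}
```

With this check, a caller would invoke the vectorized routine only when `choose_path(size)` returns `PATH_ACCELERATED` and keep the existing implementation for every other size.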
Merging. |
This PR is for:
It is still work in progress.