Allow vectorization of ternaries, indices #870

Open
nkoskelo opened this issue Oct 14, 2024 · 1 comment · May be fixed by #921

Comments

@nkoskelo
Contributor

nkoskelo commented Oct 14, 2024

import numpy as np
import loopy as lp
from loopy.kernel.array import VectorArrayDimTag

knl = lp.make_kernel(
    "{ [i] : 0 <= i < 8 }",
    "out[i] = a if i == 0 else (b if i == 1 else c)")
knl = lp.tag_inames(knl, {"i": "vec"})
# Mark the only axis of out as a vector axis.
knl = lp.tag_array_axes(knl, "out", [VectorArrayDimTag()])
knl = lp.add_and_infer_dtypes(knl, {"a": np.float32, "b": np.float32, "c": np.float32})
lp.generate_code_v2(knl).device_code()

fails because the conditional expression makes the store operation depend on i.

Removing the tag_array_axes call and trying again results in a warning, but device code is generated.

knl = lp.make_kernel(
    "{ [i] : 0 <= i < 8 }",
    "out[i] = a if i == 0 else (b if i == 1 else c)")
knl = lp.tag_inames(knl, {"i": "vec"})
knl = lp.add_and_infer_dtypes(knl, {"a": np.float32, "b": np.float32, "c": np.float32})
lp.generate_code_v2(knl).device_code()

will generate the following device code:

__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1))) loopy_kernel(float const a, float const b, float const c, __global float *__restrict__ out)
{
out[0] = ((0 == 0) ? a : ((0 == 1) ? b : c));
out[1] = ((1 == 0) ? a : ((1 == 1) ? b : c));
out[2] = ((2 == 0) ? a : ((2 == 1) ? b : c));
out[3] = ((3 == 0) ? a : ((3 == 1) ? b : c));
out[4] = ((4 == 0) ? a : ((4 == 1) ? b : c));
out[5] = ((5 == 0) ? a : ((5 == 1) ? b : c));
out[6] = ((6 == 0) ? a : ((6 == 1) ? b : c));
out[7] = ((7 == 0) ? a : ((7 == 1) ? b : c));
}
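
Every comparison in the unrolled kernel above has constant operands, so each ternary could in principle fold to a single operand. The sketch below is plain Python, not loopy API; the helper name folded_operand is hypothetical and only illustrates what a fully simplified kernel body would contain.

```python
# Plain-Python illustration; folded_operand is a hypothetical helper,
# not a loopy function.
def folded_operand(i):
    """Name of the operand that lane i stores once the constant
    comparisons (i == 0, i == 1) have been evaluated."""
    return "a" if i == 0 else ("b" if i == 1 else "c")

# One simplified store statement per unrolled iteration.
simplified = [f"out[{i}] = {folded_operand(i)};" for i in range(8)]
print("\n".join(simplified))
```

Only lanes 0 and 1 differ from the default operand c, which is why the unsimplified output is valid but more verbose than necessary.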
inducer changed the title from "Vectorize only partially simplifies expressions" to "Fix vectorization of ternaries" on Jan 16, 2025
inducer changed the title from "Fix vectorization of ternaries" to "Allow vectorization of ternaries" on Jan 16, 2025
@inducer
Owner

inducer commented Jan 16, 2025

As discussed a few times, while the lack of simplification is a potential problem, I think it's probably more useful to allow this to vectorize. There are two missing pieces AFAICT:

  • The ternary operator cleanly vectorizes according to the OpenCL spec. Loopy just needs to be told that this is so.
  • The loop iname for a vec loop currently isn't available in vector form. But this is straightforward: it's just a constant that needs to be generated:
    const int8 lp_idx = (int8) (0, 1, 2, 3, 4, 5, 6, 7);
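
As a rough model of the two pieces listed above, the following pure-Python sketch treats lp_idx as a list of lane indices and applies the ternary lane by lane. It is only an emulation of the intended semantics under the assumption of a vector width of 8; lanewise_select, broadcast, and vectorized_ternary are hypothetical names, not loopy or OpenCL functions.

```python
VEC_WIDTH = 8  # assumed vector width, matching the int8 constant above

def broadcast(x):
    """Replicate a scalar into every lane."""
    return [x] * VEC_WIDTH

def lanewise_select(mask, if_true, if_false):
    """Per lane, pick if_true where mask is set, otherwise if_false.
    This models a component-wise ternary on vector operands."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]

def vectorized_ternary(a, b, c):
    # Models the generated constant (int8)(0, 1, 2, 3, 4, 5, 6, 7).
    lp_idx = list(range(VEC_WIDTH))
    is_zero = [i == 0 for i in lp_idx]
    is_one = [i == 1 for i in lp_idx]
    inner = lanewise_select(is_one, broadcast(b), broadcast(c))
    return lanewise_select(is_zero, broadcast(a), inner)

def unrolled_reference(a, b, c):
    """Scalar form, one iteration per value of i, as in the kernel body."""
    return [a if i == 0 else (b if i == 1 else c) for i in range(VEC_WIDTH)]

out = vectorized_ternary(1.0, 2.0, 3.0)
```

The lane-wise result agrees with the scalar reference for every lane, which is the property a vectorized code path would need to preserve.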

inducer changed the title from "Allow vectorization of ternaries" to "Allow vectorization of ternaries, indices" on Jan 16, 2025
nkoskelo linked a pull request on Feb 19, 2025 that will close this issue