Parallel Suffix cleanup #102

jamierpond · 2024-09-12T08:26:43Z

Not totally sure how this can be turned into AI right now... I think this function might be too simple for associative iteration?

template<typename S>
constexpr auto parallelSuffix(S input) {
    auto
        log2Count = log2_of_power_of_two(S::NBits),
        power = 1;
    auto
        result = input,
        shiftMask = S{~S::MostSignificantBit};

    for (;;) {
        result = result ^ result.shiftIntraLaneLeft(power, shiftMask);
        if (!--log2Count) { break; }
        shiftMask = shiftMask & S{shiftMask.value() >> power};
        power <<= 1;
    }

    return S{result};
}
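For readers without the SWAR types at hand, here is a minimal standalone sketch of the same doubling loop on a plain `uint64_t` with 8-bit lanes. It assumes that `shiftIntraLaneLeft(n, mask)` clears the bits outside `mask` and then shifts the whole word left by `n`; the names `parallelSuffix8` and `laneMSBs8` are hypothetical and not part of the library. Under that assumption, each result bit is the XOR of the input bits at or below it within its own lane.

```cpp
#include <cstdint>

// Hypothetical constant: the most significant bit of each 8-bit lane.
constexpr std::uint64_t laneMSBs8 = 0x8080808080808080ull;

// Sketch of the loop above for 8-bit lanes in a 64-bit word.
// Assumption: masking before the shift is what keeps bits from
// crossing into the neighbouring lane.
constexpr std::uint64_t parallelSuffix8(std::uint64_t input) {
    std::uint64_t result = input;
    std::uint64_t shiftMask = ~laneMSBs8;  // safe to shift by 1
    int power = 1;
    for (int steps = 3;;) {                // log2(8) == 3 doubling steps
        result ^= (result & shiftMask) << power;
        if (!--steps) { break; }
        // Clear twice as many top bits per lane for the doubled shift.
        shiftMask &= shiftMask >> power;
        power <<= 1;
    }
    return result;
}
```

For example, a lane holding `0x01` becomes `0xFF`, and a lane holding `0x03` becomes `0x01`, because the two set bits cancel from bit 1 upward.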

jamierpond · 2024-09-12T09:36:16Z

inc/zoo/swar/associative_iteration.h

-        ZTE(power << 1);
+    for (;;) {
+        result = result ^ result.shiftIntraLaneLeft(power, shiftMask);
+        if (!--log2Count) { break; }


@thecppzoo note one condition in the inner loop now

This condition is broken: it does not work when the bit count is not a power of 2, for example 7.

thecppzoo · 2024-09-13T02:10:07Z

inc/zoo/swar/associative_iteration.h

+
+template<typename S>
+constexpr auto parallel_suffix(S input) {
+    constexpr auto log2Count = S::Lanes;


This can't be right: the parallel suffix does not depend on the number of lanes, but on the number of bits in each lane.

Looks like you might be reviewing an outdated version?

thecppzoo · 2024-09-13T02:18:35Z

This implementation might be simple enough, sure, but it only accepts lane sizes whose bit count is a power of two.
Let's check whether the implementation I made is less efficient than yours.
Otherwise, the much harder challenge of supporting an arbitrary bit count requires decomposing the number of bits into its binary representation to form the groups, and that is where associative iteration (AI) would more clearly come to bear.
In simpler terms, this implementation is like multiplication when the factor is a power of two: much easier.
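The multiplication analogy can be sketched as follows. This is only an illustration of the idea, not the library's associative iteration API; the function names `timesPowerOfTwo` and `timesAny` are hypothetical. With a power-of-two factor, doubling alone is enough. With an arbitrary factor, the loop has to walk the factor's binary representation and accumulate on every set bit.

```cpp
#include <cstdint>

// Power-of-two factor: repeated doubling, no extra condition needed.
constexpr std::uint64_t timesPowerOfTwo(std::uint64_t x, int log2Factor) {
    while (log2Factor--) { x += x; }
    return x;
}

// Arbitrary factor: decompose the factor into binary and add the
// current doubled value whenever the corresponding bit is set.
constexpr std::uint64_t timesAny(std::uint64_t x, std::uint64_t factor) {
    std::uint64_t result = 0;
    while (factor) {
        if (factor & 1) { result += x; }
        x += x;        // the doubling step shared with the simple case
        factor >>= 1;  // move on to the next binary digit
    }
    return result;
}
```

The general version carries one more condition and one more accumulator per step, which is the extra work the power-of-two case avoids.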

jamierpond · 2024-09-13T03:15:33Z

@thecppzoo https://godbolt.org/z/5jdfffb1M

jamierpond · 2024-09-13T03:23:18Z

hmmm... yeah i see what you mean...

jamierpond · 2024-09-13T03:59:22Z

ok, now working with a non-power-of-two number of bits, just needed to idiot check myself about the log2 impl for non-powers of two: https://godbolt.org/z/aPa6q8r8c
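The linked code is not reproduced here. As a rough sketch of why rounding the log2 up can be enough for this particular loop, the following standalone version runs `ceil(log2(NBits))` doubling steps on a plain `uint64_t`. All names (`ceilLog2`, `laneMSBs`, `parallelSuffixAnyWidth`) are hypothetical, and the sketch assumes lanes are packed from bit 0 with any leftover high bits unused.

```cpp
#include <cstdint>

// Hypothetical helper: smallest k such that (1 << k) >= n, for n >= 1.
constexpr int ceilLog2(unsigned n) {
    int k = 0;
    while ((1u << k) < n) { ++k; }
    return k;
}

// Hypothetical helper: the top bit of every NBits-wide lane that fits in 64 bits.
template<int NBits>
constexpr std::uint64_t laneMSBs() {
    std::uint64_t m = 0;
    for (int lane = 0; (lane + 1) * NBits <= 64; ++lane) {
        m |= std::uint64_t{1} << (lane * NBits + NBits - 1);
    }
    return m;
}

// Same doubling loop, with the step count rounded up so that lane
// widths such as 7 are covered. Overshooting is harmless here because
// the mask drops every bit that would cross a lane boundary.
template<int NBits>
constexpr std::uint64_t parallelSuffixAnyWidth(std::uint64_t input) {
    static_assert(NBits >= 2 && NBits <= 64, "sketch assumes 2..64 bits per lane");
    std::uint64_t result = input;
    std::uint64_t shiftMask = ~laneMSBs<NBits>();
    int power = 1;
    for (int steps = ceilLog2(NBits);;) {
        result ^= (result & shiftMask) << power;
        if (!--steps) { break; }
        shiftMask &= shiftMask >> power;
        power <<= 1;
    }
    return result;
}
```

With 7-bit lanes, a lane holding `0x01` becomes `0x7F` after three steps, and the neighbouring lane is left untouched.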

thecppzoo · 2024-09-13T04:59:02Z

> @thecppzoo https://godbolt.org/z/5jdfffb1M

I just did this:
https://godbolt.org/z/cE1eoKM3d

I am very surprised and disappointed that the generated code for powers of two is basically identical. We now have a good example of code that the optimizer does not "understand", or perhaps we need to look deeper into whether this implementation is inherently inefficient.

Another lesson is to always, always, always work on the straightforward solution to the straightforward need first, so that there is something to compare against the sophisticated solutions to abstract and general needs.

jamierpond added 3 commits September 12, 2024 00:14

wip

f5f149d

think this is the best

e52122b

clean

79adb65

jamierpond requested a review from thecppzoo September 12, 2024 08:29

jamierpond added 12 commits September 12, 2024 01:42

fine.

e17f304

oops

a64f2b5

tidy up a little?

0b50659

generalise attempt

d3f989d

comment

cc9b7b0

nah, that's undefined

7b6eb0f

ok simplify again

13a51ae

lean

6bc2bea

still undefined

fe5b320

oops

899bb08

mv asserts

1dbdd25

rm asserts

505f9dc
