[FEA] Should `Rolling.apply` use pure numba rather than jitify? #18033

brandon-b-miller · 2025-02-18T21:46:35Z

Is your feature request related to a problem? Please describe.
cuDF supports rolling.apply for executing a custom Python function over the specified rolling windows. It works by taking a PTX string compiled in Python by Numba and handing it off to a rolling aggregation with kind == PTX, which is then used through rolling_window. Ultimately, this invokes jitify and a series of parsing and compilation steps that produce the final kernel that computes the result.

Historically, a similar process served to make APIs like Series.applymap work, but over time we migrated to an approach that assembles the final kernel in Numba rather than C++, for several reasons:

- The parsing approach through jitify trips up in several useful cases, such as when Numba delivers PTX containing multiple function definitions
- The full Numba approach allows for extension types to support null values
- The full Numba approach benefits from features such as LTO through pynvjitlink

Describe the solution you'd like
A pure numba implementation of Rolling.apply, possibly with extension type support. For non nullable data, the implementation could look something like this:

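import numpy as np
from numba import cuda


def count_if_gt_3(window):
    count = 0
    for i in window:
        if i > 3:
            count += 1
    return count


devfunc = cuda.jit(device=True)(count_if_gt_3)
out = np.zeros(len(s))  # s is the input Series, defined elsewhere


@cuda.jit
def kernel(data, win_size, min_periods, out):
    tid = cuda.grid(1)
    if tid >= data.size:  # guard threads past the end of the data
        return

    # trailing window ending at this row; min_periods is not handled yet
    start = max(0, tid - win_size + 1)
    end = tid + 1

    thread_win = data[start:end]

    res = devfunc(thread_win)
    out[tid] = res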

The above seems to give the correct result in a few test cases locally. Of course, the real implementation would need to account for min_periods, other dtypes, and more.
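For checking kernel output against a CPU baseline, a minimal NumPy sketch of the same trailing-window logic might look like the following, with one possible min_periods rule added. `rolling_apply_reference` is a hypothetical helper, not a cuDF API, and leaving NaN for windows with fewer than min_periods rows is an assumption modeled on pandas behavior.

```python
import numpy as np


def count_if_gt_3(window):
    count = 0
    for i in window:
        if i > 3:
            count += 1
    return count


def rolling_apply_reference(data, win_size, min_periods, func):
    """Hypothetical CPU baseline: each loop iteration plays the role of one GPU thread."""
    data = np.asarray(data)
    out = np.full(len(data), np.nan)
    for tid in range(len(data)):
        start = max(0, tid - win_size + 1)
        end = tid + 1
        # Assumption: windows with fewer than min_periods rows stay NaN.
        if end - start >= min_periods:
            out[tid] = func(data[start:end])
    return out


result = rolling_apply_reference(
    [1, 5, 2, 7, 4], win_size=3, min_periods=2, func=count_if_gt_3
)
print(result)
```

Each iteration mirrors the start/end arithmetic of the kernel above, so the baseline and the GPU output should agree wherever the window has at least min_periods rows.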

A null-sensitive implementation seems possible as well, building on a lot of what we already have. The idea is to assemble the data into a cuda.local.array of MaskedType and then pass that array into the UDF as written. The existing implementations of operations between MaskedTypes should take care of the rest.

@cuda.jit
def kernel(data, mask, win_size, min_periods, out_data, out_mask):
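    tid = cuda.grid(1)
    if tid >= data.size:  # guard threads past the end of the data
        return

    start = max(0, tid - win_size + 1)
    end = tid + 1

    # note: cuda.local.array requires a compile-time constant shape, so
    # win_size would need to be fixed when the kernel is compiled
    local = cuda.local.array(win_size, Masked(types.int64))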

    # place this window of data into thread local memory as an array of MaskedTypes
    for i in range(0, end-start):
        local[i] = Masked(data[start+i], mask_get(mask, start+i))
    
    # the device function now iterates through the array of MaskedTypes
    # any operations are resolved through MaskedType's overloads
    res = devfunc(local)
    out_data[tid] = res.value
    out_mask[tid] = res.valid

Currently, however, creating a cuda.local.array of extension types needs a few changes (cc @gmarkall). The above would enable handling nulls explicitly within supported UDFs in conditional logic:

def count_if_gt_3(window):
    count = 0
    for i in window:
        if i != cudf.NA:
            if i > 3:
                count += 1
        else:
            return -1
    return count
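To make the intended semantics concrete without a GPU, the sketch below emulates the masked window on the CPU. It is only an illustration under several assumptions: `NA` is a stand-in sentinel for cudf.NA (so the UDF tests it with `is not` rather than `!=`), `mask_get` here is a hypothetical helper that reads an Arrow-style validity bitmask packed into uint8 bytes (least-significant bit first), and the window is a plain Python list rather than a cuda.local.array of MaskedType.

```python
import numpy as np

NA = None  # stand-in for cudf.NA in this CPU-only sketch


def mask_get(mask, pos):
    """Hypothetical helper: True if bit `pos` of the validity bitmask is set."""
    return bool((mask[pos // 8] >> (pos % 8)) & 1)


def count_if_gt_3(window):
    count = 0
    for i in window:
        if i is not NA:
            if i > 3:
                count += 1
        else:
            return -1
    return count


def rolling_apply_masked(data, mask, win_size, func):
    out = []
    for tid in range(len(data)):
        start = max(0, tid - win_size + 1)
        end = tid + 1
        # Build the per-"thread" window, substituting NA for null rows.
        window = [data[j] if mask_get(mask, j) else NA for j in range(start, end)]
        out.append(func(window))
    return out


data = np.array([1, 5, 0, 7, 4, 9])
# Row 2 is null: validity bits (LSB first) are 1,1,0,1,1,1 -> 0b111011
mask = np.array([0b111011], dtype=np.uint8)
result = rolling_apply_masked(data, mask, win_size=2, func=count_if_gt_3)
print(result)
```

Any window that touches the null row takes the UDF's else branch and returns -1, which is the kind of explicit null handling the masked kernel is meant to allow.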

Describe alternatives you've considered
One disadvantage of this approach is that we'd need to reimplement logic around handling all of the keyword arguments currently supported for rolling without a custom aggregation, such as center and min_periods. This would have to be maintained separately from libcudf.
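As a rough sense of what that reimplemented logic would involve, the sketch below computes per-row window bounds for a fixed-size window with an optional center flag. `window_bounds` is a hypothetical helper, and the centering offset of (win_size - 1) // 2 is an assumption based on the pandas fixed-window convention; it would need to be checked against libcudf's behavior before relying on it.

```python
import numpy as np


def window_bounds(n, win_size, center=False):
    """Hypothetical helper: half-open [start, end) bounds for each of n rows."""
    # Assumption: a centered window shifts right by (win_size - 1) // 2 rows.
    offset = (win_size - 1) // 2 if center else 0
    end = np.arange(1, n + 1) + offset
    start = end - win_size
    # Clip to the valid row range; clipped windows are shorter than win_size
    # and are where a min_periods check would apply.
    return np.clip(start, 0, n), np.clip(end, 0, n)


start, end = window_bounds(5, 3)
c_start, c_end = window_bounds(5, 3, center=True)
print(start, end)
print(c_start, c_end)
```

With center=False this reproduces the start and end values used in the kernels above, so a kernel could take precomputed bounds like these as inputs instead of hard-coding the trailing window.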

brandon-b-miller added the feature request, numba, and Python labels Feb 18, 2025

github-project-automation bot added this to cuDF Python Feb 18, 2025

github-project-automation bot moved this to Todo in cuDF Python Feb 18, 2025

Matt711 assigned brandon-b-miller Feb 19, 2025
