**Is your feature request related to a problem? Please describe.**
cuDF supports `rolling.apply` for executing a custom Python function over the specified rolling windows. It works by taking a PTX string compiled by Numba in Python and handing it off to a rolling aggregation with `kind == PTX`, which is then consumed through `rolling_window`. Ultimately this invokes Jitify and a series of parsing and compilation steps leading to the final kernel that computes the result.
Historically a similar process served to make APIs like `Series.applymap` work, but over time we migrated to an approach that assembles the final kernel in Numba rather than C++, for several reasons:
- The parsing approach through Jitify trips up in several useful cases, such as when Numba delivers PTX containing multiple function definitions.
- The full Numba approach allows extension types to support null values.
- The full Numba approach benefits from features such as LTO through pynvjitlink.
**Describe the solution you'd like**
A pure Numba implementation of `Rolling.apply`, possibly with extension type support. For non-nullable data, the implementation could look something like this:
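The original code block here did not survive scraping. As a stand-in, here is a CPU-side sketch of the intended semantics in plain Python rather than the eventual `@cuda.jit` kernel (`rolling_apply_ref` and its signature are illustrative, not cuDF API): each output element applies the UDF to its trailing window, and windows shorter than `min_periods` produce a null placeholder.

```python
import math

def rolling_apply_ref(data, win_size, min_periods, func):
    # Each output i applies func to the trailing window
    # data[max(0, i - win_size + 1) : i + 1]; windows with fewer
    # than min_periods elements yield NaN as a null placeholder.
    out = []
    for i in range(len(data)):
        start = max(0, i - win_size + 1)
        window = data[start:i + 1]
        out.append(func(window) if len(window) >= min_periods else math.nan)
    return out

print(rolling_apply_ref([1.0, 2.0, 3.0, 4.0], 3, 3, sum))
# [nan, nan, 6.0, 9.0]
```

In the real kernel, each CUDA thread would compute one output element, with `start`/`end` derived from `cuda.grid(1)` exactly as in the masked kernel below.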
The above seems to get me the correct result in a few test cases locally; of course, the real implementation would need more to account for `min_periods`, other dtypes, and more.
A null-sensitive implementation seems possible as well, building on much of what we already have. The idea revolves around assembling the data into a `cuda.local.array` of `MaskedType` and then passing that array into the UDF as written. The existing implementations of operations between `MaskedType`s should take care of the rest.
```python
@cuda.jit
def kernel(data, mask, win_size, min_periods, out_data, out_mask):
    tid = cuda.grid(1)
    start = max(0, tid - win_size + 1)
    end = tid + 1
    local = cuda.local.array(win_size, Masked(types.int64))

    # place this window of data into thread local memory
    # as an array of MaskedTypes
    for i in range(0, end - start):
        local[i] = Masked(data[start + i], mask_get(mask, start + i))

    # the device function now iterates through the array of MaskedTypes;
    # any operations are resolved through MaskedType's overloads
    res = devfunc(local)
    out_data[tid] = res.value
    out_mask[tid] = res.valid
```
Currently, however, creating a `cuda.local.array` of extension types needs a few changes (cc @gmarkall). The above would enable handling nulls explicitly within supported UDFs via conditional logic:
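The UDF example that followed here did not survive scraping. To illustrate the kind of conditional null handling this would enable, here is a hypothetical host-side sketch: `Masked` below is a plain stand-in for cuDF's device-side `MaskedType` (a value paired with a validity flag), and `window_sum_skipna` is an illustrative UDF, not cuDF API.

```python
from collections import namedtuple

# Hypothetical host-side stand-in for cuDF's MaskedType:
# a value paired with a validity flag.
Masked = namedtuple("Masked", ["value", "valid"])

def window_sum_skipna(window):
    # Explicit conditional null handling inside the UDF:
    # invalid (null) elements are skipped rather than
    # propagating null into the result.
    total = 0
    for x in window:
        if x.valid:
            total += x.value
    return Masked(total, True)

print(window_sum_skipna([Masked(1, True), Masked(2, False), Masked(3, True)]))
# Masked(value=4, valid=True)
```

On device, the same branching would be resolved through `MaskedType`'s overloads rather than a Python namedtuple.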
**Describe alternatives you've considered**
One disadvantage of this approach is that we'd need to reimplement logic around handling all of the keyword arguments currently supported for rolling without a custom aggregation, such as `center` and `min_periods`. This would have to be maintained separately from libcudf.
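To give a flavor of the bookkeeping that would need reimplementing, here is a hypothetical helper for window bounds under `center` (a sketch only: it mirrors pandas' behavior for odd window sizes, and even-sized centered windows would need checking against pandas/libcudf semantics).

```python
def window_bounds(i, win_size, n, center):
    # Compute the half-open [start, end) index range of the rolling
    # window for output position i over a column of length n.
    # center=False: the window ends at i (trailing window).
    # center=True: the window is shifted forward by half its size,
    # so i sits at the center (exact for odd win_size).
    shift = win_size // 2 if center else 0
    end = min(n, i + 1 + shift)
    start = max(0, end - win_size)
    return start, end

print(window_bounds(2, 3, 10, False))  # (0, 3): indices 0, 1, 2
print(window_bounds(2, 3, 10, True))   # (1, 4): indices 1, 2, 3
```

Each of these keyword arguments (`center`, `min_periods`, and the rest) would need equivalent logic in the Numba kernel, maintained in parallel with libcudf's.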