Add quantile to DataFrame and Series #318
Since this is a community-submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
Thanks for all these great changes!!! 🚀 This functionality will be very useful.
I left you a bunch of comments. I'm going to wait for this PR to land before reviewing the .groupby().quantile() PR.
jenkins test this please
@sethmlarson Can we merge the mode PR into master, because this will have conflicts in
@sethmlarson This PR is ready. Please review 😄
@stevedodson Please review this and let me know if any changes are required. 😄
@V1NAY8 - sorry for the delay, I will review with @sethmlarson next week. Many thanks for your help!
Sure, I am happy to contribute more. I am now quite used to the code base, and I'm learning Elasticsearch as well! Thanks.
jenkins test this please
The code below shows an issue with pandas (we should raise it as an issue if one doesn't exist already). Therefore, don't add bool column types to the tests for now.
Python 3.9.5 (default, May 18 2021, 12:31:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.23.1 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.23.1
Python 3.9.5 (default, May 18 2021, 12:31:01)
[Clang 10.0.0 ] on darwin
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: '1.2.4'
In [4]: df = pd.DataFrame({'b': [False, True]})
In [5]: df
Out[5]:
       b
0  False
1   True
In [6]: df.dtypes
Out[6]:
b    bool
dtype: object
In [7]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   b       2 non-null      bool
dtypes: bool(1)
memory usage: 130.0 bytes
In [8]: df
Out[8]:
       b
0  False
1   True
In [9]: df.quantile(q=0.5)
Traceback (most recent call last):
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-21ac846264fe>", line 1, in <module>
    df.quantile(q=0.5)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/frame.py", line 9266, in quantile
    result = data._mgr.quantile(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 491, in quantile
    block = b.quantile(axis=axis, qs=qs, interpolation=interpolation)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1592, in quantile
    result = nanpercentile(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/nanops.py", line 1675, in nanpercentile
    return np.percentile(values, q, axis=axis, interpolation=interpolation)
  File "<__array_function__ internals>", line 5, in percentile
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3818, in percentile
    return _quantile_unchecked(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3937, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3515, in _ureduce
    r = func(a, **kwargs)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4064, in _quantile_ureduce_func
    r = _lerp(x_below, x_above, weights_above, out=out)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3961, in _lerp
    diff_b_a = subtract(b, a)
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
In [10]: df.quantile(q=0.5, numeric_only=True)
Traceback (most recent call last):
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-70bdf370f901>", line 1, in <module>
    df.quantile(q=0.5, numeric_only=True)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/frame.py", line 9266, in quantile
    result = data._mgr.quantile(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 491, in quantile
    block = b.quantile(axis=axis, qs=qs, interpolation=interpolation)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1592, in quantile
    result = nanpercentile(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/nanops.py", line 1675, in nanpercentile
    return np.percentile(values, q, axis=axis, interpolation=interpolation)
  File "<__array_function__ internals>", line 5, in percentile
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3818, in percentile
    return _quantile_unchecked(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3937, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3515, in _ureduce
    r = func(a, **kwargs)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4064, in _quantile_ureduce_func
    r = _lerp(x_below, x_above, weights_above, out=out)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3961, in _lerp
    diff_b_a = subtract(b, a)
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
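Until the bool behaviour is sorted out upstream, one possible way to keep quantile tests meaningful is to exclude bool columns explicitly, or to cast them to integers first. The snippet below is only a sketch under those assumptions: `numeric_quantile` is a hypothetical helper name, the sample frame is made up, and none of it is taken from this PR.

```python
import pandas as pd


def numeric_quantile(df: pd.DataFrame, q: float = 0.5) -> pd.Series:
    """Hypothetical helper: quantile over non-bool numeric columns only."""
    # "number" selects int and float dtypes; NumPy bool is not a numeric
    # subtype, so bool columns are dropped before quantile() is called.
    numeric = df.select_dtypes(include="number")
    return numeric.quantile(q=q)


# Made-up sample data: one bool column and one float column.
df = pd.DataFrame({"b": [False, True], "x": [1.0, 3.0]})

result = numeric_quantile(df, q=0.5)
print(result)

# Alternative: treat bools as 0/1 by casting to an integer dtype first.
as_int = df["b"].astype("int64").quantile(0.5)
print(as_int)
```

Dropping the columns keeps the test focused on numeric fields, while the cast is useful if a 0/1 interpretation of the bool values is actually wanted.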
@stevedodson Apologies for the delay :(
Jenkins test this please
CI is good
This PR is great, love the test suite so far. A handful of comments and questions.
Jenkins test this please
LGTM, thanks!!!
Thank you ❤️
Closes #315
df.agg['quantile',...], df.quantile(), Series.quantile()
@sethmlarson Please review 😄
P.S. Need to change the commit message from percentile to quantile while merging to master 😃
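For reference, here is a minimal sketch of what the corresponding plain pandas calls return, which is useful as a baseline when comparing results in tests. It uses pandas only, not eland, and the sample data is made up for illustration.

```python
import pandas as pd

# Made-up sample data.
s = pd.Series([1, 2, 3, 4])

# A scalar q returns a scalar; a list of q values returns a Series indexed by q.
median = s.quantile(0.5)
quartiles = s.quantile([0.25, 0.75])

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# A scalar q returns a Series indexed by column name;
# a list of q values returns a DataFrame indexed by q.
df_median = df.quantile(0.5)
df_quartiles = df.quantile([0.25, 0.75])

print(median)
print(quartiles)
print(df_median)
print(df_quartiles)
```

Note that pandas interpolates linearly between neighbouring values by default, so a quantile need not be one of the original data points.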