Add quantile to DataFrame and Series #318

Merged: 7 commits, Jun 8, 2021

Conversation

Contributor

@V1NAY8 V1NAY8 commented Nov 6, 2020

Closes #315

  • Implemented pandas-compatible quantile functionality: df.agg(["quantile", ...]), df.quantile(), and Series.quantile().
  • Added tests and documentation.
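For readers unfamiliar with what these calls return, here is a minimal sketch of a quantile with linear interpolation, which is the default interpolation in pandas. It runs on plain in-memory lists and is not the implementation in this PR; the helper names `linear_quantile` and `column_quantiles` are hypothetical and exist only for illustration.

```python
import math
from typing import Dict, Sequence


def linear_quantile(values: Sequence[float], q: float) -> float:
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional position of the quantile within the sorted values.
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    # Interpolate between the two neighbouring values.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def column_quantiles(columns: Dict[str, Sequence[float]], q: float) -> Dict[str, float]:
    """Apply linear_quantile to every column, like a per-column df.quantile(q)."""
    return {name: linear_quantile(vals, q) for name, vals in columns.items()}


frame = {"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0]}
medians = column_quantiles(frame, 0.5)
print(medians)
```

With q=0.5 this is the per-column median; a single list passed to `linear_quantile` plays the role of a Series.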

@sethmlarson Please Review 😄

P.S. The commit message needs to be changed from "percentile" to "quantile" when merging to master 😃

@elasticmachine

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

Contributor

@sethmlarson sethmlarson left a comment

Thanks for all these great changes!!! 🚀 This functionality will be very useful.
I left you a bunch of comments. I'm going to wait for this PR to land before reviewing the .groupby().quantile() PR.

@sethmlarson
Contributor

jenkins test this please

@V1NAY8 V1NAY8 requested a review from sethmlarson November 7, 2020 15:51
@V1NAY8
Contributor Author

V1NAY8 commented Nov 19, 2020

@sethmlarson Can we merge the mode PR into master first? Otherwise this PR will conflict with it in operations.py, and I may be able to reuse some of its if statements. 😃

@V1NAY8
Contributor Author

V1NAY8 commented Jan 7, 2021

@sethmlarson This PR is ready. Please Review 😄

@V1NAY8
Contributor Author

V1NAY8 commented Apr 20, 2021

@stevedodson Please review this and let me know if any changes are required. 😄

@V1NAY8
Contributor Author

V1NAY8 commented May 20, 2021

Hi, if this can be merged, I can make progress on #316 and #319.

@stevedodson
Contributor

@V1NAY8 - sorry for the delay, I will review with @sethmlarson next week. Many thanks for your help!

@V1NAY8
Contributor Author

V1NAY8 commented May 20, 2021

Sure, I am happy to contribute more. I am now quite familiar with the code base, and I am learning Elasticsearch as well! Thanks.

@stevedodson
Contributor

jenkins test this please

Contributor

@stevedodson stevedodson left a comment

The code below shows an issue with pandas (we should raise it as an issue if one doesn't exist already). Therefore, don't add bool column types to the tests for now.

Python 3.9.5 (default, May 18 2021, 12:31:01)
[Clang 10.0.0 ] on darwin
IPython 7.23.1 -- An enhanced Interactive Python. Type '?' for help.
import pandas as pd
pd.__version__
Out[3]: '1.2.4'
df = pd.DataFrame({'b': [False, True]})
df
Out[5]: 
       b
0  False
1   True
df.dtypes
Out[6]: 
b    bool
dtype: object
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   b       2 non-null      bool 
dtypes: bool(1)
memory usage: 130.0 bytes
df
Out[8]: 
       b
0  False
1   True
df.quantile(q=0.5)
Traceback (most recent call last):
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-21ac846264fe>", line 1, in <module>
    df.quantile(q=0.5)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/frame.py", line 9266, in quantile
    result = data._mgr.quantile(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 491, in quantile
    block = b.quantile(axis=axis, qs=qs, interpolation=interpolation)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1592, in quantile
    result = nanpercentile(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/nanops.py", line 1675, in nanpercentile
    return np.percentile(values, q, axis=axis, interpolation=interpolation)
  File "<__array_function__ internals>", line 5, in percentile
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3818, in percentile
    return _quantile_unchecked(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3937, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3515, in _ureduce
    r = func(a, **kwargs)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4064, in _quantile_ureduce_func
    r = _lerp(x_below, x_above, weights_above, out=out)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3961, in _lerp
    diff_b_a = subtract(b, a)
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
df.quantile(q=0.5, numeric_only=True)
Traceback (most recent call last):
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-70bdf370f901>", line 1, in <module>
    df.quantile(q=0.5, numeric_only=True)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/frame.py", line 9266, in quantile
    result = data._mgr.quantile(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 491, in quantile
    block = b.quantile(axis=axis, qs=qs, interpolation=interpolation)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1592, in quantile
    result = nanpercentile(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/pandas/core/nanops.py", line 1675, in nanpercentile
    return np.percentile(values, q, axis=axis, interpolation=interpolation)
  File "<__array_function__ internals>", line 5, in percentile
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3818, in percentile
    return _quantile_unchecked(
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3937, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3515, in _ureduce
    r = func(a, **kwargs)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4064, in _quantile_ureduce_func
    r = _lerp(x_below, x_above, weights_above, out=out)
  File "/Users/steve/anaconda3/envs/eland_master/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3961, in _lerp
    diff_b_a = subtract(b, a)
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
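
One way to follow the advice above in a test is to filter boolean columns out of the set under test. This is a minimal sketch, assuming the test has a mapping of column name to dtype name; the helper `non_bool_columns` and the sample column names are hypothetical and are not taken from the eland test suite.

```python
from typing import Dict, List


def non_bool_columns(dtypes: Dict[str, str]) -> List[str]:
    """Return the column names whose dtype is not "bool", preserving order."""
    return [name for name, dtype in dtypes.items() if dtype != "bool"]


# Hypothetical column-to-dtype mapping for a test fixture.
sample_dtypes = {"b": "bool", "x": "float64", "n": "int64"}
quantile_columns = non_bool_columns(sample_dtypes)
print(quantile_columns)
```

The resulting list can then be used to select the columns passed to quantile in both the pandas and the eland side of a comparison.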

@V1NAY8 V1NAY8 requested a review from stevedodson May 31, 2021 14:37
@V1NAY8
Contributor Author

V1NAY8 commented May 31, 2021

@stevedodson Apologies for the delay :(
Ask jenkins to have fun

@sethmlarson
Contributor

Jenkins test this please

@V1NAY8
Contributor Author

V1NAY8 commented May 31, 2021

CI is good

Contributor

@sethmlarson sethmlarson left a comment

This PR is great, love the test suite so far. Handful of comments and questions

@V1NAY8 V1NAY8 requested a review from sethmlarson June 3, 2021 19:17
@stevedodson
Contributor

Jenkins test this please

Contributor

@sethmlarson sethmlarson left a comment

LGTM, thanks!!!

@sethmlarson sethmlarson merged commit e9c0b89 into elastic:master Jun 8, 2021
@V1NAY8 V1NAY8 deleted the add-percentile branch June 8, 2021 18:05
@V1NAY8
Contributor Author

V1NAY8 commented Jun 8, 2021

Thank You ❤️

Development

Successfully merging this pull request may close these issues.

Implement Dataframe.quantile, series.quantile and Dataframe.agg("quantile")
4 participants