Group time series split #915
Conversation
Sorry, I mistakenly added 2 files (settings.json and conda_requirements.txt) and removed them in additional commits.
Thanks for the PR! Wow, this looks really good and super professional! While looking over the code, I had these two thoughts here:
mlxtend/evaluate/__init__.py
@@ -40,4 +41,5 @@
     "RandomHoldoutSplit", "PredefinedHoldoutSplit",
     "ftest", "combined_ftest_5x2cv",
     "proportion_difference", "bias_variance_decomp",
-    "accuracy_score", "create_counterfactual"]
+    "accuracy_score", "create_counterfactual",
+    "time_series"]
I think "time_series" should be "GroupTimeSeriesSplit" here.
Yes, I was wrong; I corrected and committed the changes.
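For reference, the corrected tail of the export list then reads (matching the diff above, with only the last entry swapped):

"accuracy_score", "create_counterfactual",
"GroupTimeSeriesSplit"]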
import numpy as np
import pytest
from mlxtend.evaluate import GroupTimeSeriesSplit
The whole unit test suite here looks pretty comprehensive to me. I wonder if we could add a computationally cheap scikit-learn-related test though. E.g., plugging it into cross_val_score or GridSearchCV as the cv argument?
Added a test to check the usage with cross_val_score, based on DummyClassifier with the "most_frequent" strategy.
With this splitter we get 3 splits, with the following true/predicted test targets and accuracies:
y | y_pred | accuracy
---|---|---
[1 1] | [0 0] | 0
[0 1] | [1 1] | 0.5
[1 0 0 0] | [1 1 1 1] | 0.25
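For context, a minimal sketch of what such a test can look like; the data, group layout, and splitter arguments below are made-up placeholders and do not reproduce the exact folds in the table above:

import numpy as np
from sklearn.dummy import DummyClassifier
from mlxtend.evaluate import GroupTimeSeriesSplit

X = np.arange(32).reshape(16, 2)                    # placeholder features
y = np.tile([0, 1], 8)                              # placeholder binary targets
groups = np.repeat([0, 1, 2, 3], 4)                 # 4 contiguous groups of 4 samples
cv = GroupTimeSeriesSplit(test_size=1, n_splits=3)  # assumed constructor arguments
for train_idx, test_idx in cv.split(X, y, groups):
    clf = DummyClassifier(strategy="most_frequent").fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    # print the true test targets, predictions, and per-fold accuracy,
    # analogous to the three rows of the table above
    print(y[test_idx], y_pred, (y_pred == y[test_idx]).mean())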
This is nice, thanks! The main reason I suggested it is to check for API compatibility. I am pretty sure it is compatible with GridSearchCV and cross_val_score, but you never know. It also makes sure that it keeps working in case we modify it in the future, or in case the scikit-learn API changes.
What I had in mind was something like:
from mlxtend.evaluate import GroupTimeSeriesSplit
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
lr = LogisticRegression()
cv = GroupTimeSeriesSplit(...)
cross_val_score(lr, X, y, cv=cv)
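A runnable version of that sketch could look as follows; the data and the splitter arguments are made-up placeholders (the real test would use the project's fixtures and suitable test_size/n_splits values):

import numpy as np
from mlxtend.evaluate import GroupTimeSeriesSplit
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.random.RandomState(0).rand(16, 3)            # placeholder features
y = np.tile([0, 1], 8)                              # placeholder binary targets
groups = np.repeat([0, 1, 2, 3], 4)                 # contiguous, ordered group labels
lr = LogisticRegression()
cv = GroupTimeSeriesSplit(test_size=1, n_splits=3)  # assumed constructor arguments
print(cross_val_score(lr, X, y, groups=groups, cv=cv))  # one accuracy per split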
Hi Sebastian, during the implementation of the requested changes, I have some questions:
Thank you.
Hello @labdmitriy! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-05-25 06:50:32 UTC
I've implemented additional changes:
There are several updates of __init__.py because imports are automatically sorted by VS Code (using the built-in isort tool), whereas the imports in the file are not sorted; so, to avoid changing your import order, I did it separately in a plain text editor.
Thanks a lot for the PR! Sorry, I recently got really swamped with tasks due to some paper reviews and following up on other PRs! I will hopefully have more time again soon!
This is a good point. Unless it affects your code, there is nothing to worry about. GitHub will automatically flag conflicts if they appear, and we can then try to deal with them in the web interface. Btw I resolved one of the issues, and you may have to pull on your end.
That's a tricky one. Personally, I feel like it's fine to check, e.g., input arguments etc. It's not strictly scikit-learn consistent, but I don't see any harm in it, to be honest. So, I am just checking the
I'd say it's best to discuss it here so that we don't have to jump back and forth too much.
No worries, it's all good! :). I try to keep on top of things, but I am a little bit time constrained this week.
Good point regarding the pep8/black discrepancy. I was thinking about this a bit, but maybe it's just time to adjust to more modern times and use the 88-character limit rather than 79. Then, if users are strict about 79 chars, it will still be okay when they submit. On the other hand, if they are at 88 characters, they don't get a complaint either. I think this will make the PRs a bit more frictionless while still maintaining recommended styles. I will adjust the pep8 checker via #920
Hi Sebastian,
No problem, I will also have more free time during the next 2 weeks, so I can answer more quickly too.
I guess you mean the black reformatting and resolving the conflict in __init__.py; I will pull it into my local feature branch.
Great! Then I will keep it 'as is'.
Great, ok!
Ok, thank you!
Thank you.
To be honest, the code looks really good to me now. The next thing on my list is to check out the Jupyter Nb then. Yeah, I personally also always (often) separated standard lib imports and 3rd party imports (when I don't forget). So, I like the idea of adding the
Oh, I definitely don't want to drop this. It's a really nice and useful PR. I also learned a lot regarding black & isort. Very useful! I will try to review the Nb either later tonight or tomorrow early morning to give some more feedback.
Great!
I made a comment in the relevant pull request about the updated configuration; probably mlxtend was implied, not biopandas.
Excellent!
I've been thinking about it and would like to suggest returning to this task when you have time, if it is still of interest to you.
The documentation is a great start. It looks very comprehensive, and I love the plots. What's nice about them is that they are automatically generated, which allows us and the users to create the plots for all kinds of scenarios. However, regarding the documentation of use cases, I don't think it needs to be exhaustive and show all possible ways you can use it. That would be very overwhelming. My suggestion is to focus on a few, but give the users the tools to explore the other ones if they are interested. (I.e., if we have a few well-explained examples, users can copy and modify them.) So, concretely, here are a few suggestions:
Your first example could be:
Then, the second one could be
and that's it.
Hi Sebastian, thanks a lot for your feedback. I will prepare the notebook based on your requirements and push all the changes.
Hi Sebastian, I've made changes in the Jupyter notebook and the code based on your comments, and have a few questions/notes:
Notes
Questions
Again, I am asking too many questions 😄, but I think (and hope 🤞) they can be useful. Thank you.
Hi @rasbt, could you please tell me whether I need to improve anything for this PR? Thank you.
I really like this restructured version! A few points that I think can be improved:
1. It's nice to start off with a general problem introduction (just as you did). However, considering that there is https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html, people might also be curious about the relationship to
2. A few words about the example data would be helpful. First, I would put the features and targets first, and then start with something like "For the following examples, we are creating an example dataset consisting of 16 training data points ...". And then you can explain that we create 6 different groups so that the first training example belongs to group 0, the next 4 to group 1, and so forth. Btw. a side question about the implementation: do the groups have to be in consecutive order? Or could it be
3. For each example, it would also be nice to start with a few words describing what we are looking at:
Otherwise it is pretty good and much more accessible than before! Thanks for the update!
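As a sketch of that suggested setup (the first two group sizes follow the description above; the remaining group sizes and all feature/target values are made up for illustration):

import numpy as np
# 16 training data points in 6 groups: the first point belongs to group 0,
# the next 4 to group 1, and so forth.
groups = np.array([0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5])
X = np.arange(16).reshape(-1, 1)   # placeholder features
y = np.zeros(16, dtype=int)        # placeholder targets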
Sure, let's revisit this when we have the final version. I can test this in my local mkdocs version then.
That would not be necessary. I will also remove the other folders in the future to save space on GitHub.
Good points regarding the CI/workflow setups.
Regarding line-length, I just wanted to be explicit about that as a visual aid (so that it is clear that this is 88 without knowing the defaults). I think some people are still new to black and perhaps expect it to be 79.
I think I had some issues with multiline, which is why I added it, but I don't remember to be honest. Do you know if you can run it
The inconsistency you mentioned refers to the missing
Hi @rasbt,
I described the advantages of this implementation over scikit-learn's TimeSeriesSplit.
I create a dataset with features and specify months as the index to make the usage examples clearer; therefore, I decided to define the groups and months before the features and target.
They could be as in your example, and I have corresponding tests and a description in the first section of the notebook; I assumed that this order (not ascending, but with contiguous group values) is "consecutive" — see the short illustration below.
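To make that reading of "consecutive" concrete (my illustration, not code from the PR):

groups_ok = [2, 2, 2, 0, 0, 1, 1, 1]   # contiguous blocks, not ascending: accepted
groups_bad = [0, 0, 1, 1, 0, 0]        # group 0 split into two blocks: rejected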
Done
Thank you!
Great!
Unfortunately, I don't have such experience yet, but as I progress through articles about development, I can probably say something more precise.
Here I just tried to share my thought that the CI and pre-commit hooks probably have slightly different configurations for isort and black, and the decision about what to add or delete depends on the desired target configuration. To be more succinct, my other notes were about the following:
Thank you.
@rasbt
Hi @rasbt, could you please tell me whether there are plans to finish this pull request over the next few weeks? Thank you.
Thanks for updating the docs, and thanks for your patience. I currently have too many ongoing projects, so I can't check in every day. So, if you want to revisit this in summer, I can totally understand. On the other hand, I think this PR is very close; it's just a bit of polishing the docs. Btw when going over the docs, there was mainly only one thing that I found a bit confusing. I.e., in example 4, I wasn't sure what exactly the expanding window size was. E.g., here is an image from Example 3:
[figure: split illustration from Example 3]
What exactly is expanded here? Do you mean that there are now 4 instead of 3 training groups (but this is specified), or is there something else I am missing? My best guess was that you mean that the splits depend on the training and test sizes, so I moved this illustration up to example 1, where we talk about the train and test group sizes. I think this way it is a more natural reading order for the user. What do you think?
Hi @rasbt, sorry if you feel that I am pressuring you; it was not my intention. I thought that I had disturbed you several times recently, and decided to switch to other tasks until you have more free time. Thank you.
No worries! It's all good :). I am still very excited about this PR and sometimes just wish the day had more hours 😅
Hi @rasbt, thank you for your update. Your description is much clearer and cleaner than mine, so of course it is fine with me. I just fixed one typo of mine.
Probably I didn't understand your recommendation from here correctly:
I thought that it was about expanding the size of the training or test dataset, but maybe you meant another type of window (expanding).
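For comparison, a sketch of the two window behaviors, assuming the splitter exposes a window_type argument with "rolling" and "expanding" options:

from mlxtend.evaluate import GroupTimeSeriesSplit
# rolling: the training window keeps a fixed number of groups and slides forward
cv_rolling = GroupTimeSeriesSplit(test_size=1, train_size=2, window_type="rolling")
# expanding: the training window keeps its start and grows with each split
cv_expanding = GroupTimeSeriesSplit(test_size=1, train_size=2, window_type="expanding")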
I added a new entry to the Changelog for GroupTimeSeriesSplit. Thank you.
Thanks, I think this PR is ready to merge now. Thanks so much for the hard and dedicated work on this! And sorry for not being as responsive; it's a really busy time for me right now. However, I am super glad about this PR. I may even try to make a new version release tonight so that it can be used. Regarding the
comment. We can always fine-tune the documentation later; it's not coupled to the main code. But what I meant was that I found that example 4 was basically a natural extension of example 1, so I merged them. From a reader's perspective, I was thinking that it is more intuitive this way.
Thank you for your patience in answering all my questions; it was a very useful and interesting experience!
You are totally correct, it is much better now. If you don't mind, I will write an article about this experience, because I know a lot of people (like me until recently) who think that contributing is too difficult to even try.
Actually, this was really great. During this process, we added lots of useful things like pre-commit hooks, black, isort, etc. :)!
Sure, I think this is worthwhile and will be interesting for many people!
Cool! I will keep that in mind!
Code of Conduct
Description
Add group time series cross-validator implementation.
Add tests with 100% coverage using pytest.
I decided to create the pull request before writing the documentation and changelog modifications, to discuss the current implementation and further steps.
Related issues or pull requests
Fixes #910
Pull Request Checklist
- Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
- Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
- Modified documentation in the corresponding location under mlxtend/docs/sources/ (if applicable)
- Ran PYTHONPATH='.' pytest ./mlxtend -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
- Checked for style issues by running flake8 ./mlxtend