Fixes #956 Add features_group parameter to ExhaustiveFeatureSelector [WIP] #957
Conversation
Recently, I found another PR (#651) for adding fixed_features to ExhaustiveFeatureSelector. So, I think we are going to have a slight problem here. Let's say our features are ...

So... how should we deal with this? We can: ...
This actually looks pretty good to me. If this works, it is a simple and elegant solution. I guess the next step would be unit tests. Would you like any help with it?
Thanks for this PR btw! I think having ...

EDIT: We can also make this PR specific to the Exhaustive Feature Selector and revisit the sequential one at a later time. I think it will require more work for sure.
Oh, good call. To be honest, I think it might be best to check for conflicts and raise an error so the user can fix it. That is probably better than silent behavior that is not clear to the user. So in the case above, maybe an error message like the following would make sense:

```python
raise ValueError("Feature 4 was specified as a fixed feature, but feature 4 has other features in its feature group.")
```

And in the docstring, we can maybe say something like ...

What do you think?
You are right... It would be safer. We are putting just a little load on the shoulders of users to take care of such conflicts, but they would then have a better understanding of what's going on :)
Thanks for creating this opportunity :) I enjoy doing this :)
Sure.
I will probably need some help 😄 For now, I am going to take a look at the existing test function(s) and see how things are. I will try to do it myself. If I run into any issues, I will let you know.
@rasbt
I will resolve/address these comments in my next push.
We also need to add two notes to the docstring: ...
I like the suggestion! Slightly more concise version for the first: ...
Cool! Thanks for revising the notes. I will add them to the docstring.
Hello @NimaSarajpoor! Thanks for updating this PR.
Comment last updated at 2022-08-09 01:29:49 UTC
@rasbt
Notes: ...
Title changed: "features_group parameter to features selection [WIP]" → "features_group parameter to ExhaustiveFeatureSelector [WIP]"
Thanks so much for the update.

For the check, maybe the following ones are good:

a) a check that the feature groups contain all feature indices:

```python
if _merge_lists(feature_groups) != set(range(X.shape[1])):
    raise ValueError(
        "`feature_groups` must contain all features within `range(X.shape[1])`"
    )
```

b) a check that feature groups are not empty:

```python
for fg in feature_groups:
    if not fg:
        raise ValueError("Feature group can't be empty")
```
Wohooo! Ohh, a) reminds me that the ...

Option 1 (ideal): I suppose what we want in this case is that the user can specify the feature groups by column names. For example, ...

Option 2 (ok): We could also make the feature group selection integer-based, e.g., ...

Option 3: For now, we can just add a check and say that ...

What do you think, is option 1 feasible at this point, or do you prefer Option 2/3 and to revisit this later?
Sure! Will do so.
Thank you! :)
Just for the record, I think we "can" allow users to provide not all but just some of the features in the `feature_groups` parameter.
Cool
This is exactly what I thought after pushing the last set of commits! I am planning to work on this. And then, I will take care of the ...
@rasbt
Thanks for all the updates! On the one hand, I think the ... To keep it manageable right now, and to not overcomplicate things when we add ...:

```python
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS
from sklearn.datasets import load_iris
import pandas as pd

iris_data = load_iris()
X, y = iris_data.data, iris_data.target

df_X = pd.DataFrame(X, columns=["Sepal length", "Sepal width",
                                "Petal length", "Petal width"])

knn = KNeighborsClassifier(n_neighbors=3)

efs1 = EFS(knn,
           min_features=1,
           max_features=4,
           scoring='accuracy',
           print_progress=True,
           cv=5)

efs1 = efs1.fit(df_X, y)

print('Best accuracy score: %.2f' % efs1.best_score_)
print('Best subset (indices):', efs1.best_idx_)
print('Best subset (corresponding names):', efs1.best_feature_names_)
```
Sure. Removing the feature makes sense. Thanks for your effort.
Sounds good :)
Alright, just removed it to simplify the code. I may have to call it a day here, but I will be back with the code reviews in the next couple of days!
@rasbt Also, if you have an idea about what I should do as the next step in this PR, please let me know.
@rasbt
I reviewed the code and left some comments. Please feel free to let me know if you disagree with some/all of them, and also please add your comments if you have any.
Thanks for the comments! I would say that, other than those you mentioned, it looks really good. I wouldn't worry about any additional changes to the code.
Cool! I will take care of the comments and push the changes.
@rasbt
Wow thanks so much, looks great now and will merge! |
@rasbt |
Likewise, thanks so much for the high-quality contribution @NimaSarajpoor. This is well appreciated! Actually, I may make a small release in the following days if time permits! |
Description
This PR addresses issue #956. The idea is to allow the user to specify groups of features that are considered together throughout the feature selection process.
- Added a note about the change to the `./docs/sources/CHANGELOG.md` file (if applicable)
- Added appropriate unit test functions in the `./mlxtend/*/tests` directories (if applicable)
- Modified documentation in `mlxtend/docs/sources/` (if applicable)
- Ran `PYTHONPATH='.' pytest ./mlxtend -sv` and made sure that all current unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., `PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv`)
- Checked for style issues by running `flake8 ./mlxtend`
What I have done so far:

- Added `features_group` to `mlxtend/feature_selection/exhaustive_feature_selector.py`
- Existing tests in `feature_selection` pass

What needs to be done:

- Add the same support to `sequential_feature_selector.py`

Please let me know what you think.
Side note: I added some comments wherever I felt something should be noted. We should remove those comments before merging.