
Sequential Feature Selection CPU utilization #191

Closed
whalebot-helmsman opened this issue May 14, 2017 · 6 comments

@whalebot-helmsman
Contributor

For multi-CPU utilization, SFS uses sklearn.model_selection.cross_val_score, which runs one process per validation fold. Usually there are more features than folds, so it is more scalable to use one process per feature.
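A minimal sketch of the idea, not mlxtend's actual code: parallelize over the candidate feature subsets rather than the CV folds, keeping each cross-validation run single-process. The function and parameter names below (`_score_subset`, `best_subset`, `candidates`) are illustrative placeholders.

```python
from functools import partial
from multiprocessing import Pool

from sklearn.model_selection import cross_val_score

def _score_subset(estimator, X, y, cv, subset):
    # Evaluate one candidate feature subset with single-process CV.
    scores = cross_val_score(estimator, X[:, subset], y, cv=cv, n_jobs=1)
    return subset, scores.mean()

def best_subset(estimator, X, y, candidates, cv=5, n_processes=4):
    # Parallelize over candidate feature subsets instead of CV folds.
    work = partial(_score_subset, estimator, X, y, cv)
    with Pool(n_processes) as pool:
        results = pool.map(work, candidates)
    return max(results, key=lambda r: r[1])
```

For example, `best_subset(clf, X, y, [[0], [1], [2]])` would score three single-feature candidates in parallel, one worker process per candidate.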

@rasbt
Owner

rasbt commented May 14, 2017

I agree; in typical use cases, the number of features should be larger than the number of folds in CV, so the multiprocessing should run over the features, not the folds. This should be relatively easy to change via the multiprocessing submodule or mputil; I will take a look at it some time.

@whalebot-helmsman
Contributor Author

I want to make this enhancement, but a unit test failed on my laptop #192

@rasbt
Owner

rasbt commented May 14, 2017

Thanks for working on it. How did the unit tests fail? Maybe just submit a PR and I could take a look! :)

@whalebot-helmsman
Contributor Author

The CPU-per-feature implementation gives results for various n_jobs values; test.py is in the same gist.
Also, I added the same parallel technique for exhaustive feature selection.

I can create a new pull request today after fixing "Can't pickle <class 'method'>" for older Pythons

@rasbt
Owner

rasbt commented May 16, 2017

Thanks! I am looking forward to the PR and will be happy to give you some feedback then.

I can create new pull request today after fixing "Can't pickle <class 'method'>" for older pythons

Which versions of Python would that be? Currently, the package supports the latest 2.7 version, 3.5, and 3.6. Are any of these causing problems? In general, if older Python versions (e.g., 2.6, 3.3, 3.4) work, that's nice, but I wouldn't make backwards-compatible changes to support them if it adds to the code complexity

@whalebot-helmsman
Contributor Author

whalebot-helmsman commented May 16, 2017

I met this problem on Python 3.3.2. This is CentOS 6.7, kind of a target platform for me and a system with 64 CPUs. That's why it is so important.

There are two ways to fix this. Neither adds much code complexity.
First, move SFS._calc_score to a free-standing (non-member) function.
Second, add a free-standing (non-member) wrapper function for SFS._calc_score.

I like the first one more.
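For context, a minimal sketch of the two options; the function and attribute names below are placeholders, not mlxtend's actual internals. On Python 3.3, multiprocessing cannot pickle bound methods, so the callable handed to the pool must live at module level.

```python
from sklearn.model_selection import cross_val_score

# Option 1: a free-standing (module-level) function; picklable on older Pythons.
def _calc_score(selector, X, y, indices):
    # `selector` carries the estimator and CV settings (attribute names assumed).
    scores = cross_val_score(selector.estimator, X[:, indices], y,
                             cv=selector.cv, n_jobs=1)
    return indices, scores

# Option 2: keep the method, but pass a free-standing wrapper to the pool;
# the instance is pickled as a plain argument instead of as a bound method.
def _calc_score_wrapper(selector, X, y, indices):
    return selector._calc_score(X, y, indices)
```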
