
Sequential Feature Selection CPU utilization #191

Closed
whalebot-helmsman opened this issue May 14, 2017 · 6 comments

@whalebot-helmsman
Contributor

For multi-CPU utilization, SFS uses sklearn.model_selection.cross_val_score, which runs one process per validation fold. Usually there are more features than folds, so it is more scalable to use one process per feature.
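A minimal sketch of the idea, not mlxtend's actual code: parallelize over the candidate feature subsets rather than the CV folds, keeping each cross-validation run single-process. The function and parameter names below (`_score_subset`, `best_subset`, `candidates`) are illustrative placeholders.

```python
from functools import partial
from multiprocessing import Pool

from sklearn.model_selection import cross_val_score

def _score_subset(estimator, X, y, cv, subset):
    # Evaluate one candidate feature subset with single-process CV.
    scores = cross_val_score(estimator, X[:, subset], y, cv=cv, n_jobs=1)
    return subset, scores.mean()

def best_subset(estimator, X, y, candidates, cv=5, n_processes=4):
    # Parallelize over candidate feature subsets instead of CV folds.
    work = partial(_score_subset, estimator, X, y, cv)
    with Pool(n_processes) as pool:
        results = pool.map(work, candidates)
    return max(results, key=lambda r: r[1])
```

For example, `best_subset(clf, X, y, [[0], [1], [2]])` would score three single-feature candidates in parallel, one worker process per candidate.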

@rasbt
Owner

rasbt commented May 14, 2017

I agree; in typical use cases, the number of features should be larger than the number of folds in CV, so the multiprocessing should run over the features, not the folds. This should be relatively easy to change via the multiprocessing submodule or mputil; I will take a look at it some time.

@whalebot-helmsman
Contributor Author

I want to make this enhancement, but a unit test failed on my laptop #192

@rasbt
Owner

rasbt commented May 14, 2017

Thanks for working on it. How did the unit tests fail? Maybe just submit a PR and I could take a look! :)

@whalebot-helmsman
Contributor Author

The CPU-per-feature implementation gives results for various n_jobs values; test.py is in the same gist.
Also, I added the same parallel technique for exhaustive feature selection.

I can create a new pull request today after fixing "Can't pickle <class 'method'>" for older Pythons

@rasbt
Owner

rasbt commented May 16, 2017

Thanks! I am looking forward to the PR and will be happy to give you some feedback then.

I can create new pull request today after fixing "Can't pickle <class 'method'>" for older pythons

Which versions of Python would that be? Currently, the package supports the latest 2.7 version, 3.5, and 3.6. Are any of these causing problems? In general, if older Python versions (e.g., 2.6, 3.3, 3.4) work, that's nice, but I wouldn't make backwards-compatible changes to support them if it adds to the code complexity

@whalebot-helmsman
Contributor Author

whalebot-helmsman commented May 16, 2017

I met this problem on Python 3.3.2. This is CentOS 6.7, kind of a target platform for me and a system with 64 CPUs. That's why it is so important.

There are two ways to fix this. Neither adds much code complexity.
First, move SFS._calc_score to a free-standing (non-member) function.
Second, add a free-standing (non-member) wrapper function for SFS._calc_score.

I like the first one more.
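For context, a minimal sketch of the two options; the function and attribute names below are placeholders, not mlxtend's actual internals. On Python 3.3, multiprocessing cannot pickle bound methods, so the callable handed to the pool must live at module level.

```python
from sklearn.model_selection import cross_val_score

# Option 1: a free-standing (module-level) function; picklable on older Pythons.
def _calc_score(selector, X, y, indices):
    # `selector` carries the estimator and CV settings (attribute names assumed).
    scores = cross_val_score(selector.estimator, X[:, indices], y,
                             cv=selector.cv, n_jobs=1)
    return indices, scores

# Option 2: keep the method, but pass a free-standing wrapper to the pool;
# the instance is pickled as a plain argument instead of as a bound method.
def _calc_score_wrapper(selector, X, y, indices):
    return selector._calc_score(X, y, indices)
```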
