Modified powerset function in EFS to reduce memory consumption #195
I modified how the candidate combinations are generated so that they are no longer stored in a list. This also required computing the total number of combinations with an efficient `ncr` function. Tests are still needed.
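To illustrate the idea in the description, here is a minimal sketch. It assumes the candidates are feature indices and the subset sizes are bounded by a minimum and a maximum. `itertools.combinations` combined with `chain.from_iterable` produces the subsets one at a time, so the full powerset is never held in a list. The name `powerset_lazy` and its parameters are hypothetical and are not mlxtend's actual code.

```python
from itertools import chain, combinations


def powerset_lazy(items, min_size, max_size):
    """Lazily yield subsets of `items` whose size lies in [min_size, max_size].

    Hypothetical helper for illustration only: subsets are produced one at a
    time by itertools instead of being collected into a list up front.
    """
    items = tuple(items)  # materialize the (small) pool of candidates once
    return chain.from_iterable(
        combinations(items, r) for r in range(min_size, max_size + 1)
    )


# Consume the subsets as a stream; no list of all subsets is ever built.
for subset in powerset_lazy(range(4), min_size=1, max_size=2):
    print(subset)
```

Because the subsets are streamed, memory use no longer grows with the number of combinations. The trade-off is that the total count can no longer be obtained with `len(...)` and has to be computed arithmetically.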
Hello @adam-erickson! Thanks for updating the PR.
Comment last updated on May 21, 2017 at 17:03 Hours UTC
Thanks for the PR! I just gave it a quick glance, and there's already something like this in mlxtend's `math` module. E.g., you could replace

```python
all_comb = np.sum([ncr(n=X.shape[1], r=i)
                   for i in range(self.min_features,
                                  self.max_features + 1)])
```

by

```python
from ..math import num_combinations
...
all_comb = np.sum([num_combinations(n=X.shape[1], k=i)
                   for i in range(self.min_features,
                                  self.max_features + 1)])
```
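The `all_comb` line above counts the subsets without generating them: it sums C(n, k) over every allowed subset size k. Below is a standard-library sketch of the same sum. It assumes Python 3.8+, where `math.comb` exists, and uses `math.comb` only as a stand-in for mlxtend's `num_combinations`. The function and parameter names are hypothetical.

```python
import math


def total_combinations(n_features, min_features, max_features):
    """Count feature subsets whose size lies in [min_features, max_features].

    math.comb stands in for mlxtend.math.num_combinations here; summing with
    the built-in sum keeps the result an exact Python int.
    """
    return sum(math.comb(n_features, k)
               for k in range(min_features, max_features + 1))


print(total_combinations(4, 1, 2))  # C(4, 1) + C(4, 2) = 4 + 6
```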
Supposedly, the implementation provided is ~25% more efficient than `math.factorial`: http://stackoverflow.com/questions/4941753/is-there-a-math-ncr-function-in-python

Update: I tested both on (n=100, r=10), and the one I provided was ~20% more efficient. I ran the tests again on (n=10, r=5) and found the opposite: your implementation was 3x faster there. However, the version below was twice as fast as yours for small datasets and also faster for large datasets:

I'd consider this version optimal.
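The faster version referred to above was not preserved on this page. For comparison, here is a sketch of two common pure-Python ways to compute nCr. This is an assumption, not the code posted in this thread. The first computes three full factorials. The second is a multiplicative approach similar to those discussed in the linked Stack Overflow question. Both use integer floor division, so their results are exact. Which one is faster depends on n and r, which is consistent with the mixed timings reported above. Both function names are hypothetical.

```python
import operator
from functools import reduce
from math import factorial


def ncr_factorial(n, r):
    """n choose r from three full factorials (exact integer arithmetic)."""
    if not 0 <= r <= n:
        return 0
    return factorial(n) // (factorial(r) * factorial(n - r))


def ncr_multiplicative(n, r):
    """n choose r as n*(n-1)*...*(n-r+1) // r! (exact integer arithmetic)."""
    if not 0 <= r <= n:
        return 0
    r = min(r, n - r)  # C(n, r) == C(n, n - r); use the shorter product
    numer = reduce(operator.mul, range(n, n - r, -1), 1)
    denom = reduce(operator.mul, range(1, r + 1), 1)
    return numer // denom


# Both functions agree on the cases benchmarked in this thread.
for n, r in [(10, 5), (100, 10)]:
    print(n, r, ncr_factorial(n, r), ncr_multiplicative(n, r))
```

You can compare the speed of the two functions with the standard `timeit` module. As noted in the reply below, either one is negligible next to the cost of fitting models during feature selection.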
Interesting! However, I think any of these functions runs so quickly (compared to the rest) that speed is really not a concern here. Sorry, my previous comment was a bit misleading; I primarily thought that reusing the existing `num_combinations` function would be a good idea to avoid reimplementing this code here (in the sense of refactoring, i.e., avoiding duplicated effort :)).
Hi @adam-erickson. Just revisiting this discussion from last weekend...
Hi @rasbt, sorry for the slow reply; I was releasing gapfraction and its website. Of course, we can use the `num_combinations` function for simplicity if you'd like. What can I do to help facilitate these changes? I'm a bit new to this process. Should I close the pull request?
No worries at all; I just wasn't sure if you'd be okay with the proposed changes.
Feel free to make another commit to this PR that changes the code to use the existing `num_combinations` function,
and add a note to the CHANGELOG.md file at `./docs/sources/CHANGELOG.md` (in the bug fixes section). Since GitHub now has this convenient "Squash and Merge" button, we don't have to worry about "too many" commits; that's all taken care of at the end.
Hi @rasbt, I did some testing, and it appears that the results of the …
Thanks for looking into that, @adam-erickson. I see the same issue when I …
But for n=300, I get …
instead of …
Will fix that.
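The values quoted in the two comments above did not survive, so the exact discrepancy is unknown. The following is a hedged illustration of one common way such results go wrong for large n. It is an assumption, not a diagnosis of what #200 fixed. A binomial coefficient computed with true division (`/`) comes back as a float, and floats represent integers exactly only up to 2**53. C(300, 10) is well above that limit. Floor division (`//`) of Python ints, by contrast, stays exact. Both function names are hypothetical.

```python
from math import factorial


def ncr_float(n, k):
    # True division returns a float, which holds integers exactly only up to 2**53.
    return factorial(n) / (factorial(k) * factorial(n - k))


def ncr_exact(n, k):
    # Floor division of ints keeps the whole computation in exact integer arithmetic.
    return factorial(n) // (factorial(k) * factorial(n - k))


exact = ncr_exact(300, 10)
approx = ncr_float(300, 10)
print(exact)
print(int(approx))
print("identical:", exact == int(approx))
```

If SciPy is available, `scipy.special.comb(n, k, exact=True)` also returns an exact Python integer.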
Should be fixed now in #200 (but you could also use scipy.misc (didn't know |
Sorry, somehow I forgot about this... Merging now because the code is good and works perfectly fine :). Thanks for the contribution!
Description
I modified the powerset function and the calculation of the number of possible combinations to reduce memory overhead by no longer storing all combinations in a list.
Related issues or pull requests
Fixes #194
Pull Request requirements
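- Added appropriate unit tests in the `./mlxtend/*/tests` directories
- Ran `nosetests ./mlxtend -sv` and made sure that all unit tests pass
- Checked test coverage with `nosetests ./mlxtend --with-coverage`
- Checked for style issues with `flake8 ./mlxtend`
- Added a note to the `./docs/sources/CHANGELOG.md` file
- Updated the documentation in `mlxtend/docs/sources/` (optional)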