Draft: New feature: Paired permutation test #768
Conversation
Sorry for getting back to this so late. I had a weirdly configured email client and haven't been getting GitHub notifications for the last month. Anyway, thanks for the PR, I really appreciate it! I think there were some issues with the methodology, though. Coincidentally, I was teaching a stats course last semester (unfortunately, as per the syllabus, I had to use R :P) where I implemented a paired permutation (randomization) test. As far as I know, the standard variant is just flipping the sign of each paired difference. Below is my R reference implementation, which matches the textbook results for a small toy dataset:

pairedSamples <- function(diff, nSim=1000000){
  d <- diff
  ld <- length(d)
  obsmeand <- mean(d)
  meandvec <- rep(NA, nSim)
  for (i in 1:nSim){
    rsigns <- 2 * rbinom(ld, 1, .5) - 1   # random +/-1 sign for each paired difference
    newd <- rsigns * d
    newmeand <- mean(newd)
    meandvec[i] <- newmeand
  }
  pl <- sum(meandvec >= obsmeand)   # count in the upper tail
  pg <- sum(meandvec <= obsmeand)   # count in the lower tail
  pval <- ifelse(pl < pg, 2 * pl/nSim, 2 * pg/nSim)   # two-sided: twice the smaller tail
  options(scipen=1)
  if (pval == 0){
    pval <- paste("p <", 1/nSim)
  } else {
    pval <- paste("p =", pval)
  }
  options(scipen=0)
  hhh <- hist(meandvec, plot=FALSE)
  hist(meandvec, xlab="Mean of differences",
       main=paste("Paired Sample Randomization Test,", pval),
       probability=TRUE, xlim=range(c(hhh$breaks, obsmeand)),
       breaks=100, col="tan")
  abline(v=obsmeand, col="red")
  mtext(paste(round(obsmeand, digits=4)), side=1, at=obsmeand)
}
nineteen80 <- c(3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01)
nineteen90 <- c(2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01)

For some reason, I couldn't reproduce these results using the PR's code. I think the problem was the arrays of different lengths. I tried to debug it but couldn't really figure out how to fix it, so I rewrote it using a more "naive" implementation (probably too many for-loops). However, it seems to work and reproduce the expected results now (I added some unit tests). Please let me know your feedback!
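For reference, a rough NumPy translation of the R sketch above (the function name paired_randomization_test is made up here and is not the PR's code; it only computes the p-value, no histogram):

```python
import numpy as np

def paired_randomization_test(x, y, n_sim=100_000, seed=0):
    """Two-sided paired randomization (sign-flipping) test on the mean difference."""
    rng = np.random.default_rng(seed)
    d = np.asarray(x) - np.asarray(y)      # paired differences
    obs = d.mean()                         # observed mean difference
    # Randomly flip the sign of each difference in every simulation round.
    signs = rng.choice([-1.0, 1.0], size=(n_sim, d.size))
    sim_means = (signs * d).mean(axis=1)
    # Two-sided p-value: twice the smaller tail, as in the R reference.
    lower = np.sum(sim_means <= obs)
    upper = np.sum(sim_means >= obs)
    return 2.0 * min(lower, upper) / n_sim

nineteen80 = [3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01]
nineteen90 = [2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01]
print(paired_randomization_test(nineteen80, nineteen90))
```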
Well, that was embarrassing. My local version works fine, but somehow I forgot to add an np.where when doing the PR. My original implementation in the PR seems to output the right values after this change:

[code snippet]

instead of

[code snippet]

In any case, the amended implementation you propose is much easier to follow. I checked the timings, and it is also a bit faster!
No worries, and thanks for double-checking! Wow, I am positively surprised that this is actually not that slow. So maybe we should keep the current version then ;). I'll add the docs.
Can you share a full Python code example, please?
I think this is example 3 here: http://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/#example-3-paired-two-sample-randomization-test
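For convenience, the call from that example looks roughly like the sketch below. This assumes the paired=True flag and the other keyword names (method, num_rounds, seed) of mlxtend.evaluate.permutation_test; please defer to the linked user guide for the exact, up-to-date call.

```python
from mlxtend.evaluate import permutation_test

nineteen80 = [3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01]
nineteen90 = [2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01]

# Approximate (Monte Carlo) paired randomization test on the mean difference.
p_value = permutation_test(
    nineteen80,
    nineteen90,
    paired=True,            # flip the sign of each paired difference
    method="approximate",
    num_rounds=100_000,
    seed=0,
)
print(p_value)
```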
Description
I would like to add the option to perform permutation tests when the two populations are paired.
http://axon.cs.byu.edu/Dan/478/assignments/permutation_test.php
The current pull request is just a first proposal. If you think this feature is useful enough to be implemented in mlxtend, I can improve the documentation and style and add non-trivial tests (it is currently not well tested, so it may still be buggy).
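To make the proposal a bit more concrete, here is a minimal sketch of the exact paired variant, which enumerates all 2^n sign flips of the paired differences and is feasible for small samples. This is illustrative only, not the PR's actual code, and the function name is made up.

```python
import itertools
import numpy as np

def exact_paired_permutation_test(x, y):
    """Exact two-sided paired permutation test on the mean difference."""
    d = np.asarray(x) - np.asarray(y)
    obs = abs(d.mean())
    count = 0
    total = 0
    # Enumerate every assignment of +/- signs to the paired differences.
    for signs in itertools.product([1.0, -1.0], repeat=d.size):
        total += 1
        if abs((np.asarray(signs) * d).mean()) >= obs:
            count += 1
    return count / total
```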
Related issues or pull requests
Fixes #767
Pull Request Checklist
- Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
- Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
- Modified documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
- Ran PYTHONPATH='.' pytest ./mlxtend -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
- Checked for style issues by running flake8 ./mlxtend