QuadratiK: A Collection of Methods Using Kernel-Based Quadratic Distances for Statistical Inference and Clustering #632
Comments
@ropensci-review-bot check srr |
'srr' standards compliance:
✔️ This package complies with > 50% of all standards and may be submitted. |
Thanks for the submission @giovsaraceno ! I'm getting some advice from the other editors about your question. One thing that would be really helpful - could you push up your documentation to a GitHub page? From the |
Hi @giovsaraceno, Mark here from the rOpenSci stats team to answer your question. We've done our best to clarify the role of Probability Distributions Standards:
So packages should generally fit within some main category, with Probability Distributions being an additional category. In your case, Dimensionality Reduction seems like the appropriate main category, but it seems like your package would also fit within Probability Distributions. Given that, the next step would be for you to estimate what proportion of those standards you think might apply to your package. Our general rule-of-thumb is that at least 50% should apply, but for Probability Distributions as an additional category, that figure may be lower. We are particularly keen to document compliance with this category, because it is where our standards have a large overlap with many core routines of the R language itself. As always, we encourage feedback on our standards, so please also feel very welcome to open issues in the Stats Software repository, or add comments or questions in the discussion pages. Thanks for your submission! |
Thanks @ldecicco-USGS for your guidance during this process. Following your suggestion, I've now pushed the documentation for the QuadratiK package to a GitHub page. You can find it displayed on the main page of the GitHub repository. Here's the direct link for easy access: QuadratiK package GitHub page. |
Hi Mark, Thank you for the additional clarification regarding the standards for Probability Distributions and their integration with other statistical software categories. Following your guidance, we have conducted a thorough review of the standards applicable to the Probability Distributions category in relation to our package. Based on our assessment, we found that the current version of our package satisfies 14% of the standards directly. Furthermore, we identified that an additional 36% of the standards could potentially apply to our package, but this would require us to make some enhancements, including the addition of checks and test code. We feel the remaining 50% of the standards are not applicable to our package. We are committed to improving our package and aim to fulfill the applicable standards. To this end, we plan to work on a separate branch dedicated to implementing these enhancements, with the goal of meeting 50% of the standards for the Probability Distributions category. Before proceeding, we would greatly appreciate your opinion on this plan. Thank you for your time and support. Giovanni |
Hi Mark, We addressed the enhancements we discussed, and our package now meets 50% of the standards for the Probability Distributions category. These updates are in the probability-distributions-standards branch of our repository. Thank you, Giovanni |
Hi Giovanni, your Those are very minor points which you may ignore for the moment if you'd like to get the review process started, or you could quickly address them straight away if you prefer. Either way, feel free to ask the bot to |
Hi, thank you for your suggestions on our compliance statements and testing practices. As for comparing results from different distributions, the rpkb function in our package provides options to generate random observations using three distinct algorithms based on different probability distributions. We've conducted tests to confirm that each method functions as intended. We have also added a new vignette in which the methods are compared by graphically displaying the generated points. Is this what you are looking for? We're inclined to address these points promptly. We would appreciate an answer to the questions posed above so that we can start the review process. |
Sorry we didn't reply faster, @giovsaraceno. For, say, a single-variable distribution, tests might include:
|
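The list from the comment above is not preserved in this thread, so the following is only a hypothetical sketch of what checks on a single-variable distribution could look like, not the reviewer's actual suggestions. It uses base R and the stats package, with `rnorm` standing in for a package's own generator; the tolerances and the fixed seed are assumptions chosen for illustration.

```r
# Hypothetical sketch: sanity checks for a univariate random generator.
# rnorm() is a stand-in for the generator under test.
set.seed(42)                       # fixed seed so the checks are reproducible
n <- 10000
mu <- 2
sigma <- 3
x <- rnorm(n, mean = mu, sd = sigma)

# 1. Sample moments should be close to the theoretical values.
mean_ok <- abs(mean(x) - mu) < 4 * sigma / sqrt(n)  # within about 4 standard errors
sd_ok <- abs(sd(x) - sigma) < 0.1

# 2. A Kolmogorov-Smirnov test against the target CDF should not reject.
ks <- stats::ks.test(x, "pnorm", mean = mu, sd = sigma)
ks_ok <- ks$p.value > 0.001

# 3. The density should integrate to the distribution function.
area <- stats::integrate(dnorm, lower = -Inf, upper = mu,
                         mean = mu, sd = sigma)$value
cdf_ok <- abs(area - pnorm(mu, mean = mu, sd = sigma)) < 1e-3
```

In a package test suite, each of these logical flags would typically be wrapped in a testthat expectation.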
Thanks @noamross for your explanation. We have taken your suggestions into consideration and have implemented them accordingly. |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 The following problems were found in your submission template:
👋 |
Checks for QuadratiK (v1.0.0)
git hash: 21541a40
Important: All failing checks above must be addressed prior to proceeding (Checks marked with 👀 may be optionally addressed.)
Package License: GPL (>= 3)
1. rOpenSci Statistical Standards (
|
type | package | ncalls |
---|---|---|
internal | base | 382 |
internal | QuadratiK | 50 |
internal | utils | 10 |
internal | grDevices | 1 |
imports | stats | 29 |
imports | methods | 26 |
imports | sn | 14 |
imports | ggpp | 2 |
imports | cluster | 1 |
imports | mclust | 1 |
imports | moments | 1 |
imports | rrcov | 1 |
imports | clusterRepro | NA |
imports | doParallel | NA |
imports | foreach | NA |
imports | ggplot2 | NA |
imports | ggpubr | NA |
imports | MASS | NA |
imports | movMF | NA |
imports | mvtnorm | NA |
imports | Rcpp | NA |
imports | RcppEigen | NA |
imports | rgl | NA |
imports | rlecuyer | NA |
imports | Tinflex | NA |
suggests | knitr | NA |
suggests | rmarkdown | NA |
suggests | roxygen2 | NA |
suggests | testthat | NA |
linking_to | Rcpp | NA |
linking_to | RcppEigen | NA |
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
base
list (46), data.frame (26), matrix (24), nrow (23), t (20), log (19), rep (19), ncol (18), c (14), numeric (12), for (11), sqrt (10), length (8), mean (8), as.numeric (6), return (6), sample (6), T (6), vapply (6), apply (5), as.factor (5), table (5), unique (5), as.vector (4), cumsum (4), exp (4), rbind (4), sum (4), as.matrix (3), kappa (3), lapply (3), lgamma (3), pi (3), q (3), replace (3), unlist (3), as.integer (2), diag (2), max (2), readline (2), rownames (2), rowSums (2), which (2), which.max (2), with (2), beta (1), colMeans (1), expand.grid (1), F (1), factor (1), if (1), levels (1), norm (1), rep.int (1), round (1), seq_len (1), subset (1)
QuadratiK
DOF (3), kbNormTest (3), normal_CV (3), C_d_lambda (2), compute_CV (2), cv_ksample (2), d2lpdf (2), dlpdf (2), lpdf (2), norm_vec (2), objective_norm (2), poisson_CV (2), rejvmf (2), sample_hypersphere (2), statPoissonUnif (2), compare_qq (1), compute_stats (1), computeKernelMatrix (1), computePoissonMatrix (1), dpkb (1), elbowMethod (1), generate_SN (1), NonparamCentering (1), objective_2 (1), objective_k (1), ParamCentering (1), pkbc_validation (1), rejacg (1), rejpsaw (1), select_h (1), stat_ksample_cpp (1), stat2sample (1)
stats
df (12), quantile (4), dist (2), rnorm (2), runif (2), aggregate (1), cov (1), D (1), qchisq (1), sd (1), sigma (1), uniroot (1)
methods
setMethod (12), setGeneric (8), new (3), setClass (3)
sn
rmsn (14)
utils
data (8), prompt (2)
ggpp
annotate (2)
cluster
silhouette (1)
grDevices
colorRampPalette (1)
mclust
adjustedRandIndex (1)
moments
skewness (1)
rrcov
PcaLocantore (1)
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
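The local run mentioned in the report above can be sketched as follows. This assumes the pkgstats package (and its system requirements) is installed and that `repo` points to a local clone of the repository; the path shown is a placeholder, and the block does nothing if either condition is not met.

```r
# Sketch only: requires the pkgstats package and a local clone of the repository.
repo <- "path/to/repo"  # placeholder, replace with the path to a local clone

if (requireNamespace("pkgstats", quietly = TRUE) && dir.exists(repo)) {
  s <- pkgstats::pkgstats(repo)
  # Inspect the table of calls to external functions and their locations.
  str(s$external_calls)
}
```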
3. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in C++ (17% in 2 files) and R (83% in 12 files)
- 4 authors
- 5 vignettes
- 1 internal data file
- 21 imported packages
- 24 exported functions (median 14 lines of code)
- 56 non-exported functions in R (median 16 lines of code)
- 16 C++ functions (median 13 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:
- loc = "Lines of Code"
- fn = "function"
- exp/not_exp = exported / not exported
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function.
The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
measure | value | percentile | noteworthy |
---|---|---|---|
files_R | 12 | 65.5 | |
files_src | 2 | 79.1 | |
files_vignettes | 5 | 96.9 | |
files_tests | 10 | 90.7 | |
loc_R | 1408 | 76.6 | |
loc_src | 281 | 34.1 | |
loc_vignettes | 235 | 55.3 | |
loc_tests | 394 | 70.0 | |
num_vignettes | 5 | 97.9 | TRUE |
data_size_total | 11842 | 71.9 | |
data_size_median | 11842 | 80.1 | |
n_fns_r | 80 | 70.4 | |
n_fns_r_exported | 24 | 72.5 | |
n_fns_r_not_exported | 56 | 70.6 | |
n_fns_src | 16 | 40.4 | |
n_fns_per_file_r | 5 | 67.1 | |
n_fns_per_file_src | 8 | 69.1 | |
num_params_per_fn | 5 | 69.6 | |
loc_per_fn_r | 15 | 46.1 | |
loc_per_fn_r_exp | 14 | 35.1 | |
loc_per_fn_r_not_exp | 16 | 54.8 | |
loc_per_fn_src | 13 | 41.6 | |
rel_whitespace_R | 24 | 82.7 | |
rel_whitespace_src | 18 | 36.2 | |
rel_whitespace_vignettes | 16 | 29.2 | |
rel_whitespace_tests | 34 | 78.1 | |
doclines_per_fn_exp | 50 | 62.8 | |
doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
fn_call_network_size | 50 | 66.3 |
3a. Network visualisation
Click to see the interactive network visualisation of calls between objects in package
4. goodpractice and other checks
Details of goodpractice checks (click to open)
3a. Continuous Integration Badges
(There do not appear to be any)
GitHub Workflow Results
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
8851531581 | pages build and deployment | success | 21541a | 25 | 2024-04-26 |
8851531648 | pkgcheck | failure | 21541a | 60 | 2024-04-26 |
8851531643 | pkgdown | success | 21541a | 25 | 2024-04-26 |
8851531649 | R-CMD-check | success | 21541a | 83 | 2024-04-26 |
8851531642 | test-coverage | success | 21541a | 83 | 2024-04-26 |
3b. goodpractice results
R CMD check with rcmdcheck
R CMD check generated the following warning:
- checking whether package ‘QuadratiK’ can be installed ... WARNING
Found the following significant warnings:
Warning: 'rgl.init' failed, running with 'rgl.useNULL = TRUE'.
See ‘/tmp/RtmpQrtXuf/file133861d90686/QuadratiK.Rcheck/00install.out’ for details.
R CMD check generated the following note:
- checking installed package size ... NOTE
installed size is 16.6Mb
sub-directories of 1Mb or more:
libs 15.0Mb
R CMD check generated the following check_fails:
- no_import_package_as_a_whole
- rcmdcheck_examples_run_without_warnings
- rcmdcheck_significant_compilation_warnings
- rcmdcheck_reasonable_installed_size
Test coverage with covr
Package coverage: 78.21
Cyclocomplexity with cyclocomp
The following functions have cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
select_h | 46 |
Static code analyses with lintr
lintr found the following 20 potential issues:
message | number of times |
---|---|
Avoid library() and require() calls in packages | 9 |
Lines should not be more than 80 characters. | 9 |
Use <-, not =, for assignment. | 2 |
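As a generic illustration of two of the lintr messages above (this is not code from QuadratiK), the flagged patterns are shown as comments and followed by the form that lintr's default style prefers: a namespace-qualified call instead of attaching a package, and `<-` for assignment.

```r
# Flagged inside package code:
#   library(stats)
#   med = quantile(c(1, 2, 3), probs = 0.5)

# Preferred: qualify the call with its namespace and assign with `<-`.
med <- stats::quantile(c(1, 2, 3), probs = 0.5)
```

Dependencies referenced this way are declared in the Imports field of the package DESCRIPTION file rather than attached at run time.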
5. Other Checks
Details of other checks (click to open)
✖️ Package contains the following unexpected files:
- src/RcppExports.o
- src/kernel_function.o
✖️ The following function name is duplicated in other packages:
- extract_stats from ggstatsplot
Package Versions
package | version |
---|---|
pkgstats | 0.1.3.13 |
pkgcheck | 0.1.2.21 |
srr | 0.1.2.9 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
We have solved all the marked items and we are now ready to request the automatic bot check. |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 The following problems were found in your submission template:
👋 |
Hi @jooolia, |
@jooolia The automated checks failed because of the issue linked above. @giovsaraceno When you've fixed this issue and confirmed that pkgcheck workflows once again succeed in your repo, please call |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 The following problems were found in your submission template:
👋 |
@emitanaka added to the reviewers list. Review due date is 2025-02-05. Thanks @emitanaka for accepting to review! Please refer to our reviewer guide. rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more. |
@emitanaka: If you haven't done so, please fill this form for us to update our reviewers records. |
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
Functionality
Estimated hours spent reviewing: 10 hours
Review Comments
Thank you for your patience @giovsaraceno. This package is very well documented and is a great contribution to goodness-of-fit testing and clustering for spherical data. I've made a pull request to ensure that the styling and spelling errors are mostly corrected. I too would like to recommend the package for the gold badge; however, there are a few minor points that I would ask you to address (shown below).
|
Dear @emitanaka, Thank you for your thorough review, with the suggestions and changes provided for improving the package, and for recommending the package for the gold badge.
|
Dear @giovsaraceno this is to mark the start of my EiC rotation. I'm reviewing all open issues and noting what I see. I see this submission is a bit ahead of its labels. The labels suggest the reviews are pending but they are both done. Also I see the author responded to both reviews here and here. Next steps:
|
@ropensci-review-bot submit review #632 (comment) time 10 |
Logged review for emitanaka (hours: 10) |
@ropensci-review-bot submit review #632 (comment) time 6 |
Logged review for kasselhingee (hours: 6) |
Reviewer Response
Final approval (post-review)
Estimated hours spent reviewing: 16 hours |
Reviewer Response
Final approval (post-review)
Estimated hours spent reviewing: 10.5 hours |
@ropensci-review-bot approve QuadratiK |
Approved! Thanks @giovsaraceno for submitting and @kasselhingee, @emitanaka for your reviews! 😁 To-dos:
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions. We maintain an online book with our best practices and tips; this chapter starts the third section, which covers guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also features CRAN gotchas. Please tell us what could be improved. Last but not least, you can volunteer as a reviewer by filling in a short form. |
@ropensci-review-bot finalize transfer of giovsaraceno/QuadratiK-package |
Can't find repository |
@ropensci-review-bot finalize transfer of QuadratiK-package |
Transfer completed. |
Dear @maurolepore, thanks for accepting the package in ropensci. I have now followed the to-dos listed in the last comment by the ropensci-review-bot. In particular,
I tried to follow the steps for changing the automatic deployment of pkgdown. At the moment, the webpage at the new address has not been created. I would like to ask whether I just have to wait or whether something is missing. |
@giovsaraceno, great! The credit goes to @emitanaka who might be able to help debug the website? But while I'm here ... I see the URL has an odd Compare: https://github.com/orgs/ropensci/repositories |
Thanks @maurolepore, I have changed the repository name from |
Date accepted: 2025-02-03
Submitting Author: Giovanni Saraceno
Submitting Author Github Handle: @giovsaraceno
Other Package Authors Github handles: @rmj3197
Repository: https://github.com/giovsaraceno/QuadratiK-package
Version submitted: 1.1.1
Submission type: Stats
Badge grade: gold
Editor: @emitanaka
Reviewers: @kasselhingee, @emitanaka
Archive: TBD
Version accepted: TBD
Scope
Data Lifecycle Packages
Statistical Packages
Bayesian and Monte Carlo Routines
Dimensionality Reduction, Clustering, and Unsupervised Learning
Machine Learning
Regression and Supervised Learning
Exploratory Data Analysis (EDA) and Summary Statistics
Spatial Analyses
Time Series Analyses
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
This category is the most suitable due to QuadratiK's clustering technique, specifically designed for spherical data. The package's clustering algorithm falls within the realm of unsupervised learning, where the focus is on identifying groupings in the data without pre-labeled categories. The two- and k-sample tests serve as additional tools for testing the differences between the identified groups.
Following the link https://stats-devguide.ropensci.org/standards.html we noticed in the "Table of contents" that category 6.9 refers to Probability Distributions. We are unsure whether, and how, our package fits this category. Can you please advise?
Yes, we have incorporated documentation of standards into our QuadratiK package by utilizing the srr package, considering the categories "General" and "Dimensionality Reduction, Clustering, and Unsupervised Learning", in line with the recommendations provided in the rOpenSci Statistical Software Peer Review Guide.
The QuadratiK package offers robust tools for goodness-of-fit testing, a fundamental aspect in statistical analysis, where accurately assessing the fit of probability distributions is essential. This is especially critical in research domains where model accuracy has direct implications on conclusions and further research directions. Spherical data structures are common in fields such as biology, geosciences and astronomy, where data points are naturally mapped to a sphere. QuadratiK provides a tailored approach to effectively handle and interpret these data. Furthermore, this package is also of particular interest to professionals in health and biological sciences, where understanding and interpreting spherical data can be crucial in studies ranging from molecular biology to epidemiology. Moreover, its implementation in both R and Python broadens its accessibility, catering to a wide audience accustomed to these popular programming languages.
Yes, there are other R packages that address goodness-of-fit (GoF) testing and multivariate analysis. Notable among these is the energy package for energy statistics-based tests. The function kmmd in the kernlab package offers a kernel-based test which has a similar mathematical formulation. The package sphunif provides all the tests for uniformity on the sphere available in the literature. The list of implemented tests includes the test for uniformity based on the Poisson kernel. However, there are fundamental differences between the methods encoded in the aforementioned packages and those offered in the QuadratiK package.
QuadratiK uniquely focuses on kernel-based quadratic distance methods for GoF testing, offering a comprehensive set of tools for one-sample, two-sample, and k-sample tests. This specialization provides more nuanced and robust methodologies for statistical analysis, especially in complex multivariate contexts. QuadratiK is optimized for high-dimensional datasets, employing efficient C++ implementations. This makes it particularly suitable for contemporary large-scale data analysis challenges. The package introduces advanced methods for kernel centering and critical value computation, as well as optimal tuning parameter selection based on midpower analysis. QuadratiK includes a unique clustering algorithm for spherical data. These innovations are not covered in other available packages. With implementations in both R and Python, QuadratiK appeals to a wider audience across different programming communities. We also provide a user-friendly dashboard application which further enhances accessibility, catering to users with varying levels of statistical and programming expertise.
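To make the idea of a kernel-based quadratic distance concrete, here is a minimal base R sketch of an uncentered V-statistic estimate between two samples using a Gaussian kernel. It is an illustration under stated assumptions, not QuadratiK's implementation (which, as described above, uses centered kernels and C++ code); the helper names `gauss_kernel` and `quad_dist` and the bandwidth `h = 1` are hypothetical.

```r
# Gaussian kernel matrix between the rows of a and b (hypothetical helper).
gauss_kernel <- function(a, b, h) {
  # Squared Euclidean distances between every row of a and every row of b.
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  exp(-pmax(d2, 0) / (2 * h^2))   # pmax guards against tiny negative values
}

# Uncentered V-statistic estimate of the quadratic distance between the
# empirical distributions of x and y: mean(Kxx) - 2 * mean(Kxy) + mean(Kyy).
quad_dist <- function(x, y, h = 1) {
  mean(gauss_kernel(x, x, h)) - 2 * mean(gauss_kernel(x, y, h)) +
    mean(gauss_kernel(y, y, h))
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)               # 100 points in 2 dimensions
y_same <- matrix(rnorm(200), ncol = 2)          # same distribution as x
y_shift <- matrix(rnorm(200, mean = 2), ncol = 2)  # location-shifted sample

d_same <- quad_dist(x, y_same)
d_shift <- quad_dist(x, y_shift)
```

The estimate is zero when a sample is compared with itself and grows as the two samples separate, which is the behaviour a two-sample test statistic of this kind relies on.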
In summary there are fundamental differences between QuadratiK and all existing R packages:
Yes, our package, QuadratiK, is compliant with the rOpenSci guidelines on Ethics, Data Privacy, and Human Subjects Research. We have carefully considered and adhered to ethical standards and data privacy laws relevant to our work.
Please see the question posed in the first bullet.