QuadratiK: A Collection of Methods Using Kernel-Based Quadratic Distances for Statistical Inference and Clustering #632

giovsaraceno · 2024-03-13T22:53:53Z

Date accepted: 2025-02-03
Submitting Author Giovanni Saraceno
Submitting Author Github Handle: @giovsaraceno
Other Package Authors Github handles: @rmj3197
Repository: https://github.com/giovsaraceno/QuadratiK-package
Version submitted: 1.1.1
Submission type: Stats
Badge grade: gold
Editor: @emitanaka
Reviewers: @kasselhingee, @emitanaka

Archive: TBD
Version accepted: TBD

DESCRIPTION file:

Type: Package
Package: QuadratiK
Title: A Collection of Methods Using Kernel-Based Quadratic Distances for 
       Statistical Inference and Clustering
Version: 1.0.0
Authors@R: c(
person("Giovanni", "Saraceno", , "[email protected]", role = c("aut", "cre"),
comment = "ORCID 000-0002-1753-2367"),
person("Marianthi", "Markatou", role = "aut"),
person("Raktim", "Mukhopadhyay", role = "aut"),
person("Mojgan", "Golzy", role = "ctb")
)
Maintainer: Giovanni Saraceno <[email protected]>
Description: The package includes test for multivariate normality, test for
uniformity on the Sphere, non-parametric two- and k-sample tests,
random generation of points from the Poisson kernel-based density and a
clustering algorithm for spherical data. For more information see
Saraceno, G., Markatou, M., Mukhopadhyay, R., Golzy, M. (2024)
<arXiv:2402.02290>, Ding, Y., Markatou, M., Saraceno, G. (2023)
<doi:10.5705/ss.202022.0347>, and Golzy, M., Markatou, M. (2020)
<doi:10.1080/10618600.2020.1740713>.
License: GPL (>= 3)
URL: https://cran.r-project.org/web/packages/QuadratiK/index.html, 
     https://github.com/giovsaraceno/QuadratiK-package
BugReports: https://github.com/giovsaraceno/QuadratiK-package/issues
Depends: 
R (>= 3.5.0)
Imports: 
cluster,
clusterRepro,
doParallel,
foreach,
ggplot2,
ggpp,
ggpubr,
MASS,
mclust,
methods,
moments,
movMF,
mvtnorm,
Rcpp,
RcppEigen,
rgl,
rlecuyer,
rrcov,
sn,
stats,
Tinflex
Suggests: 
knitr,
rmarkdown,
roxygen2,
testthat (>= 3.0.0)
LinkingTo: 
Rcpp,
RcppEigen
VignetteBuilder: 
knitr
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown=TRUE, roclets=c("namespace", "rd", "srr::srr_stats_roclet"))
RoxygenNote: 7.2.3

Scope

Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):

Data Lifecycle Packages

Statistical Packages

Bayesian and Monte Carlo Routines
Dimensionality Reduction, Clustering, and Unsupervised Learning
Machine Learning
Regression and Supervised Learning
Exploratory Data Analysis (EDA) and Summary Statistics
Spatial Analyses
Time Series Analyses
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

This category is the most suitable due to QuadratiK's clustering technique, specifically designed for spherical data. The package's clustering algorithm falls within the realm of unsupervised learning, where the focus is on identifying groupings in the data without pre-labeled categories. The two- and k-sample tests serve as additional tools for testing the differences between the identified groups.
Following the link https://stats-devguide.ropensci.org/standards.html we noticed in the "Table of contents" that category 6.9 refers to Probability Distribution. We are unsure how we fit and if we fit this category. Can you please advise?

If submitting a statistical package, have you already incorporated documentation of standards into your code via the srr package?

Yes, we have incorporated documentation of standards into our QuadratiK package by utilizing the srr package, considering the categories "General" and "Dimensionality Reduction, Clustering, and Unsupervised Learning", in line with the recommendations provided in the rOpenSci Statistical Software Peer Review Guide.

Who is the target audience and what are scientific applications of this package?

The QuadratiK package offers robust tools for goodness-of-fit testing, a fundamental aspect in statistical analysis, where accurately assessing the fit of probability distributions is essential. This is especially critical in research domains where model accuracy has direct implications on conclusions and further research directions. Spherical data structures are common in fields such as biology, geosciences and astronomy, where data points are naturally mapped to a sphere. QuadratiK provides a tailored approach to effectively handle and interpret these data. Furthermore, this package is also of particular interest to professionals in health and biological sciences, where understanding and interpreting spherical data can be crucial in studies ranging from molecular biology to epidemiology. Moreover, its implementation in both R and Python broadens its accessibility, catering to a wide audience accustomed to these popular programming languages.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Yes, there are other R packages that address goodness-of-fit (GoF) testing and multivariate analysis. Notable among these is the energy package for energy statistics-based tests. The function kmmd in the kernlab package offers a kernel-based test with a similar mathematical formulation. The package sphunif provides all the tests for uniformity on the sphere available in the literature, including the test for uniformity based on the Poisson kernel. However, there are fundamental differences between the methods encoded in the aforementioned packages and those offered in the QuadratiK package.

QuadratiK uniquely focuses on kernel-based quadratic distance methods for GoF testing, offering a comprehensive set of tools for one-sample, two-sample, and k-sample tests. This specialization provides more nuanced and robust methodologies for statistical analysis, especially in complex multivariate contexts. QuadratiK is optimized for high-dimensional datasets, employing efficient C++ implementations. This makes it particularly suitable for contemporary large-scale data analysis challenges. The package introduces advanced methods for kernel centering and critical value computation, as well as optimal tuning parameter selection based on midpower analysis. QuadratiK includes a unique clustering algorithm for spherical data. These innovations are not covered in other available packages. With implementations in both R and Python, QuadratiK appeals to a wider audience across different programming communities. We also provide a user-friendly dashboard application which further enhances accessibility, catering to users with varying levels of statistical and programming expertise.

In summary, there are fundamental differences between QuadratiK and all existing R packages:

The goodness-of-fit tests are U-statistics based on centered kernels. The concept and methodology of centering is novel and unique to our methods and is not part of the methods of other existing packages.
An algorithm for connecting the tuning parameter with the statistical properties of the test, namely power and degrees of freedom of the kernel (DOF) is provided. This feature differentiates our novel methods from all encoded methods in the aforementioned R packages.
A new clustering algorithm for data that reside on the sphere is offered. This aspect is not a feature of existing packages.
We also offer algorithms for generating random samples from Poisson kernel-based densities. This capability is also unique to our package.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Yes, our package, QuadratiK, is compliant with the rOpenSci guidelines on Ethics, Data Privacy, and Human Subjects Research. We have carefully considered and adhered to ethical standards and data privacy laws relevant to our work.

Any other questions or issues we should be aware of?:

Please see the question posed in the first bullet.


ldecicco-USGS · 2024-03-15T20:13:53Z

@ropensci-review-bot check srr

maelle · 2024-03-18T07:59:53Z

@ropensci-review-bot check srr

ropensci-review-bot · 2024-03-18T08:15:09Z

'srr' standards compliance:

Complied with: 57 / 101 = 56.4% (general: 37 / 68; unsupervised: 20 / 33)
Not complied with: 44 / 101 = 43.6% (general: 31 / 68; unsupervised: 13 / 33)

✔️ This package complies with > 50% of all standards and may be submitted.

ldecicco-USGS · 2024-03-19T21:15:55Z

Thanks for the submission @giovsaraceno ! I'm getting some advice from the other editors about your question. One thing that would be really helpful - could you push up your documentation to a GitHub page?

From the usethis package, there's a function that helps setting it up:
https://usethis.r-lib.org/reference/use_github_pages.html

mpadge · 2024-03-20T09:07:14Z

Hi @giovsaraceno, Mark here from the rOpenSci stats team to answer your question. We've done our best to clarify the role of Probability Distributions Standards:

Unlike most other categories of standards, packages which fit in this category will also generally be expected to fit into at least one other category of statistical software. Reflecting that expectation, standards for probability distributions will be expected to only pertain to some (potentially small) portion of code in any package.

So packages should generally fit within some main category, with Probability Distributions being an additional category. In your case, Dimensionality Reduction seems like the appropriate main category, but it seems like your package would also fit within Probability Distributions. Given that, the next step would be for you to estimate what proportion of those standards you think might apply to your package? Our general rule-of-thumb is that at least 50% should apply, but for Probability Distributions as an additional category, that figure may be lower.

We are particularly keen to document compliance with this category, because it is where our standards have a large overlap with many core routines of the R language itself. As always, we encourage feedback on our standards, so please also feel very welcome to open issues in the Stats Software repository, or add comments or questions in the discussion pages. Thanks for your submission!

giovsaraceno · 2024-03-22T18:19:19Z

Thanks @ldecicco-USGS for your guidance during this process. Following your suggestion, I've now pushed the documentation for the QuadratiK package to a GitHub page. You can find it displayed on the main page of the GitHub repository. Here's the direct link for easy access: QuadratiK package GitHub page.

giovsaraceno · 2024-03-22T19:53:04Z

Hi Mark,

Thank you for the additional clarification regarding the standards for Probability Distributions and their integration with other statistical software categories. Following your guidance, we have conducted a thorough review of the standards applicable to the Probability Distributions category in relation to our package.

Based on our assessment, we found that the current version of our package satisfies 14% of the standards directly. Furthermore, we identified that an additional 36% of the standards could potentially apply to our package, but this would require us to make some enhancements, including the addition of checks and test codes. We feel the remaining 50% of the standards are not applicable to our package.

We are committed to improving our package and aim to fulfill the applicable standards. To this end, we plan to work on a separate branch dedicated to implementing these enhancements, with the goal of meeting 50% of the standards for the Probability Distributions category. Before proceeding, we would greatly appreciate your opinion on this plan.

Thank you for your time and support. Giovanni

giovsaraceno · 2024-03-26T19:45:30Z

Hi Mark,

We addressed the enhancements we discussed, and our package now meets 50% of the standards for the Probability Distributions category. These updates are in the probability-distributions-standards branch of our repository.
We would like your opinion on merging this branch with the submitted version of the package.

Thank you, Giovanni

mpadge · 2024-03-27T10:07:11Z

Hi Giovanni, your srrstats tags for probability distribution standards definitely look good enough to proceed. That said, one aspect which could be improved, and which I would request if I were reviewing the package, is the compliance statements in the tests. In both test-dpkb.R and test-rkpb.R you claim compliance in single statements at the start, yet I can't really see where or how a few of these are really complied with. In particular, there do not appear to be explicit tests for output values, as these are commonly tested using expect_equal with an explicit tolerance parameter, which you don't have. It is also not clear to me where and how you compare results of different distributions, because you have no annotations in the tests about what the return values of the functions are.

Those are very minor points which you may ignore for the moment if you'd like to get the review process started, or you could quickly address them straight away if you prefer. Either way, feel free to ask the bot to check srr when you think you're ready to proceed. Thanks!
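As an illustration of the kind of explicit value test described above, here is a minimal sketch in base R. It uses a stand-in density on the unit circle (the wrapped Cauchy form), which is an assumption chosen for illustration and is not the package's dpkb(); the function name circle_density is hypothetical. Each check compares a computed value to a theoretical one with an explicit tolerance.

```r
# Stand-in density on the unit circle (wrapped Cauchy form).
# This is a hypothetical example, NOT the package's dpkb().
circle_density <- function(theta, mu, rho) {
  (1 - rho^2) / (2 * pi * (1 + rho^2 - 2 * rho * cos(theta - mu)))
}

mu <- pi / 3
rho <- 0.7

# 1. The density should integrate to 1 over the whole circle.
total_mass <- integrate(circle_density, lower = 0, upper = 2 * pi,
                        mu = mu, rho = rho, rel.tol = 1e-10)$value
stopifnot(isTRUE(all.equal(total_mass, 1, tolerance = 1e-6)))

# 2. The value at the mode (theta = mu) has the closed form
#    (1 + rho) / (2 * pi * (1 - rho)).
mode_value <- circle_density(mu, mu = mu, rho = rho)
stopifnot(isTRUE(all.equal(mode_value, (1 + rho) / (2 * pi * (1 - rho)),
                           tolerance = 1e-12)))

# 3. With rho = 0 the density reduces to the uniform density 1 / (2 * pi).
uniform_value <- circle_density(1.234, mu = mu, rho = 0)
stopifnot(isTRUE(all.equal(uniform_value, 1 / (2 * pi), tolerance = 1e-12)))
```

Inside a testthat file, each stopifnot(isTRUE(all.equal(a, b, tolerance = tol))) line would typically be written as expect_equal(a, b, tolerance = tol).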

giovsaraceno · 2024-03-28T20:16:14Z

Hi, thank you for your suggestions on our compliance statements and testing practices.
Regarding the explicit testing for output values and the use of expect_equal with a tolerance parameter, we aimed to ensure that our functions return the expected outputs. However, we recognize that our current tests may not explicitly demonstrate compliance with this standard in the way you've described. We're uncertain how best to use expect_equal with a tolerance parameter to test the numeric equality of outputs from the provided random generation and density functions. Can you provide some tips?

As for comparing results from different distributions, the rpkb function in our package provides options to generate random observations using three distinct algorithms based on different probability distributions. We've conducted tests to confirm that each method functions as intended. We also added a new vignette in which the methods are compared by graphically displaying the generated points. Is this what you are looking for?

We're inclined to address these points promptly. We would appreciate answers to the questions posed above so that we can start the review process.
Thanks, Giovanni

noamross · 2024-04-10T16:58:48Z

Sorry we didn't reply faster, @giovsaraceno. For, say, a single-variable distribution, tests might include:

A correctness test that the density function with given parameters has the mean, mode, or variance theoretically expected.
A parameter recovery test that the mean of a sufficiently large number of randomly generated values is within a window of expectations.

In your case, my understanding is that you are generating multivariate outputs. Ultimately we aim to see tests that those outputs are as expected, for both density and random values. I think the thing to do is test that summary properties of those outputs (deterministic for density, within bounds for random) match those expected based on the input parameters.
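The two kinds of test listed above can be sketched for a single-variable case in base R. The gamma distribution is used here purely as a stand-in (an assumption for illustration, unrelated to the package's own distributions): the density check is deterministic, while the random-generation check only requires the sample mean to fall within a window around the theoretical mean.

```r
set.seed(42)
shape <- 3
rate <- 2
expected_mean <- shape / rate  # theoretical mean of Gamma(shape, rate)

# Correctness test (deterministic): the mean implied by the density,
# computed by numerical integration, matches the theoretical mean.
density_mean <- integrate(function(x) x * dgamma(x, shape = shape, rate = rate),
                          lower = 0, upper = Inf, rel.tol = 1e-8)$value
stopifnot(isTRUE(all.equal(density_mean, expected_mean, tolerance = 1e-6)))

# Parameter recovery test (within bounds): the mean of many random draws
# lies within five standard errors of the theoretical mean.
n <- 1e5
draws <- rgamma(n, shape = shape, rate = rate)
std_error <- sqrt(shape / rate^2 / n)  # sd of the sample mean
stopifnot(abs(mean(draws) - expected_mean) < 5 * std_error)
```

For multivariate or spherical outputs the same pattern would apply to a summary property of the output, for example the norm of each generated point or the sample mean direction, with the window chosen from the expected sampling variability.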

giovsaraceno · 2024-05-01T19:19:23Z

Thanks @noamross for your explanation. We have taken your suggestions into consideration and have implemented them accordingly.
We are now ready to request the automatic bot check for our package. We look forward to any further instructions or feedback that might come from this next step.

ldecicco-USGS · 2024-05-01T20:04:04Z

@ropensci-review-bot check package

ropensci-review-bot · 2024-05-01T20:04:06Z

Thanks, about to send the query.

ropensci-review-bot · 2024-05-01T20:04:09Z

🚀

The following problems were found in your submission template:

HTML variable [editor] is missing
HTML variable [reviewers-list] is missing
HTML variable [due-dates-list] is missing
Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.

👋

ropensci-review-bot · 2024-05-01T20:25:39Z

Checks for QuadratiK (v1.0.0)

git hash: 21541a40

✔️ Package is already on CRAN.
✔️ has a 'codemeta.json' file.
✔️ has a 'contributing' file.
✔️ uses 'roxygen2'.
✔️ 'DESCRIPTION' has a URL field.
✔️ 'DESCRIPTION' has a BugReports field.
✔️ Package has at least one HTML vignette
✔️ All functions have examples.
✔️ Package has continuous integration checks.
✔️ Package coverage is 78.2%.
✖️ Package contains unexpected files.
✔️ R CMD check found no errors.
✖️ R CMD check found 1 warning.
👀 Function names are duplicated in other packages

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with 👀 may be optionally addressed.)

Package License: GPL (>= 3)

1. rOpenSci Statistical Standards (`srr` package)

This package is in the following category:

Dimensionality Reduction, Clustering and Unsupervised Learning

✔️ All applicable standards [v0.2.0] have been documented in this package (204 complied with; 49 N/A standards)

Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.

2. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type	package	ncalls
internal	base	382
internal	QuadratiK	50
internal	utils	10
internal	grDevices	1
imports	stats	29
imports	methods	26
imports	sn	14
imports	ggpp	2
imports	cluster	1
imports	mclust	1
imports	moments	1
imports	rrcov	1
imports	clusterRepro	NA
imports	doParallel	NA
imports	foreach	NA
imports	ggplot2	NA
imports	ggpubr	NA
imports	MASS	NA
imports	movMF	NA
imports	mvtnorm	NA
imports	Rcpp	NA
imports	RcppEigen	NA
imports	rgl	NA
imports	rlecuyer	NA
imports	Tinflex	NA
suggests	knitr	NA
suggests	rmarkdown	NA
suggests	roxygen2	NA
suggests	testthat	NA
linking_to	Rcpp	NA
linking_to	RcppEigen	NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (46), data.frame (26), matrix (24), nrow (23), t (20), log (19), rep (19), ncol (18), c (14), numeric (12), for (11), sqrt (10), length (8), mean (8), as.numeric (6), return (6), sample (6), T (6), vapply (6), apply (5), as.factor (5), table (5), unique (5), as.vector (4), cumsum (4), exp (4), rbind (4), sum (4), as.matrix (3), kappa (3), lapply (3), lgamma (3), pi (3), q (3), replace (3), unlist (3), as.integer (2), diag (2), max (2), readline (2), rownames (2), rowSums (2), which (2), which.max (2), with (2), beta (1), colMeans (1), expand.grid (1), F (1), factor (1), if (1), levels (1), norm (1), rep.int (1), round (1), seq_len (1), subset (1)

QuadratiK

DOF (3), kbNormTest (3), normal_CV (3), C_d_lambda (2), compute_CV (2), cv_ksample (2), d2lpdf (2), dlpdf (2), lpdf (2), norm_vec (2), objective_norm (2), poisson_CV (2), rejvmf (2), sample_hypersphere (2), statPoissonUnif (2), compare_qq (1), compute_stats (1), computeKernelMatrix (1), computePoissonMatrix (1), dpkb (1), elbowMethod (1), generate_SN (1), NonparamCentering (1), objective_2 (1), objective_k (1), ParamCentering (1), pkbc_validation (1), rejacg (1), rejpsaw (1), select_h (1), stat_ksample_cpp (1), stat2sample (1)

stats

df (12), quantile (4), dist (2), rnorm (2), runif (2), aggregate (1), cov (1), D (1), qchisq (1), sd (1), sigma (1), uniroot (1)

methods

setMethod (12), setGeneric (8), new (3), setClass (3)

sn

rmsn (14)

utils

data (8), prompt (2)

ggpp

annotate (2)

cluster

silhouette (1)

grDevices

colorRampPalette (1)

mclust

adjustedRandIndex (1)

moments

skewness (1)

rrcov

PcaLocantore (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.

3. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

code in C++ (17% in 2 files) and R (83% in 12 files)
4 authors
5 vignettes
1 internal data file
21 imported packages
24 exported functions (median 14 lines of code)
56 non-exported functions in R (median 16 lines of code)
16 R functions (median 13 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

loc = "Lines of Code"
fn = "function"
exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure	value	percentile	noteworthy
files_R	12	65.5
files_src	2	79.1
files_vignettes	5	96.9
files_tests	10	90.7
loc_R	1408	76.6
loc_src	281	34.1
loc_vignettes	235	55.3
loc_tests	394	70.0
num_vignettes	5	97.9	TRUE
data_size_total	11842	71.9
data_size_median	11842	80.1
n_fns_r	80	70.4
n_fns_r_exported	24	72.5
n_fns_r_not_exported	56	70.6
n_fns_src	16	40.4
n_fns_per_file_r	5	67.1
n_fns_per_file_src	8	69.1
num_params_per_fn	5	69.6
loc_per_fn_r	15	46.1
loc_per_fn_r_exp	14	35.1
loc_per_fn_r_not_exp	16	54.8
loc_per_fn_src	13	41.6
rel_whitespace_R	24	82.7
rel_whitespace_src	18	36.2
rel_whitespace_vignettes	16	29.2
rel_whitespace_tests	34	78.1
doclines_per_fn_exp	50	62.8
doclines_per_fn_not_exp	0	0.0	TRUE
fn_call_network_size	50	66.3

3a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

4. `goodpractice` and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

(There do not appear to be any)

GitHub Workflow Results

id	name	conclusion	sha	run_number	date
8851531581	pages build and deployment	success	21541a	25	2024-04-26
8851531648	pkgcheck	failure	21541a	60	2024-04-26
8851531643	pkgdown	success	21541a	25	2024-04-26
8851531649	R-CMD-check	success	21541a	83	2024-04-26
8851531642	test-coverage	success	21541a	83	2024-04-26

3b. `goodpractice` results

`R CMD check` with rcmdcheck

R CMD check generated the following warning:

checking whether package ‘QuadratiK’ can be installed ... WARNING
Found the following significant warnings:
Warning: 'rgl.init' failed, running with 'rgl.useNULL = TRUE'.
See ‘/tmp/RtmpQrtXuf/file133861d90686/QuadratiK.Rcheck/00install.out’ for details.

R CMD check generated the following note:

checking installed package size ... NOTE
installed size is 16.6Mb
sub-directories of 1Mb or more:
libs 15.0Mb

R CMD check generated the following check_fails:

no_import_package_as_a_whole
rcmdcheck_examples_run_without_warnings
rcmdcheck_significant_compilation_warnings
rcmdcheck_reasonable_installed_size

Test coverage with covr

Package coverage: 78.21

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function	cyclocomplexity
select_h	46

Static code analyses with lintr

lintr found the following 20 potential issues:

message	number of times
Avoid library() and require() calls in packages	9
Lines should not be more than 80 characters.	9
Use <-, not =, for assignment.	2

5. Other Checks

Details of other checks (click to open)

✖️ Package contains the following unexpected files:

src/RcppExports.o
src/kernel_function.o

✖️ The following function name is duplicated in other packages:

- extract_stats from ggstatsplot

Package Versions

package	version
pkgstats	0.1.3.13
pkgcheck	0.1.2.21
srr	0.1.2.9

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

giovsaraceno · 2024-05-13T22:23:20Z

We have solved all the marked items and we are now ready to request the automatic bot check.
Thanks

jooolia · 2024-05-29T18:32:32Z

@ropensci-review-bot check package

ropensci-review-bot · 2024-05-29T18:35:37Z

Thanks, about to send the query.

ropensci-review-bot · 2024-05-29T18:35:40Z

🚀

The following problems were found in your submission template:

HTML variable [editor] is missing
HTML variable [reviewers-list] is missing
HTML variable [due-dates-list] is missing
Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.

👋

giovsaraceno · 2024-05-29T20:08:30Z

Hi @jooolia,
thanks for checking the package. Can you give us indications on how we should address the listed problems?
At the moment, we do not know which information to insert in the mentioned fields (editor, reviewers and due-dates list).
Thanks in advance

mpadge · 2024-05-31T09:18:34Z

@jooolia The automated checks failed because of the issue linked above. @giovsaraceno When you've fixed this issue and confirmed that the pkgcheck workflows once again succeed in your repo, please call @ropensci-review-bot check package here to run the checks again. Thanks

giovsaraceno · 2024-06-04T13:33:55Z

@ropensci-review-bot check package

ropensci-review-bot · 2024-06-04T13:33:57Z

Thanks, about to send the query.

ropensci-review-bot · 2024-06-04T13:34:00Z

🚀

The following problems were found in your submission template:

HTML variable [editor] is missing
HTML variable [reviewers-list] is missing
HTML variable [due-dates-list] is missing
Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.

👋

ropensci-review-bot · 2025-01-15T02:24:02Z

@emitanaka added to the reviewers list. Review due date is 2025-02-05. Thanks @emitanaka for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot · 2025-01-15T02:24:07Z

@emitanaka: If you haven't done so, please fill this form for us to update our reviewers records.

emitanaka · 2025-01-29T06:26:36Z

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need: clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s): demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions
Examples: (that run successfully locally) for all exported functions
Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software have been confirmed.
Performance: Any performance claims of the software have been confirmed.
Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 10 hours

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Thank you for your patience @giovsaraceno. This package is very well documented and is a great contribution to goodness-of-fit testing and clustering for spherical data.

I've made a pull request to ensure that the styling and spelling errors are mostly corrected.

I note that test-kb.test.R and test-select_h.R hang when I use devtools::test() or devtools::test_active_file(), but run fine when I run the tests individually in my local environment. I'm not sure why, and I presume this is okay since they run locally.

I would also like to recommend the package for the gold badge; however, there are a few minor points (shown below) that I would ask you to address.

In test-kb.test.R -- test 7 (testing main functionality of k-sample test) and test 8 (testing selection of h) -- I see that the tests are looking at the return classes but there is no test for the actual values (test statistic, critical value calculation, h, etc.). I think you should also add tests for these values, as per G5.4 (Correctness tests)?
The function pkbc has assertions for the input nClust; however, no assertions that the input is a single numeric value appear to be made for maxIter and numInit.
I don't think the standard below has been addressed?

Standards on line#154 of file R/clustering_functions.R:
UL2.3 Unsupervised Learning Software should implement pre-processing routines to identify whether aspects of input data are perfectly collinear.

For the standard below, I don't think labels for the cluster groups can be supplied?

Standards on line#574 of file R/clustering_functions.R: - stats_clusters()

UL3.2 Unsupervised Learning Software for which input data does not generally include labels (such as array-like data with no row names) should provide an additional parameter to enable cases to be labelled.
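For reference, a pre-processing check of the kind UL2.3 describes could be sketched as below. This is only an illustration in base R, not code from QuadratiK: the helper name `has_perfect_collinearity()` is hypothetical, and it assumes that "perfectly collinear" can be detected by the centered data matrix having a numerical rank lower than its number of columns.

```r
# Hypothetical helper (not part of QuadratiK): flags perfectly collinear
# columns by comparing the numerical rank of the centered data matrix
# with its number of columns.
has_perfect_collinearity <- function(x, tol = 1e-7) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), nrow(x) > 1)
  centered <- scale(x, center = TRUE, scale = FALSE)
  qr(centered, tol = tol)$rank < ncol(centered)
}

set.seed(1)
a <- rnorm(50)
b <- rnorm(50)
independent <- cbind(a, b)
collinear <- cbind(a, b, 2 * a - b)  # third column is a linear combination

has_perfect_collinearity(independent)
has_perfect_collinearity(collinear)
```

Because the columns are centered first, a constant column is also reported as collinear, which may or may not be the desired behaviour for a given method.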

giovsaraceno · 2025-01-31T16:57:31Z

Dear @emitanaka,

Thank you for your thorough review, with the suggestions and changes provided for improving the package, and for recommending the package for the gold badge.
We are grateful for your contributions as a reviewer and agree to acknowledge you as a package reviewer in the DESCRIPTION file.
Below, we address the final minor points raised in the review for clarity and completeness.
On behalf of all the authors, I would like to thank you and the rOpenSci team again for your time and effort in reviewing this package. Please let me know what the next step of the review process is.

I have successfully merged your pull request addressing the styling and spelling errors.
I have added assertions in the pkbc function to ensure that maxIter and numInit are single numeric values.
The standard UL2.3 has been removed from the mentioned line.
Standard UL3.2 (about the option of providing the true labels as a separate input, if available) has been removed from the function stats_clusters(), since the computed statistics do not depend on true labels. The standard tag has been added to the plot function, which offers this option.
In test-kb.test.R, for the current tests, we verify that the output object has the expected class, and we verify the correctness of the tests by checking that they reject, or fail to reject, the null hypothesis as expected. Hence, the correctness of the tests is verified indirectly. We have specified this in the comment of the G5.4 standard.
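As an illustration of the kind of assertion described above for maxIter and numInit, a check for a single numeric value might look like the following sketch. The helper `assert_single_number()` is a hypothetical name used only for this example, the input values are arbitrary, and the actual checks inside pkbc may be written differently.

```r
# Hypothetical helper (not the pkbc implementation): stops with an
# informative message unless `value` is a single, finite number.
assert_single_number <- function(value, name) {
  ok <- is.numeric(value) && length(value) == 1 && is.finite(value)
  if (!ok) {
    stop(sprintf("'%s' must be a single numeric value.", name),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Valid inputs pass silently (arbitrary example values)
assert_single_number(300, "maxIter")
assert_single_number(10, "numInit")

# An invalid input raises an error, captured here with tryCatch()
result <- tryCatch(
  assert_single_number(c(5, 10), "numInit"),
  error = function(e) conditionMessage(e)
)
result
```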

maurolepore · 2025-02-02T18:04:59Z

Dear @giovsaraceno this is to mark the start of my EiC rotation. I'm reviewing all open issues and noting what I see.

I see this submission is a bit ahead of its labels. The labels suggest the reviews are pending but they are both done. Also I see the author responded to both reviews here and here.

Next steps:

I'll record both reviews
I'll record that the author responded and that we're waiting for the reviewers to respond.
@kasselhingee please see this response and indicate if you request further changes, or approve using this template.
@emitanaka please see this response and indicate if you request further changes, or approve using this template.

maurolepore · 2025-02-02T18:05:03Z

@ropensci-review-bot submit review #632 (comment) time 10

ropensci-review-bot · 2025-02-02T18:05:14Z

Logged review for emitanaka (hours: 10)

maurolepore · 2025-02-02T18:07:52Z

@ropensci-review-bot submit review #632 (comment) time 6

ropensci-review-bot · 2025-02-02T18:07:54Z

Logged review for kasselhingee (hours: 6)

kasselhingee · 2025-02-03T04:45:24Z

Reviewer Response

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 16 hours

emitanaka · 2025-02-03T06:58:51Z

Reviewer Response

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 10.5 hours

emitanaka · 2025-02-03T07:00:32Z

@ropensci-review-bot approve QuadratiK

ropensci-review-bot · 2025-02-03T07:00:35Z

Approved! Thanks @giovsaraceno for submitting and @kasselhingee, @emitanaka for your reviews! 😁

To-dos:

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions.

We maintain an online book with our best practices and tips; this chapter starts the third section, which covers guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also features CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.
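For the first to-do above, reviewer entries for the Authors@R field could be sketched as below, using `utils::person()` from base R. The names are placeholders rather than the reviewers' actual details, which should only be added with their consent.

```r
# Placeholder names; replace with the reviewers' details (with their consent).
reviewers <- c(
  utils::person("First", "Reviewer", role = "rev"),
  utils::person("Second", "Reviewer", role = "rev")
)

# Each entry carries the "rev" (reviewer) role
unlist(reviewers$role)
format(reviewers, include = c("given", "family", "role"))
```

These entries would be appended to the existing `c(...)` in Authors@R, after the current authors and contributors.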

giovsaraceno · 2025-02-03T12:45:23Z

@ropensci-review-bot finalize transfer of giovsaraceno/QuadratiK-package

ropensci-review-bot · 2025-02-03T12:45:27Z

Can't find repository ropensci/giovsaraceno/QuadratiK-package, have you forgotten to transfer it first?

giovsaraceno · 2025-02-03T12:46:49Z

@ropensci-review-bot finalize transfer of QuadratiK-package

ropensci-review-bot · 2025-02-03T12:46:54Z

Transfer completed.
The QuadratiK-package team is now owner of the repository and the author has been invited to the team

giovsaraceno · 2025-02-03T14:29:07Z

Dear @maurolepore,

thanks for accepting the package in ropensci. I have now followed the to-dos listed in the last comment by the ropensci-review-bot. In particular,

I transferred the repository to rOpenSci, and this has been confirmed.
I deleted the code of conduct.
I have changed all the links in the README and DESCRIPTION files to point to the ropensci repository.
I fixed the links for the badges.
I ran the codemeta code.
I added the installation method via ropensci.

I tried to follow the steps for changing the automatic deployment of pkgdown. At the moment, the webpage at the new address has not been created. I would like to ask whether I just have to wait or whether something is missing.

maurolepore · 2025-02-03T19:59:53Z

@giovsaraceno, great! The credit goes to @emitanaka who might be able to help debug the website?

But while I'm here ... I see the URL has an odd -package suffix. Maybe something to explore?

Compare: https://github.com/orgs/ropensci/repositories

giovsaraceno · 2025-02-04T12:41:50Z

Thanks @maurolepore, I have changed the repository name from QuadratiK-package to QuadratiK.

ldecicco-USGS added the stats label Mar 15, 2024

giovsaraceno closed this as completed Mar 22, 2024

giovsaraceno reopened this Mar 22, 2024

jooolia self-assigned this May 29, 2024

mpadge mentioned this issue May 31, 2024

Unable to load package: Invalid ELF header ropensci/QuadratiK#5

Closed

ropensci-review-bot added 3/reviewer(s)-assigned and removed 2/seeking-reviewer(s) labels Jan 15, 2025

ropensci-review-bot added 4/review(s)-in-awaiting-changes and removed 3/reviewer(s)-assigned labels Feb 2, 2025

maurolepore added 5/awaiting-reviewer(s)-response and removed 4/review(s)-in-awaiting-changes labels Feb 2, 2025

ropensci-review-bot added 6/approved and removed 5/awaiting-reviewer(s)-response labels Feb 3, 2025

ropensci-review-bot closed this as completed Feb 3, 2025

github-actions bot mentioned this issue Feb 5, 2025

pkgcheck results - main ropensci/QuadratiK#9

Open
