Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List of minimum dependencies missing (plus questions about installation instructions) #127

Closed
ojeda-e opened this issue Nov 30, 2021 · 11 comments · Fixed by #132
Closed
Labels
joss-review OpenCADD-KLIFS JOSS review

Comments

@ojeda-e
Copy link
Contributor

ojeda-e commented Nov 30, 2021

I have a couple of questions/comments regarding the information provided in the installation instructions.

There are multiple options to install opencadd, which is great. Now, reading the docs I find:

  1. opencadd can be installed with all its dependencies using mamba/conda ( link ).
  2. opencadd-klifs can be installed with all its dependencies using mamba/conda ( link, steps 1-2 )
  3. opencadd-klifs can be installed with no dependencies using mamba/conda ( link, step 3 )
    There is also an option to install "from the latest development snapshot", which I will call opencadd-dev. Then there is
  4. opencadd-dev can be installed from source ( link )
    mamba env create -f https://raw.githubusercontent.com/volkamerlab/opencadd/master/devtools/conda-envs/test_env.yaml -n opencadd

Considering points 1-4, I have one major comment and some other other comments and questions.

Major comment: The documentation offers several options to install the package. It offers to install with or w/o dependencies. However, the list of minimum dependencies is missing. Please consider that, as part of the JOSS review, a list of minimum dependencies should be clearly stated in the documentation.

Other comments and questions:
A. The installation option in (4) creates an environment called opencadd using the test_env.yaml file. I think there are two points to consider here. The first one is, test_env.yaml is a file that contains the packages needed to run tests. I am assuming so because the file has the name "test" on it. But more importantly, and which is my second point, the environment has much more than only minimum dependencies. It has packages like sphinx which are needed only for docs, for example.
Shouldn't the installation instructions point to a .yaml file that has minimum requirements instead of an environment for tests?
What I suggest, if in the interests of the authors, is to have two separate files in opencadd/devtools/conda-envs/. One that is for tests, and another for installation. then one can find opencadd/devtools/conda-envs/test_env.yaml and opencadd/devtools/conda-envs/requirements.yml.

B. There is a requirements.txt, in the same folder opencadd/devtools/conda-envs/, which is the one I think should be pointed to when building an environment from source using conda/mamba. Shouldn't the extension be .yml, instead of .txt.?
Then, one can run mamba env create -f https://raw.githubusercontent.com/volkamerlab/opencadd/master/devtools/conda-envs/requirements.yaml

C. A comment in case the authors prefer to work with requirements.txt to create an envionment. I may be wrong, but I think mamba env create requirements.txt -n <name> works with .txt files instead of .yml only if the file is found locally. Then, there are two steps in the installation:
wget https://raw.githubusercontent.com/volkamerlab/opencadd/master/devtools/conda-envs/requirements.txt,
plus
mamba env create requirements.txt -n <name>.
If that's the approach the authors prefer (although not suggested), please add the installation instructions accordingly. Also, if I am wrong about mamba env create from .txt files, please let me know.

D. I see that the opencadd tests run with MDAnalysis as a dependency of opencadd. What version of MDAnalysis does the package need? Considering the different MDAnalysis releases, I would suggest specifying, in docs and in requirements.yml, the version of MDAnalysis.

E. The installation instructions in (4) ask to create an environment from the .yaml file (step 1), activate such environment (step 2), and install via pip opencadd (step 3). I am a bit confused here. I don't fully understand the logic behind installing packages with pip after creating an environment from the .yaml file. Would please elaborate on the logic behind step 3? Does it mean there are even more packages to install via pip? Shouldn't pip packages be added to the .yaml file and address the whole installation in one single step? Shouldn't be pip install <package> be a completely separate approach?

I know there are many questions and points to clarify. Please feel free to address item by item or all at once. Thanks.

@dominiquesydow dominiquesydow added the joss-review OpenCADD-KLIFS JOSS review label Dec 7, 2021
@dominiquesydow
Copy link
Contributor

Hi @ojeda-e,

Thanks a lot for the discussion on our installation instructions!

I have created a PR #132 where I am already suggesting a few changes, on which I will base my answers below.

There are multiple options to install opencadd, which is great. Now, reading the docs I find:

  1. opencadd can be installed with all its dependencies using mamba/conda ( link ).
  2. opencadd-klifs can be installed with all its dependencies using mamba/conda ( link, steps 1-2 )
  3. opencadd-klifs can be installed with no dependencies using mamba/conda ( link, step 3 )
    There is also an option to install "from the latest development snapshot", which I will call opencadd-dev. Then there is
  4. opencadd-dev can be installed from source ( link )
    mamba env create -f https://raw.githubusercontent.com/volkamerlab/opencadd/master/devtools/conda-envs/test_env.yaml -n opencadd

We offer 3 options, however, due to a typo, it looks like there are 4 options. The 3 options are:

Option 1 (as you stated in 1.)
Env opencadd: opencadd package with all its dependencies via conda/mamba

Option 2
Env opencadd-klifs: opencadd package only with the dependencies needed for OpenCADD-KLIFS via conda/mamba (minimal list of deps are listed in step 1, i.e. bravado pandas tqdm rdkit biopandas)

See updated version here: https://github.com/volkamerlab/opencadd/blob/joss-review-issue-127/docs/installing_opencadd_klifs.rst#installing-opencadd-klifs-only

Option 3 (as you stated in 4.)
Env opencadd-dev (I like -dev, will rename this in our docs): all dependencies and pip install of the latest repo snapshot.

See updated version here: https://github.com/volkamerlab/opencadd/blob/joss-review-issue-127/docs/installing.rst#install-from-the-latest-development-snapshot

Considering points 1-4, I have one major comment and some other other comments and questions.

Major comment: The documentation offers several options to install the package. It offers to install with or w/o dependencies. However, the list of minimum dependencies is missing. Please consider that, as part of the JOSS review, a list of minimum dependencies should be clearly stated in the documentation.

Please check changes to the documentation for Option 2.
Here again the link: https://github.com/volkamerlab/opencadd/blob/joss-review-issue-127/docs/installing_opencadd_klifs.rst#installing-opencadd-klifs-only

@dominiquesydow
Copy link
Contributor

Other comments and questions:
A. The installation option in (4) creates an environment called opencadd using the test_env.yaml file. I think there are two points to consider here. The first one is, test_env.yaml is a file that contains the packages needed to run tests. I am assuming so because the file has the name "test" on it. But more importantly, and which is my second point, the environment has much more than only minimum dependencies. It has packages like sphinx which are needed only for docs, for example. Shouldn't the installation instructions point to a .yaml file that has minimum requirements instead of an environment for tests? What I suggest, if in the interests of the authors, is to have two separate files in opencadd/devtools/conda-envs/. One that is for tests, and another for installation. then one can find opencadd/devtools/conda-envs/test_env.yaml and opencadd/devtools/conda-envs/requirements.yml.

I have created now

  • devtools/conda-envs/test_env.yml including packages for testing and documentation
  • devtools/conda-envs/user_env.yml excluding packages for testing and documentation

The documentation is updated accordingly as follows:
https://github.com/volkamerlab/opencadd/blob/joss-review-issue-127/docs/installing.rst#install-from-the-latest-development-snapshot

B. There is a requirements.txt, in the same folder opencadd/devtools/conda-envs/, which is the one I think should be pointed to when building an environment from source using conda/mamba. Shouldn't the extension be .yml, instead of .txt.? Then, one can run mamba env create -f https://raw.githubusercontent.com/volkamerlab/opencadd/master/devtools/conda-envs/requirements.yaml

I deleted requirements.txt, this file was probably a left-over from the repo cookie-cutter and was not used by us. Instead I am suggesting the aforementioned user_env.yml file.

C. A comment in case the authors prefer to work with requirements.txt to create an envionment. I may be wrong, but I think mamba env create requirements.txt -n <name> works with .txt files instead of .yml only if the file is found locally. Then, there are two steps in the installation: wget https://raw.githubusercontent.com/volkamerlab/opencadd/master/devtools/conda-envs/requirements.txt, plus mamba env create requirements.txt -n <name>. If that's the approach the authors prefer (although not suggested), please add the installation instructions accordingly. Also, if I am wrong about mamba env create from .txt files, please let me know.

See B.

@dominiquesydow
Copy link
Contributor

D. I see that the opencadd tests run with MDAnalysis as a dependency of opencadd. What version of MDAnalysis does the package need? Considering the different MDAnalysis releases, I would suggest specifying, in docs and in requirements.yml, the version of MDAnalysis.

We always fetch the latest conda-forge version. Is there a disadvantage to doing it like this?

@dominiquesydow
Copy link
Contributor

E. The installation instructions in (4) ask to create an environment from the .yaml file (step 1), activate such environment (step 2), and install via pip opencadd (step 3). I am a bit confused here. I don't fully understand the logic behind installing packages with pip after creating an environment from the .yaml file. Would please elaborate on the logic behind step 3? Does it mean there are even more packages to install via pip? Shouldn't pip packages be added to the .yaml file and address the whole installation in one single step? Shouldn't be pip install <package> be a completely separate approach?

Happy to clarify this. I am going through this step-by-step using the updated version from PR #132:
https://github.com/volkamerlab/opencadd/blob/joss-review-issue-127/docs/installing.rst#install-from-the-latest-development-snapshot

  1. Create an environment opencadd-dev, which contains all dependencies needed for the opencadd library (use for that either test_env.yml if you are a contributor - or use user_env.yml if you are a user)
  2. Now activate the environment (note - it does contain all dependencies but not the latest snapshot of the opencadd library, yet).
  3. Here we install the latest snapshot of our opencadd master branch (via pip install)

Please let me know if this makes sense.

@ojeda-e
Copy link
Contributor Author

ojeda-e commented Dec 8, 2021

D. I see that the opencadd tests run with MDAnalysis as a dependency of opencadd. What version of MDAnalysis does the package need? Considering the different MDAnalysis releases, I would suggest specifying, in docs and in requirements.yml, the version of MDAnalysis.

We always fetch the latest conda-forge version. Is there a disadvantage to doing it like this?

No, nothing wrong with it. My concern is more about giving a heads up to the user about the version of MDAnalysis that is required. It's useful to know in advance before installing and finding conflicts with other packages. IMO it's better to give that information on the installation page.

This issue is almost done, we just need a subsection that lists the minimum dependencies in #132

Thanks!

REF: openjournals/joss-reviews/issues/3951

@dominiquesydow
Copy link
Contributor

D. I see that the opencadd tests run with MDAnalysis as a dependency of opencadd. What version of MDAnalysis does the package need? Considering the different MDAnalysis releases, I would suggest specifying, in docs and in requirements.yml, the version of MDAnalysis.

We always fetch the latest conda-forge version. Is there a disadvantage to doing it like this?

No, nothing wrong with it. My concern is more about giving a heads up to the user about the version of MDAnalysis that is required. It's useful to know in advance before installing and finding conflicts with other packages. IMO it's better to give that information on the installation page.

This issue is almost done, we just need a subsection that lists the minimum dependencies in #132

Thanks!

REF: openjournals/joss-reviews/issues/3951

Hi @ojeda-e, my apologies, I cannot seem to figure this out.

Are these two different suggestions: adding the mdanalysis version (where?) --- and listing the minimal dependencies for OpenCADD-KLIFS, which does not need mdanalysis, or for the full OpenCADD package?

I would be super grateful if you could outline this point (or the two points) one more time for me in other words. Thank you!

@ojeda-e
Copy link
Contributor Author

ojeda-e commented Dec 8, 2021

My apologies for not being clear @dominiquesydow. Let me try a better explanation :)

As part of the JOSS review, there should be a list of minimal dependencies.
Suggestion: In the installation page, add a subsection called "Minimum dependencies" (or similar) that contains a list of the minimum packages required to make openCADD work.

Usually, the list of minimal dependencies includes the version for each package. To cite two random/uncorrelated but handy examples, check the minimum dependencies listed for pandas package, and for a hbonds, a package to calculate order parameters from MD.

Adding the list of minimum dependencies will fix this issue.

@dominiquesydow
Copy link
Contributor

Thank you, @ojeda-e, for clarifying this :) Thanks also for the examples!

For most of the packages, I could not say for sure what the absolute minimum is for each dependency - how would I find that out?

What I can do immediately is
(a) list for each dependency, in which opencadd submodule it is used (so that users could drop depends if not needed)
(b) I could list supported versions (as tested in our CI) instead of minimum versions.

I have removed numpy (issue) and jsonschema (issue) from the environments, which we added because of previous version issues - they seem to be resolved now.

@ojeda-e
Copy link
Contributor Author

ojeda-e commented Dec 9, 2021

I would start exactly as you say: with A and then B.
And for B, as you pointed out, check the versions used in the CI, more specifically the tests that pass with the minimum version of python (since you are using python 3.6 and 3.7). This link is from your last CI run and contains the installed versions of each package.

Strictly speaking, to specify min versions, the CI would need to run an additional job using the min version of python and using an environment created with the min dependencies specified in a file min_deps.yaml or similar. A pseudo-code of it would look something like this:

     python-version: ${{ min-python-version }}
     environment-file: devtools/conda-envs/min_deps.yaml

Keep it in mind as a to-do list in the near future if you are interested, not that I am requiring it for the JOSS review. Adding the minimum dependencies with their supported versions to the documentation will do. :)

@dominiquesydow
Copy link
Contributor

Hi @ojeda-e,

Thanks a lot for all your input on this issue!

Please find the updated dependencies section here — now showing the minimal supported package versions for Python 3.7 (= our CI setup with the lowest Python version) — and let me know if you approve.

The corresponding PR: #132

@ojeda-e
Copy link
Contributor Author

ojeda-e commented Dec 16, 2021

Thanks @dominiquesydow
I think my comments are all addressed in PR #132

dominiquesydow added a commit that referenced this issue Dec 16, 2021
@dominiquesydow dominiquesydow linked a pull request Dec 16, 2021 that will close this issue
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
joss-review OpenCADD-KLIFS JOSS review
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants