WIP: chempy refactoring #433

Open. Wants to merge 4 commits into base: master.

ye11owSub
Contributor

@ye11owSub ye11owSub commented Feb 15, 2025

ye11owSub force-pushed the chempy_refactoring branch 2 times, most recently from 491cd50 to 9233afc (February 15, 2025 21:27)
ye11owSub force-pushed the chempy_refactoring branch 4 times, most recently from d8efebe to 5614259 (February 16, 2025 12:53)
ye11owSub changed the title from "WIP: chempy.cpv refactoring" to "WIP: Switching from Python lists to NumPy arrays for linear algebra operations" (Feb 16, 2025)
ye11owSub changed the title from "WIP: Switching from Python lists to NumPy arrays for linear algebra operations" to "Switching the computation in cpv.py from python lists to numpy arrays for linear algebra operations" (Feb 16, 2025)
ye11owSub requested a review from speleo3 (February 16, 2025 22:05)
ye11owSub changed the title from "Switching the computation in cpv.py from python lists to numpy arrays for linear algebra operations" to "Switching the computation in cpv.py from python lists to numpy arrays" (Feb 16, 2025)
@JarrettSJohnson
Member

Won't comment on details at the moment, but currently I do have a couple of high-level comments:

  1. I'm usually all in favor of refactoring code, but doing so should come with some sort of purpose so that there's an overall net positive. Usually there is a cost to doing so (it is not always free: a708606, 98c85b8, "Shortcut now missing has_key method" #425).
  2. Similarly, I don't think the cpv module posed any sort of developer obstacle that would warrant a change. In fact, instead of investing time into making cpv nicer to use, I think it would also be worth considering whether cpv is even needed in the first place (no code to maintain is better than maintaining the nicest code). Most of this module was written over two decades ago, and there now exist libraries (especially numpy) that have much higher usage and come from a larger number of experts who have thought about linear algebra more than we have. IMO, one step forward would be to consider whether we can replace cpv altogether (perhaps keeping the functions not present in numpy/numpy.linalg). This would of course mean that scripts that use cpv would need to be changed to use numpy (and IMO they should rely on numpy, not PyMOL, to do linear algebra).
  3. I'd rather keep the verbose setting on for CI so that I can see each step of the C++ compilation process.

@ye11owSub
Contributor Author

Hey @JarrettSJohnson!

> I'm usually all in favor of refactoring code, but doing so should come with some sort of purpose so that there's an overall net positive. Usually there is a cost to doing so (it is not always free: a708606, 98c85b8, #425).

This is the cost of poor code quality and lack of test coverage. Refactoring is a method to find and fix these issues.

> Similarly, I don't think the cpv module posed any sort of developer obstacle that would warrant a change. In fact, instead of investing time into making cpv nicer to use, I think it would also be worth considering whether cpv is even needed in the first place (no code to maintain is better than maintaining the nicest code). Most of this module was written over two decades ago, and there now exist libraries (especially numpy) that have much higher usage and come from a larger number of experts who have thought about linear algebra more than we have. IMO, one step forward would be to consider whether we can replace cpv altogether (perhaps keeping the functions not present in numpy/numpy.linalg). This would of course mean that scripts that use cpv would need to be changed to use numpy (and IMO they should rely on numpy, not PyMOL, to do linear algebra).

I'm not sure I understand what you mean when you say "there now exist libraries (especially numpy) that have much higher usage". The entire cpv.py module was completely rewritten using numpy in this pull request. This new implementation is compatible with the scripts in the pymol-scripts repo and with other pymol modules.
In any case, I completely agree with your idea of replacing cpv.py with numpy. However, I assumed that changing the established API to use numpy would be unacceptable. If you believe that replacing all calls of cpv.py in the pymol and pymol-scripts repositories with numpy is a good idea, then I would be happy to do it.

> I'd rather keep the verbose setting on for CI so that I can see each step of the C++ compilation process.

done

@JarrettSJohnson
Member

> This is the cost of poor code quality and lack of test coverage. Refactoring is a method to find and fix these issues.

Even if the original code quality was poor and without sufficient test coverage, these specific issues arose from the refactoring itself, because a couple of properties of the original code were missed (they were also easy to spot by using common functionality in PyMOL, and that's on me too for not testing the PR before merging it). I think we should emphasize code coverage a little bit more than class-level refactoring.

> If you believe that replacing all calls of cpv.py in the pymol and pymol-scripts repositories with numpy is a good idea, then I would be happy to do it.

Might make sense to first open up an issue there to get insights/opinions from other developers/maintainers, but I'm generally in favor of removing the basic linear algebra functions from cpv.

@ye11owSub
Contributor Author

ye11owSub commented Feb 17, 2025

> Even if the original code quality was poor and without sufficient test coverage, these specific issues arose from the refactoring itself, because a couple of properties of the original code were missed (they were also easy to spot by using common functionality in PyMOL, and that's on me too for not testing the PR before merging it).

I'm truly sorry that my previous PR caused issues that you had to fix. However, in my opinion, this is a common part of the software development process.
Due to my lack of experience using PyMOL, it is difficult for me to test any scenarios: I have never had to work with PyMOL as a user, so I rely heavily on tests and grep. These small PRs actually help me understand the project and contribute something useful in the process.

> I think we should emphasize code coverage a little bit more than class-level refactoring.

Adding commas to docstrings is worthless stuff, but I am trying to make the code more readable. Therefore, I don't see a problem with refactoring at the class level.
You are right that some of the scripts in the project are more than 20 years old and their readability is poor. Taking small steps to improve them is better than doing nothing at all.

@speleo3
Contributor

speleo3 commented Feb 17, 2025

I support Jarrett's assessment here.

> Might make sense to first open up an issue there to get insights/opinions from other developers/maintainers

Fully agree.

I like the added tests and type hints from the first commit, but the numpyfy refactoring is too much IMHO.

In my own scripts I always used either only numpy -- taking advantage of all its features and keeping data in numpy arrays -- or chempy.cpv for its simplicity and no numpy dependency. Making chempy.cpv a numpy wrapper feels like combining the disadvantages from both worlds.
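
For readers unfamiliar with the list-based style mentioned above, here is a minimal sketch of dependency-free 3-vector helpers on plain Python lists. The function names and the error handling are illustrative assumptions, not a copy of what chempy.cpv actually provides.

```python
import math


def dot_product(v1, v2):
    """Dot product of two 3-vectors given as plain sequences."""
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2]


def cross_product(v1, v2):
    """Cross product of two 3-vectors, returned as a new list."""
    return [
        v1[1] * v2[2] - v1[2] * v2[1],
        v1[2] * v2[0] - v1[0] * v2[2],
        v1[0] * v2[1] - v1[1] * v2[0],
    ]


def normalize(v):
    """Return v scaled to unit length (raising on zero length is an assumption)."""
    length = math.sqrt(dot_product(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return [c / length for c in v]


# Inputs and outputs stay ordinary lists; nothing is converted to arrays.
z_axis = cross_product([1, 0, 0], [0, 1, 0])  # [0, 0, 1]
```

With numpy, the same operations would typically be written with `numpy.dot`, `numpy.cross`, and `numpy.linalg.norm` on arrays. A wrapper that accepts lists, converts them to arrays, and converts the result back to lists pays for both the dependency and the conversions.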

@ye11owSub
Contributor Author

ye11owSub commented Feb 17, 2025

> Might make sense to first open up an issue there to get insights/opinions from other developers/maintainers

No one argued with that.

@speleo3 As you wish: the PR now contains only type annotations and tests.

ye11owSub changed the title from "Switching the computation in cpv.py from python lists to numpy arrays" to "tests for cpv.py" (Feb 17, 2025)
@TstewDev
Collaborator

Hello @ye11owSub,

I'm Thomas Stewart, a PyMOL developer and the current Product Manager for PyMOL at Schrödinger. I just wanted to add my thoughts to a few of your comments:

> I'm truly sorry that my previous PR caused issues that you had to fix. However, in my opinion, this is a common part of the software development process. Due to my lack of experience using PyMOL, it is difficult for me to test any scenarios: I have never had to work with PyMOL as a user, so I rely heavily on tests and grep. These small PRs actually help me understand the project and contribute something useful in the process.

There's no need to apologize, but I do hope it helps explain our general reluctance. I agree that fixing issues introduced by PRs is definitely part of the software development process, and I don't want to reject PRs simply on that basis. However, I would point out that finding and fixing these issues does take developer time and resources that could be spent on more productive tasks. This means that these PRs do come with a cost (reviewing, testing, maintaining, etc.), regardless of how simple they may appear. They need to be impactful enough to justify merging them into the codebase.

You also mention your lack of experience using PyMOL as a user. Not to say that only heavy PyMOL users can contribute to the project, but I'm curious what your motivations are if you're not trying to address an issue with how the app currently functions. I certainly understand the benefits of clean and well-documented code, but I don't view this as being a beneficial use of your time and effort if these files should be replaced completely.

If you (or anyone else reading this) are really interested in making a significant contribution to the project, I would encourage you to play around with PyMOL and try to identify some functionality/features that would benefit from your effort.

> Taking small steps to improve them is better than doing nothing at all.

I certainly understand what you're saying here, but I think it's an oversimplification for the reasons I stated above. In addition to the review/maintenance costs, small changes impact git blame, git history, and consistency across files. Refactoring to make the code more readable can be a noble goal when done with a clear objective; however, it can also come with a real cost when just making changes for the sake of making changes.

All that being said, I do believe there is real value being added in this PR, and the tests for PyMOL certainly should be improved. I just want to explain my thought process when evaluating PRs in general, in case you plan on submitting more in the future.

@ye11owSub
Contributor Author

Hey @TstewDev!
This PR has attracted more attention than it deserves.

> There's no need to apologize, but I do hope it helps explain our general reluctance. I agree that fixing issues introduced by PRs is definitely part of the software development process, and I don't want to reject PRs simply on that basis. However, I would point out that finding and fixing these issues does take developer time and resources that could be spent on more productive tasks. This means that these PRs do come with a cost (reviewing, testing, maintaining, etc.), regardless of how simple they may appear. They need to be impactful enough to justify merging them into the codebase.

I understand, each PR has a cost (so let's reduce this cost through testing).

> I'm curious what your motivations are if you're not trying to address an issue with how the app currently functions. I certainly understand the benefits of clean and well-documented code, but I don't view this as being a beneficial use of your time and effort if these files should be replaced completely.

The shortest and, at the same time, most complete answer is: because I can. It's sad to see that, in 6 years, the project has had 80 PRs closed from the open-source community. PyMOL is a popular tool for a specific group of people. I'm not one of them, but I have a CS degree and some free time.
I didn't find any specific plans for the future development of the project, so I decided to focus on something that was clearly in need of an update.
You say that these files will be completely replaced, but that only happens at the end of the process. A lot of things need to be done before then, and testing the old code is one of them. I think it will take a significant amount of time to replace cpv.py, and even then, it will be replaced with code from these tests.

> Refactoring to make the code more readable can be a noble goal when done with a clear objective; however, it can also come with a real cost when just making changes for the sake of making changes.

In general I agree, but I don't think that applies in this case. If you have a different opinion, that's fine. Let's fix/add/delete what you think is necessary or close this PR and move on. That's OK for me.

I am also currently refactoring the chempy {models.py, __init__.py, io.py}. I wanted to split this into separate PRs, but if you prefer to have more changes per PR, we can set this one on pause.

m1[1][0]*m2[0][2] + m1[1][1]*m2[1][2] + m1[1][2]*m2[2][2]],
[m1[2][0]*m2[0][0] + m1[2][1]*m2[1][0] + m1[2][2]*m2[2][0],
m1[2][0]*m2[0][1] + m1[2][1]*m2[1][1] + m1[2][2]*m2[2][1],
m1[2][0]*m2[0][2] + m1[2][1]*m2[1][2] + m1[2][2]*m2[2][2]]]
Contributor Author

Matrix multiplication was not implemented correctly. This and the code duplication have also been fixed in this PR.
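
The fragment under review is the tail of a hand-expanded 3x3 product. For comparison only, and without claiming this is the code in the PR, here is a sketch of the same product written without repeating each term. It follows the convention visible in the fragment, `result[i][j] = sum over k of m1[i][k] * m2[k][j]`. The function name `multiply` is a placeholder chosen for this example.

```python
def multiply(m1, m2):
    """Return the 3x3 matrix product of two nested-list matrices.

    Element [i][j] of the result is the dot product of row i of m1
    with column j of m2.
    """
    return [
        [sum(m1[i][k] * m2[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]

# Multiplying by the identity on either side leaves a matrix unchanged.
unchanged = multiply(identity, a)
product = multiply(a, b)  # [[30, 24, 18], [84, 69, 54], [138, 114, 90]]
```

A quick way to catch a transposed or mis-indexed implementation is to test with two matrices that do not commute, such as `a` and `b` above, since `multiply(a, b)` and `multiply(b, a)` must differ.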

ye11owSub changed the title from "tests for cpv.py" to "WIP: chempy refactoring" (Feb 20, 2025)
@TstewDev
Collaborator

Hello @ye11owSub!

> The shortest and, at the same time, most complete answer is: because I can. It's sad to see that, in 6 years, the project has had 80 PRs closed from the open-source community.

Say no more, welcome to the project! The effort you have already put into these PRs is really appreciated, and it sounds like you really are serious about making a contribution.

Please forgive my original tone of skepticism. I just know that open-source projects like this can fall victim to developers creating PRs when they have little intention of actually seeing these changes through. It definitely doesn't sound like that's the case here, and we welcome all the help we can get.

> I understand, each PR has a cost (so let's reduce this cost through testing).

I'm a big fan of adding tests like this and I think it's one of the obvious areas for improvement.

> In general I agree, but I don't think that applies in this case. If you have a different opinion, that's fine. Let's fix/add/delete what you think is necessary or close this PR and move on. That's OK for me.

I don't actually think I have any issue with this review now that it has this refined scope. I will take another closer look and add any additional comments if necessary.

> I am also currently refactoring the chempy {models.py, __init__.py, io.py}. I wanted to split this into separate PRs, but if you prefer to have more changes per PR, we can set this one on pause.

Happy to hear it! I'm normally in favor of splitting these into multiple smaller reviews, but it sounds like these might be quite intertwined? I'll leave it up to your judgement, but if you feel like there's relevant context that these other changes would provide, feel free to combine them.

@ye11owSub
Contributor Author

Hi @TstewDev!
Happy to hear that. Thank you!
For this PR, it is important to demonstrate that the new tests pass both before and after the changes.
That is why I have been focused on fixing the issues in the CI pipeline. I hope someone can also review this PR.
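
One common way to show that tests pass before and after a change is to write characterization tests that pin the current behaviour of a function, run them against the old code, and then rerun them unchanged against the new code. The sketch below uses pytest-style test functions. The `distance` function is a hypothetical stand-in defined inline so the example is self-contained, whereas in a real test it would be imported from the module under test.

```python
import math


def distance(v1, v2):
    """Hypothetical stand-in for the function under test."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def test_distance_known_value():
    # A 3-4-5 triangle gives a value that is easy to verify by hand.
    assert math.isclose(distance([0, 0, 0], [3, 4, 0]), 5.0)


def test_distance_is_symmetric():
    p, q = [1.0, 2.0, 3.0], [-4.0, 0.5, 2.0]
    assert math.isclose(distance(p, q), distance(q, p))


def test_distance_to_self_is_zero():
    assert distance([1.5, -2.0, 7.0], [1.5, -2.0, 7.0]) == 0.0
```

Because the tests only touch the public inputs and outputs, they stay valid whether the implementation uses plain lists or arrays internally.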

4 participants