WIP: chempy refactoring #433

ye11owSub · 2025-02-15T20:31:26Z

Adding tests for chempy.cpv. They compare the results from cpv.py and NumPy and show that the two are interchangeable.
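As a rough illustration of the kind of comparison described here, the following sketch checks two list-based vector helpers against NumPy within a tolerance. The helper functions are hypothetical stand-ins written for this example; they are not copied from cpv.py or from the tests in this PR. NumPy is treated as optional so the sketch still runs without it.

```python
import math

try:
    import numpy as np  # optional: the comparison is skipped without it
except ImportError:
    np = None


# Hypothetical stand-ins for list-based helpers in the style of cpv.py.
# They are NOT copied from the real module.
def dot_product(v1, v2):
    return v1[0] * v2[0] + v1[1] * v2[1] + v1[2] * v2[2]


def cross_product(v1, v2):
    return [
        v1[1] * v2[2] - v1[2] * v2[1],
        v1[2] * v2[0] - v1[0] * v2[2],
        v1[0] * v2[1] - v1[1] * v2[0],
    ]


def lists_close(a, b, tol=1e-9):
    """Element-wise comparison of two sequences with an absolute tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b)
    )


def agrees_with_numpy(v1, v2):
    """Return True if both helpers match NumPy, or None when NumPy is absent."""
    if np is None:
        return None
    same_dot = math.isclose(
        dot_product(v1, v2), float(np.dot(v1, v2)), abs_tol=1e-9
    )
    same_cross = lists_close(cross_product(v1, v2), np.cross(v1, v2).tolist())
    return same_dot and same_cross
```

Comparing with a tolerance rather than `==` matters because the two implementations may round floating-point results slightly differently.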

modules/chempy/cpv.py

JarrettSJohnson · 2025-02-16T23:18:41Z

Won't comment on details at the moment, but currently I do have a couple of high-level comments:

- I'm usually all in favor of refactoring code, but doing so should come with some sort of purpose so that there's an overall net positive. Usually there is a cost of doing so (it is not always free: a708606, 98c85b8, "Shortcut now missing has_key method" #425).
- Similar to the above, I don't think the cpv module posed any sort of developer obstacle that would warrant a change. In fact, instead of investing time into making cpv nicer to use, I think it would also be worth considering whether cpv is even needed in the first place (no code to maintain is better than maintaining the nicest code). Most of this module was written over two decades ago, and there now exist libraries (especially numpy) that have much higher usage and come from a larger number of experts who have thought about linear algebra more than we have. IMO, one step forward would be to consider whether we can replace cpv altogether (perhaps keeping the functions not present in numpy/numpy.linalg). This would of course mean that scripts that use cpv would need to be changed to use numpy (and IMO they should rely on numpy, not PyMOL, to do linear algebra).
- I'd rather keep the verbose setting on for CI so that I can see each step of the C++ compilation process.

ye11owSub · 2025-02-17T00:08:19Z

Hey @JarrettSJohnson!

> I'm usually all in favor of refactoring code, but doing so should come with some sort of purpose so that there's an overall net positive. Usually there is a cost of doing so (it is not always free: a708606, 98c85b8, #425).

This is the cost of poor code quality and lack of test coverage. Refactoring is a method to find and fix these issues.

> Similar to the above, I don't think the cpv module posed any sort of developer obstacle that would warrant a change. In fact, instead of investing time into making cpv nicer to use, I think it would also be worth considering whether cpv is even needed in the first place (no code to maintain is better than maintaining the nicest code). Most of this module was written over two decades ago, and there now exist libraries (especially numpy) that have much higher usage and come from a larger number of experts who have thought about linear algebra more than we have. IMO, one step forward would be to consider whether we can replace cpv altogether (perhaps keeping the functions not present in numpy/numpy.linalg). This would of course mean that scripts that use cpv would need to be changed to use numpy (and IMO they should rely on numpy, not PyMOL, to do linear algebra).

I'm not sure I understand what you mean when you say "there now exist libraries (especially numpy) that have much higher usage". The entire cpv.py module was completely rewritten using numpy in this pull request. This new implementation is compatible with the scripts in the pymol-scripts repo and with other pymol modules.
In any case, I completely agree with your idea of replacing cpv.py with numpy. However, I assumed that changing the established API to use numpy would be unacceptable. If you believe that replacing all calls to cpv.py in the pymol and pymol-scripts repositories with numpy is a good idea, then I would be happy to do it.

> I'd rather keep the verbose setting on for CI so that I can see each step of the C++ compilation process.

done

JarrettSJohnson · 2025-02-17T00:57:53Z

> This is the cost of poor code quality and lack of test coverage. Refactoring is a method to find and fix these issues.

Even if the original code quality was poor and without sufficient test coverage, these specific issues arose from the refactoring process, because a couple of properties of the original code were missed (they were also easily identifiable by using common functionality in PyMOL, and that's on me too for not testing the PR before merging it). I think we should emphasize code coverage a little bit more than class-level refactoring.

> If you believe that replacing all calls of cpv.py in the pymol and pymol-scripts repositories with numpy is a good idea, then I would be happy to do it.

Might make sense to first open up an issue there to get insights/opinions from other developers/maintainers, but I'm generally in favor of removing the basic linear algebra functions from cpv.

ye11owSub · 2025-02-17T11:24:28Z

> Even if the original code quality was poor and without sufficient test coverage, these specific issues arose from the refactoring process, because a couple of properties of the original code were missed (they were also easily identifiable by using common functionality in PyMOL, and that's on me too for not testing the PR before merging it).

I'm truly sorry that my previous PR caused issues that you had to fix. However, in my opinion, this is a common part of the software development process.
Due to my lack of experience using pymol, it is difficult for me to test any scenarios, since I have never had to work with pymol as a user, so I rely heavily on tests and grep. Actually, these small PRs help me understand the project and contribute something useful in the process.

> I think we should emphasize code coverage a little bit more than class-level refactoring.

Adding commas to docstrings is worthless stuff, but I am trying to make the code more readable. Therefore, I don't see a problem with refactoring at the class level.
You are right that some of the scripts in the project are more than 20 years old and their readability is poor. Taking small steps to improve them is better than doing nothing at all.

speleo3 · 2025-02-17T14:32:35Z

I support Jarrett's assessment here.

> Might make sense to first open up an issue there to get insights/opinions from other developers/maintainers

Fully agree.

I like the added tests and type hints from the first commit, but the numpyfy refactoring is too much IMHO.

In my own scripts I always used either only numpy -- taking advantage of all its features and keeping data in numpy arrays -- or chempy.cpv for its simplicity and no numpy dependency. Making chempy.cpv a numpy wrapper feels like combining the disadvantages from both worlds.

ye11owSub · 2025-02-17T16:13:24Z

> Might make sense to first open up an issue there to get insights/opinions from other developers/maintainers

No one argued with that.

@speleo3, as you wish: there are now only type annotations and tests.

TstewDev · 2025-02-18T19:07:29Z

Hello @ye11owSub,

I'm Thomas Stewart, a PyMOL developer and the current Product Manager for PyMOL at Schrödinger. I just wanted to add my thoughts to a few of your comments:

> I'm truly sorry that my previous PR caused issues that you had to fix. However, in my opinion, this is a common part of the software development process. Due to my lack of experience using pymol, it is difficult for me to test any scenarios, since I have never had to work with pymol as a user, so I rely heavily on tests and grep. Actually, these small PRs help me understand the project and contribute something useful in the process.

There's no need to apologize, but I do hope it helps explain our general reluctance. I agree that fixing issues introduced by PRs is definitely part of the software development process, and I don't want to reject PRs simply on that basis. However, I would point out that finding and fixing these issues does take developer time and resources that could be spent on more productive tasks. This means that these PRs do come with a cost (reviewing, testing, maintaining, etc.), regardless of how simple they may appear. They need to be impactful enough to justify merging them into the codebase.

You also mention your lack of experience using PyMOL as a user. Not to say that only heavy PyMOL users can contribute to the project, but I'm curious what your motivations are if you're not trying to address an issue with how the app currently functions. I certainly understand the benefits of clean and well-documented code, but I don't view this as being a beneficial use of your time and effort if these files should be replaced completely.

If you (or anyone else reading this) are really interested in making a significant contribution to the project, I would encourage you to play around with PyMOL and try to identify some functionality/features that would benefit from your effort.

> Taking small steps to improve them is better than doing nothing at all.

I certainly understand what you're saying here, but I think it's an oversimplification for the reasons I stated above. In addition to the review/maintenance costs, small changes impact git blame, git history, and consistency across files. Refactoring to make the code more readable can be a noble goal when done with a clear objective, however it can also come with a real cost when just making changes for the sake of making changes.

All that being said, I do believe there is real value being added in this PR, and the tests for PyMOL certainly should be improved. I just want to explain my thought process when evaluating PRs in general, in case you plan on submitting more in the future.

ye11owSub · 2025-02-18T21:20:12Z

Hey @TstewDev!
This PR has attracted more attention than it deserves.

> There's no need to apologize, but I do hope it helps explain our general reluctance. I agree that fixing issues introduced by PRs is definitely part of the software development process, and I don't want to reject PRs simply on that basis. However, I would point out that finding and fixing these issues does take developer time and resources that could be spent on more productive tasks. This means that these PRs do come with a cost (reviewing, testing, maintaining, etc.), regardless of how simple they may appear. They need to be impactful enough to justify merging them into the codebase.

I understand, each PR has a cost (so let's reduce this cost through testing).

> I'm curious what your motivations are if you're not trying to address an issue with how the app currently functions. I certainly understand the benefits of clean and well-documented code, but I don't view this as being a beneficial use of your time and effort if these files should be replaced completely.

The shortest and at the same time the most complete answer is because I can. It's sad to see that, in 6 years, the project has had 80 PRs closed from the open-source community. PyMOL is a popular tool for a specific group of people. I'm not one of them, but I have a CS degree and some free time.
I didn't find any specific plans for the future development of the project, so I decided to focus on something that was clearly in need of an update.
You say that these files will be completely replaced, but this is only true for the end of the process. There are a lot of things that need to be done before that, and testing the old code is one of them. I think it will take a significant amount of time to replace cpv.py, and even then, it will be replaced with code from these tests.

> Refactoring to make the code more readable can be a noble goal when done with a clear objective, however it can also come with a real cost when just making changes for the sake of making changes.

In general I agree, but in this case, I don't think that's the case. If you have a different opinion, that's fine. Let's fix/add/delete what you think is necessary or close this PR and move on. That's OK for me.

I am also currently refactoring the chempy {models.py, __init__.py, io.py}. I wanted to split this into separate PRs, but if you prefer to have more changes per PR, we can set this one on pause.

ye11owSub · 2025-02-18T21:21:25Z

modules/chempy/cpv.py

+               m1[1][0]*m2[0][2] + m1[1][1]*m2[1][2] + m1[1][2]*m2[2][2]],
+             [m1[2][0]*m2[0][0] + m1[2][1]*m2[1][0] + m1[2][2]*m2[2][0],
+               m1[2][0]*m2[0][1] + m1[2][1]*m2[1][1] + m1[2][2]*m2[2][1],
+               m1[2][0]*m2[0][2] + m1[2][1]*m2[1][2] + m1[2][2]*m2[2][2]]]


Matrix multiplication was not implemented correctly. This and the code duplication have also been fixed in this PR.
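For reference, the visible lines of the hunk compute each entry of the product as a sum over the shared index, i.e. entry (i, j) is the sum over k of m1[i][k] * m2[k][j]. A minimal loop-based sketch of that formula, which avoids writing out all nine expressions by hand, could look like the following. The function name `multiply_3x3` is hypothetical and this is not the code from the PR.

```python
def multiply_3x3(m1, m2):
    """Return the 3x3 matrix product of m1 and m2 as nested lists.

    Entry (i, j) of the result is sum over k of m1[i][k] * m2[k][j].
    """
    return [
        [sum(m1[i][k] * m2[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Two matrices that do not commute, so swapped operands or swapped
# indices would show up in a test.
A = [[1, 2, 0], [0, 1, 0], [0, 0, 1]]
B = [[1, 0, 0], [3, 1, 0], [0, 0, 1]]
```

Checking a non-commuting pair such as `A` and `B` is a cheap way to catch an implementation that accidentally computes the product in the wrong order, which a test against the identity matrix alone would not reveal.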

TstewDev · 2025-02-21T21:02:00Z

Hello @ye11owSub!

> The shortest and at the same time the most complete answer is because I can. It's sad to see that, in 6 years, the project has had 80 PRs closed from the open-source community.

Say no more, welcome to the project! The effort you have already put into these PRs is really appreciated, and it sounds like you really are serious about making a contribution.

Please forgive my original tone of skepticism. I just know that open-source projects like this can fall victim to developers creating PRs when they have little intention of actually seeing these changes through. It definitely doesn't sound like that's the case here, and we welcome all the help we can get.

> I understand, each PR has a cost (so let's reduce this cost through testing).

I'm a big fan of adding tests like this and I think it's one of the obvious areas for improvement.

> In general I agree, but in this case, I don't think that's the case. If you have a different opinion, that's fine. Let's fix/add/delete what you think is necessary or close this PR and move on. That's OK for me.

I don't actually think I have any issue with this review now that it has this refined scope. I will take another closer look and add any additional comments if necessary.

> I am also currently refactoring the chempy {models.py, __init__.py, io.py}. I wanted to split this into separate PRs, but if you prefer to have more changes per PR, we can set this one on pause.

Happy to hear it! I'm normally in favor of splitting these into multiple smaller reviews, but it sounds like these might be quite intertwined? I'll leave it up to your judgement, but if you feel like there's relevant context that these other changes would provide, feel free to combine them.

ye11owSub · 2025-02-28T22:01:36Z

Hi @TstewDev!
Happy to hear that. Thank you!
For this PR, it is important to demonstrate that the new tests pass before and after the changes.
Therefore, I was focused on fixing the issues in the CI pipeline. I hope someone can also review this PR.
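One way to picture "tests pass before and after the changes" is a set of pinned input/output cases that is run against both the old and the new implementation. The sketch below is only an illustration under that assumption: both `length` functions are hypothetical examples, not code from cpv.py or from this PR.

```python
import math


def length(v):
    """Hypothetical 'before' implementation: explicit arithmetic."""
    return math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])


def length_refactored(v):
    """Hypothetical 'after' implementation with the same contract.

    math.hypot accepts three arguments on Python 3.8 and later.
    """
    return math.hypot(*v)


# Pinned cases: (input vector, expected length).
CASES = [
    ([3.0, 4.0, 0.0], 5.0),
    ([0.0, 0.0, 0.0], 0.0),
    ([1.0, 2.0, 2.0], 3.0),
]


def failing_cases(func, cases=CASES, tol=1e-9):
    """Return the pinned cases for which func gives an unexpected result."""
    return [
        (vector, expected)
        for vector, expected in cases
        if not math.isclose(func(vector), expected, abs_tol=tol)
    ]
```

If `failing_cases` returns an empty list for both functions, the refactoring preserved the pinned behavior; a non-empty list points at the inputs where the two versions diverge.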

ye11owSub force-pushed the chempy_refactoring branch 2 times, most recently from 491cd50 to 9233afc, on February 15, 2025 21:27

speleo3 reviewed Feb 16, 2025


ye11owSub force-pushed the chempy_refactoring branch 4 times, most recently from d8efebe to 5614259, on February 16, 2025 12:53

ye11owSub changed the title ~~WIP: chempy.cpv refactoring~~ WIP: Switching from Python lists to NumPy arrays for linear algebra operations Feb 16, 2025

ye11owSub force-pushed the chempy_refactoring branch from 5b1e43c to fa549d2 on February 16, 2025 21:50

ye11owSub changed the title ~~WIP: Switching from Python lists to NumPy arrays for linear algebra operations~~ Switching the computation in cpv.py from python lists to numpy arrays for linear algebra operations Feb 16, 2025

ye11owSub requested a review from speleo3 February 16, 2025 22:05

ye11owSub force-pushed the chempy_refactoring branch from fa549d2 to 137f140 on February 16, 2025 22:18

ye11owSub changed the title ~~Switching the computation in cpv.py from python lists to numpy arrays for linear algebra operations~~ Switching the computation in cpv.py from python lists to numpy arrays Feb 16, 2025

ye11owSub force-pushed the chempy_refactoring branch from 137f140 to be6eae8 on February 17, 2025 00:08

ye11owSub force-pushed the chempy_refactoring branch from be6eae8 to 3431f50 on February 17, 2025 16:11

ye11owSub force-pushed the chempy_refactoring branch from 3431f50 to 308bae1 on February 17, 2025 16:20

ye11owSub added 2 commits February 17, 2025 16:24

adding tests for chempy.cpv

8d5232e

fixing ci

1373979

ye11owSub force-pushed the chempy_refactoring branch from 308bae1 to 1373979 on February 17, 2025 16:24

ye11owSub changed the title ~~Switching the computation in cpv.py from python lists to numpy arrays~~ tests for cpv.py Feb 17, 2025


removing unused methods for Base, Connected, Indexed models

bb36778

adding tests for chempy models

d45d258

ye11owSub changed the title ~~tests for cpv.py~~ WIP: chempy refactoring Feb 20, 2025
