Incorrect sum formula and molecular mass for inorganic compounds #2340

Open
schatzsc opened this issue Feb 18, 2025 · 19 comments

@schatzsc

@adambasha0 @JanCBrammer @nicolejung

As already reported about 1 1/2 years ago in #1551, there are serious problems with the display of inorganic structures, and Chemotion furthermore generates incorrect sum formulas while Ketcher gets the formulas right.

In an extensive debug session today with @JanCBrammer we were able to track down a good part of the problem, which is actually rooted in not one but at least two different issues.

These are elaborated further in separate posts below.

@schatzsc schatzsc added the bug label Feb 18, 2025
@schatzsc

The ferrocene case showing too many protons was actually NOT due to hydrogens being added to the metal. It was just an unfortunate and confusing coincidence that in some cases there were two hydrogens too many, matching the "standard valence" of two for iron. When we changed the metal to Ti, Cr, Co, or Cu, which should have different standard valences, the problem remained.

Rather, this problem is due to issues with the "kekulization" of 5-membered rings (or, more generally, of rings with an odd number of members).

When the "kekulize" function tries to assign alternating single and double bonds, it cannot fully satisfy the target condition and ends up with one carbon atom as CH2 instead of CH (or CH instead of C if one omits explicit H).

Examples with even-membered rings from 4 to 8 (H atoms omitted):

Image Image Image

Examples with "hallucinated hydrogens" for odd-membered rings from 3 to 7 (H atoms omitted):

Image Image Image
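The parity argument above can be illustrated with a minimal sketch. It is not the kekulization routine used by OpenBabel, Ketcher, or Chemotion; the function name `atoms_without_double_bond` is hypothetical, and the ring is treated as a bare cycle in which every atom wants exactly one double bond (charges and lone pairs, which matter for a real cyclopentadienyl ligand, are ignored).

```ruby
# Walk around a ring of `ring_size` atoms and place a double bond on every
# ring bond whose two atoms are both still free. Returns the indices of
# atoms that end up without a double bond.
def atoms_without_double_bond(ring_size)
  has_double = Array.new(ring_size, false)
  ring_size.times do |i|
    j = (i + 1) % ring_size            # bond i connects atom i and atom j
    next if has_double[i] || has_double[j]

    has_double[i] = has_double[j] = true
  end
  has_double.each_index.reject { |i| has_double[i] }
end

(3..8).each do |n|
  puts "ring size #{n}: unmatched atoms #{atoms_without_double_bond(n).inspect}"
end
```

In this simplified model every odd ring size leaves exactly one atom over, which is the atom that would then be filled up with an extra hydrogen.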

@schatzsc

This problem can be worked around to obtain correct sum formulas and molecular masses (although the drawings become extremely messy) by either:

  1. no explicit H atoms, single bonds between the carbon atoms of the 5-membered rings, and assignment of a "standard valence" of four to the ring carbon atoms, pretty much like the "simple bonds" concept of TUCAN:
Image

or

  2. explicit H atoms and aromatic bonds between the carbon atoms of the 5-membered rings (very important: no "standard valence" set):
Image

For even-membered rings, on the other hand, it is easiest to use alternating single and double bonds with the valence on the carbons set to five:

Image

Alternatively, one can also use alternating single/double bonds with explicit H atoms:

Image
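Why the valence settings above give one hydrogen per ring carbon can be checked with simple arithmetic. The sketch below assumes that an explicitly set valence is filled up with hydrogens after subtracting the drawn bond orders, and that each ring carbon carries one single bond to the metal (an assumption about the drawings, which are not reproduced here). The helper `implicit_hydrogens` is hypothetical and not part of Chemotion or OpenBabel.

```ruby
# Hydrogens implied by an explicitly set valence: whatever the drawn bonds
# do not use up is filled with H (never negative).
def implicit_hydrogens(valence, bond_orders)
  [valence - bond_orders.sum, 0].max
end

# 5-membered ring, single ring bonds, valence 4, one bond to the metal (assumed)
puts implicit_hydrogens(4, [1, 1, 1])   # prints 1, i.e. CH

# even-membered ring, alternating single/double bonds, valence 5,
# one bond to the metal (assumed)
puts implicit_hydrogens(5, [1, 2, 1])   # prints 1, i.e. CH
```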

@schatzsc

The problem with the NHC carbene complex was - on the other hand - much harder to track down.

Even with the valence of the NHC carbene C atom set to three, one additional H is "hallucinated":

Image

On the other hand, if you replace the NHC methyl groups with halogens and put a methyl group in the 4-position of the central pyridine ring, you end up with the correct sum formula (this also applies when the methyl group is in the 3- or 5-position, data not shown):

Image

Strangely, the sum formula is also correct when there is one halogen and one methyl on the NHC groups:

Image

Two halogens, on the other hand, again result in one hydrogen too many (eight instead of the correct seven):

Image

@schatzsc

Inspection of https://github.com/ComPlat/chemotion_ELN/blob/main/app/models/molecule.rb with a debugger showed that the internal data structures molecule and babel_info contain the correct information, including the unchanged molfile and the correct(!!!) babel_info[:formula] and babel_info[:mol_wt]. It therefore appears that

def check_sum_formular

is responsible for this unexpected behaviour, since it is the only place where self.molecular_weight and self.sum_formular are changed before the final values are returned.

It is only invoked if is_partial = babel_info[:is_partial] is set in line 95, and here OpenBabel seems to misinterpret some of the methyl groups as "partial".

Two ways to check this would be:

  1. set is_partial to false instead of reading it from babel_info[:is_partial]

or

  2. prevent check_sum_formular from being called in line 144 (also note the spelling mistake: it should be "check_sum_formula", not "check_sum_formular")

@schatzsc

If the incorrect setting of is_partial cannot be resolved, one option would be to add a check-box to each sample/chemical with which users can enable or disable is_partial.

@schatzsc

schatzsc commented Feb 19, 2025

Digging further into the code, is_partial is set here:

is_partial = molfile_has_R(structure, version)

using this code and more below:

def self.molfile_2000_has_R(molfile)

So, this problem indeed seems to be due to problems with molfile RGROUP handling within Chemotion itself @adambasha0 @JanCBrammer @nicolejung
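To make concrete what an R-group check on a V2000 molfile involves, here is a minimal sketch. It is explicitly not the Chemotion implementation of molfile_2000_has_R: the names `v2000_has_r_group?`, `v2000_atom_line`, and `v2000_molfile` are hypothetical, and the only things relied on are the fixed-column V2000 layout (atom count in columns 1-3 of the counts line, atom symbol in columns 32-34 of each atom line) and the assumption that R-group placeholders are written as "R", "R#", or "R" followed by digits.

```ruby
# Hypothetical stand-in for an R-group check; not the Chemotion code.
def v2000_has_r_group?(molfile)
  lines = molfile.lines.map(&:chomp)
  atom_count = lines[3][0, 3].to_i             # counts line, columns 1-3
  lines[4, atom_count].any? do |atom_line|
    symbol = atom_line[31, 3].to_s.strip       # atom symbol, columns 32-34
    # Match R, R#, R1, R2, ... but not metals such as Ru, Rh, or Re.
    symbol.match?(/\AR(#|\d*)\z/)
  end
end

# Helpers that build a fixed-width V2000 molfile so the columns are exact.
def v2000_atom_line(symbol, x = 0.0, y = 0.0, z = 0.0)
  format('%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0  0  0  0  0  0  0', x, y, z, symbol)
end

def v2000_molfile(symbols)
  header = ['', '  sketch', '']
  counts = format('%3d%3d  0  0  0  0  0  0  0  0999 V2000', symbols.size, 0)
  (header + [counts] + symbols.map { |s| v2000_atom_line(s) } + ['M  END']).join("\n")
end

puts v2000_has_r_group?(v2000_molfile(%w[C R# N]))    # prints true
puts v2000_has_r_group?(v2000_molfile(%w[Ru C Rh]))   # prints false
```

The anchored pattern is a deliberate design choice for inorganic structures: a looser test on the leading "R" would also flag metals such as Ru or Rh as partial.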

@schatzsc

Playing around further, the issue does not even seem to arise from the NHC carbene C atoms, since they come out right when I define them as "free carbenes" with a "standard valence" of two.

Rather, the problem appears to be due to the nitrogen atom, which has to be assigned a charge of +1.

Incorrect sum formula with one hydrogen too many, even with the "standard valence" of the central N atom set to four:

Image

Correct sum formula with +1 charge on nitrogen:

Image

@schatzsc

This can also be confirmed with an absolute "minimum model" of the NHC complex.

Incorrect, one H too many (six instead of five):

Image

Correct, only five H atoms:

Image

@schatzsc

So the above statement blaming the R group handling is possibly partially incorrect. The cause may rather be the removal of metal-ligand bonds, or a special handling of the "super-valent" N atom (= "standard valence" larger than three) somewhere along the line?!?

When I replace N by the heavier group V element P, I need to set a valence of four to avoid a "PH" unit appearing (because the "standard valence" of P is five, not three?!?), but I then get the correct sum formula without any charge (same with As, Sb, and Bi; data not shown):

Image

@schatzsc

@adambasha0 @JanCBrammer @nicolejung

Executive summary: the problem is not due to the NHC carbene unit but to the "super-valent" nitrogen atom, which seems to be handled differently from the other group V elements.

@schatzsc

Proof: Two nitrogen atoms with attached metal lead to two "hallucinated hydrogens":

Image

@schatzsc

Fixed by addition of +1 charges to each nitrogen atom:

Image

If only one nitrogen atom is charged, there is still one "hallucinated" hydrogen:

Image

@schatzsc

Correct NHC Pt complex now with 13H forced by +1 charge on central ring N atom:

Image

@schatzsc

So maybe the "NHC problem" can be solved with a "drawing inorganic structures" guideline document?!?

@schatzsc

What remains is the extremely distorted look of the sandwich complexes, since both Ketcher and consequently Chemotion seem unable to properly handle the "multi-attachment" statement of the molfile, such as

M V30 21 1 22 11 ENDPTS=(5 1 5 2 4 3) ATTACH=ANY

The graphics look nice, but the "*" pseudo-atoms ("star atoms") used to define the multi-attachment points are not properly handled, and thus only a picture but no sum formula is passed from Ketcher to Chemotion:

Image

One can manually enter the sum formula, but the sub-structure search and other features will then possibly still not work:

Image

@schatzsc

However, the multi-attachment statement in a v3000 molfile like

M V30 21 1 22 11 ENDPTS=(5 1 5 2 4 3) ATTACH=ANY

can easily be replaced by standard single/simple bonds, as we have implemented in TUCAN:

https://github.com/TUCAN-nest/TUCAN/blob/02aa89c62cc781fa80323cff168339c57c7b2f51/tucan/io/molfile_v3000_reader.py#L195

Thus, if

https://github.com/ComPlat/chemotion_ELN/blob/e71870723bd4fbbef42375aad08daf4101db166d/app/models/molecule.rb

already does quite a bit of processing on the molfile, why not also perform this multi-attachment-to-single-bonds conversion in Chemotion before updating the molecule data structure?!?
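A minimal sketch of what such a conversion could look like for a single V3000 bond line follows. It is neither the TUCAN reader nor Chemotion code: the function `expand_multi_attachment` and its `star_atoms` argument (the indices of the "*" pseudo-atoms, which a real implementation would read from the atom block) are hypothetical, and treating atom 22 as the star atom in the example line is an assumption.

```ruby
ENDPTS_PATTERN = /ENDPTS=\(([\d\s]+)\)/

# Turn one V3000 bond line into a list of [atom, atom] pairs. A bond with
# ENDPTS=(n a1 ... an) is replaced by n bonds from the non-star atom to each
# listed endpoint; any other bond is returned unchanged.
def expand_multi_attachment(bond_line, star_atoms)
  fields = bond_line.sub(/\AM\s+V30\s+/, '').split
  _index, _type, atom1, atom2 = fields.first(4).map(&:to_i)
  match = bond_line.match(ENDPTS_PATTERN)
  return [[atom1, atom2]] unless match

  count, *endpoints = match[1].split.map(&:to_i)
  raise ArgumentError, 'ENDPTS count does not match list' unless count == endpoints.size

  center = star_atoms.include?(atom1) ? atom2 : atom1
  endpoints.map { |endpoint| [center, endpoint] }
end

line = 'M V30 21 1 22 11 ENDPTS=(5 1 5 2 4 3) ATTACH=ANY'
p expand_multi_attachment(line, [22])
# prints [[11, 1], [11, 5], [11, 2], [11, 4], [11, 3]]
```

A complete conversion would also have to drop the star pseudo-atoms from the atom block and renumber atoms and bonds, which this sketch leaves out.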

@schatzsc

And the highly distorted display, in particular of sandwich complexes, might be resolved by a more detailed examination of what the *.svg generation does:

def self.svg_reprocess(svg, molfile)

@schatzsc

Just another example of the "hallucinated hydrogens", in this case even six too many (30 instead of 3 x 2 x 4 = 24) due to the presence of six metal-coordinated ring nitrogen atoms:

Image

The correct sum formula is obtained by putting a +1 charge on each of the nitrogen atoms, but of course the display looks silly:

Image

@schatzsc

Final proof that the "hallucinated hydrogen" bug is due to the handling of a non-standard nitrogen valence of four (instead of three):

Trimethylamine, incorrect sum formula with one H too many:

Image

Correct sum formula with +1 charge on nitrogen:

Image
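One possible mechanism that would reproduce these observations, offered purely as an assumption and not confirmed anywhere in this thread, is hydrogen filling against a table of allowed valences that depends on the formal charge. In the sketch below the table `ALLOWED_VALENCES` and the function `filled_hydrogens` are hypothetical; the values follow the common MDL-style convention and are not copied from Chemotion or OpenBabel source. The bond order sum of four stands for a nitrogen with four single bonds, for instance three carbons plus one metal (again an assumption about the drawings).

```ruby
# Illustrative allowed-valence table keyed by [element, formal charge].
# Assumed values, not taken from any specific toolkit source.
ALLOWED_VALENCES = {
  ['N', 0] => [3, 5],
  ['N', 1] => [4],
  ['P', 0] => [3, 5],
}.freeze

# Fill up to the smallest allowed valence that accommodates the drawn bonds.
def filled_hydrogens(element, charge, bond_order_sum)
  target = ALLOWED_VALENCES.fetch([element, charge]).find { |v| v >= bond_order_sum }
  target ? target - bond_order_sum : 0
end

puts filled_hydrogens('N', 0, 3)   # prints 0: ordinary amine
puts filled_hydrogens('N', 0, 4)   # prints 1: one "hallucinated" hydrogen
puts filled_hydrogens('N', 1, 4)   # prints 0: fixed by the +1 charge
puts filled_hydrogens('P', 0, 4)   # prints 1: the "PH" unit seen without a set valence
```

Under this assumption the +1 charge works because it switches the nitrogen to a table entry whose only allowed valence is four, so nothing is left to fill.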
