Disagreement due to stereochemical SMILES #45

dswigh · 2023-04-26T17:38:02Z

Given the molecule: (e)-2-butenenitrile
PubChem will resolve to: ['C/C=C/C#N']
CIR will resolve to: ['CC=CC#N']

These two are (almost) the same SMILES strings, but Pura says they don't agree because one specifies the stereochemistry, while the other doesn't.

Perhaps a 'drop stereochemical information' arg would be a solution?

marcosfelt · 2023-04-26T20:36:12Z

I think this would make sense! So just to confirm, you'd want an option in resolve_identifiers that ignores stereochemistry differences?

dswigh · 2023-04-27T13:32:57Z

Yea! I used the following in my own code:

import pandas as pd
from rdkit import Chem

# Canonicalise and remove stereochemistry
def clean_smiles(smiles):
    if pd.isna(smiles):
        return smiles
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None for SMILES it cannot parse
        return None
    # isomericSmiles=False is what strips away the stereo info
    return Chem.MolToSmiles(mol, isomericSmiles=False)

# Apply the function to every cell in the DataFrame
df = pura_solvents.applymap(clean_smiles)

I haven't fully investigated which services or conditions cause a SMILES string to contain stereochemistry, omit it, or explicitly mark it as ambiguous (i.e., having the crossed bond).

dswigh · 2023-04-27T13:34:17Z

Didn't realise the indentation would be removed by markdown... hopefully it's self-evident how the indentation should be!

marcosfelt · 2023-04-28T13:31:21Z

Following up on the discussion we had in person. The behavior in the original post is actually expected since (e)-2-butenenitrile should resolve to C/C=C/C#N (i.e., CIR was wrong). Therefore, we would want the consensus algorithm to say these two SMILES are different and therefore there is not sufficient agreement.
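
For anyone reading later, here is a minimal sketch of the agreement check described above. It is not Pura's actual implementation: the `consensus` function, its `min_agreement` and `normalize` parameters, and the `strip_bond_directions` helper are all hypothetical names made up for illustration. The helper is a crude string-level stand-in for the RDKit approach earlier in the thread and only handles the `/` and `\` double-bond markers.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def strip_bond_directions(smiles: str) -> str:
    # Hypothetical helper: crude removal of the '/' and '\' double-bond markers.
    # Illustration only. It ignores '@' chirality tags and does not canonicalise.
    return smiles.replace("/", "").replace("\\", "")


def consensus(
    results: Dict[str, List[str]],
    min_agreement: int = 2,
    normalize: Callable[[str], str] = lambda s: s,
) -> Optional[str]:
    # Hypothetical sketch: return the SMILES reported by at least
    # `min_agreement` services, or None if there is not enough agreement.
    counts: Counter = Counter()
    for smiles_list in results.values():
        # Count each distinct (normalised) answer once per service.
        counts.update({normalize(s) for s in smiles_list})
    if not counts:
        return None
    best, n = counts.most_common(1)[0]
    return best if n >= min_agreement else None


# The two answers from the original post.
results = {"pubchem": ["C/C=C/C#N"], "cir": ["CC=CC#N"]}

strict = consensus(results)  # stereo-aware comparison
relaxed = consensus(results, normalize=strip_bond_directions)
```

With the default stereo-aware comparison the two services do not agree, which is the behavior we want to keep here. Passing a normaliser that drops the stereo markers makes them agree, which would hide the fact that CIR returned the wrong structure.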

marcosfelt closed this as completed Apr 28, 2023
