Disagreement due to stereochemical SMILES #45
I think this would make sense! So just to confirm, you'd want an option in …
Yea! I used the following in my own code:

```python
import pandas as pd
from rdkit import Chem

# Canonicalise and remove stereochemistry
def clean_smiles(smiles):
    if pd.isna(smiles):
        return smiles
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # unparseable SMILES: leave it unchanged
        return smiles
    # isomericSmiles=False is what strips away the stereo info
    return Chem.MolToSmiles(mol, isomericSmiles=False)

# Apply the function to every cell in the DataFrame
# (on pandas >= 2.1, DataFrame.map replaces the deprecated applymap)
df = pura_solvents.applymap(clean_smiles)
```

I haven't fully investigated which services/conditions cause a SMILES string to contain stereochemistry, omit it, or mark it as explicitly ambiguous (i.e. with the crossed bond).
Didn't realise the indentation would be removed by Markdown... hopefully it's self-evident what the indentation should be!
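One possible starting point for that investigation, sketched under the assumption that RDKit is the toolkit in use (as in the snippet above): a SMILES carries stereo information that RDKit recognises exactly when its isomeric and non-isomeric canonical forms differ. `has_stereo` is a hypothetical helper name, not part of Pura. Two caveats: RDKit's `isomericSmiles` flag also governs isotope labels, so a labelled SMILES such as `[13CH4]` would be flagged too; and plain SMILES has no notation for an explicitly unspecified (crossed) double bond, so this only separates "specified" from "not specified".

```python
from rdkit import Chem


def has_stereo(smiles: str) -> bool:
    """Return True if RDKit keeps any isomeric information for this SMILES.

    Hypothetical helper: compares the isomeric and non-isomeric canonical
    SMILES. Isotope labels also count, because RDKit's isomericSmiles flag
    covers them as well as stereochemistry.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    with_stereo = Chem.MolToSmiles(mol)  # isomericSmiles=True is the default
    without_stereo = Chem.MolToSmiles(mol, isomericSmiles=False)
    return with_stereo != without_stereo


# The PubChem and CIR results for (e)-2-butenenitrile
for s in ["C/C=C/C#N", "CC=CC#N"]:
    print(s, has_stereo(s))  # True for the first, False for the second
```

Running this over each service's output would show which services return stereo-specified SMILES for a given query.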
Following up on the discussion we had in person: the behavior in the original post is actually expected, since (e)-2-butenenitrile should resolve to …
Given the molecule: (e)-2-butenenitrile
PubChem will resolve to: ['C/C=C/C#N']
CIR will resolve to: ['CC=CC#N']
These two are (almost) the same SMILES string: both describe 2-butenenitrile, but PubChem's specifies the E double-bond geometry (the `/` bond markers) while CIR's leaves it unspecified, so Pura reports that they don't agree.
Perhaps a 'drop stereochemical information' arg would be a solution?
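A minimal sketch of what such an option could look like, assuming RDKit is used for canonicalisation. `canonical`, `smiles_agree` and the `ignore_stereo` argument are hypothetical names for illustration, not Pura's actual API: every resolved SMILES is canonicalised, stereo (and, in RDKit, isotope) information is dropped when `ignore_stereo=True`, and the services agree if all results reduce to a single string.

```python
from typing import Iterable

from rdkit import Chem


def canonical(smiles: str, ignore_stereo: bool = False) -> str:
    """Canonical RDKit SMILES; drops stereo/isotope info if ignore_stereo."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol, isomericSmiles=not ignore_stereo)


def smiles_agree(results: Iterable[str], ignore_stereo: bool = False) -> bool:
    """True if all resolved SMILES canonicalise to one string.

    Hypothetical helper illustrating a 'drop stereochemical information'
    option; an empty input returns False.
    """
    canon = {canonical(s, ignore_stereo) for s in results}
    return len(canon) == 1


pubchem = "C/C=C/C#N"
cir = "CC=CC#N"
print(smiles_agree([pubchem, cir]))                      # False
print(smiles_agree([pubchem, cir], ignore_stereo=True))  # True
```

Defaulting `ignore_stereo` to `False` would keep the current strict comparison, so the relaxed behaviour only applies when explicitly requested.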