

Feature/nwchem #58

Draft
wants to merge 235 commits into develop

Conversation


@hjjvandam (Collaborator) commented May 27, 2023

This pull request adds NWChem support to DeepDriveMD.

For this purpose a new directory DeepDriveMD-pipeline/deepdrivemd/sim/nwchem has been added. Initially this directory was a copy of DeepDriveMD-pipeline/deepdrivemd/sim/openmm.

This pull request adds the following files:

  • nwchem.py - contains input generators for the various NWChem calculations needed by the DeepDriveMD workflow
  • nwchem_test.py - contains a Python script that exercises the functionality of nwchem.py on the 1FME example.

TO DO:

[x] create run_nwchem.py from run_openmm.py
[x] adapt run_nwchem.py to execute NWChem instead of OpenMM
[x] adapt config.py to NWChem
[ ] other

braceal and others added 14 commits July 6, 2021 11:37
1. Address the flying ice cube syndrome
2. Ensure trajectory files are written
3. Properly deal with pathnames
4. Use different intervals for restart file and trajectory file updates
There still are a few questions about passing data back
to DeepDriveMD.
- Does it pick the DCD trajectory file up?
- Does the HDF5 contact map file work?
While you can step through a trajectory time step by time step,
pulling out each frame, you cannot use the individual time steps to
write a trajectory directly (in another format, for example). You have
to select some atoms to create an AtomGroup, which gets updated every
time you read a time step, and pass that AtomGroup to the trajectory
writer. If you pass a time step to the trajectory writer instead,
the code crashes, complaining that the object you passed is neither
an AtomGroup nor a Universe. To make matters worse, the MDAnalysis
documentation is full of broken examples that pass time steps to
the trajectory writer. This page
https://www.mdanalysis.org/MDAnalysisTutorial/writing.html
is the only place where I found the correct way of doing this.
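The pattern described above can be sketched as follows. This is a minimal sketch, not the code from this PR: the file-name arguments are placeholders, and MDAnalysis must be installed for the function to actually run.

```python
def rewrite_trajectory(topology, trajectory, out_file):
    """Convert a trajectory by passing an AtomGroup (never a time step)
    to the MDAnalysis trajectory writer."""
    import MDAnalysis as mda  # imported inside so the sketch loads without MDAnalysis

    u = mda.Universe(topology, trajectory)
    atoms = u.select_atoms("all")          # AtomGroup, updated on every frame read
    with mda.Writer(out_file, atoms.n_atoms) as writer:
        for _ in u.trajectory:             # advancing updates the AtomGroup in place
            writer.write(atoms)            # pass the AtomGroup, not the time step
```

Passing `atoms` (the AtomGroup) rather than the per-frame time step object is the point: the writer accepts an AtomGroup or a Universe, nothing else.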
The other interesting question is how "select_atoms" works.
Thankfully the selection "all" seems to work.
File locking seems to cause problems on Crusher on the compute nodes
for no apparent reason.
@hjjvandam (Collaborator, Author) commented:

The run_nwchem.py component seems to be working now. The aggregation step that follows still fails: in aggregate.py the Python code tries to create an HDF5 file at output_path, but output_path is the name of an existing directory and therefore not a valid file name. As the file names will have to be consistent between different workflow components, there has to be a convention for how these files are named. Any tips?
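One possible convention (purely a sketch; the thread does not settle the scheme DeepDriveMD actually uses, and the function name is made up) is to derive the HDF5 file name from the output directory's own name, so every workflow component can reconstruct it from output_path alone:

```python
from pathlib import Path

def aggregation_file(output_path: str) -> Path:
    """Hypothetical convention: place <dirname>.h5 inside the output
    directory, so the file name is derivable from output_path alone."""
    out_dir = Path(output_path)
    return out_dir / f"{out_dir.name}.h5"

print(aggregation_file("/tmp/agg_0001").as_posix())
# → /tmp/agg_0001/agg_0001.h5
```

Any component handed the same output_path then arrives at the same file name without extra coordination.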

hjjvandam added 15 commits July 26, 2023 10:49
NWChem's XYZ file writer may write the coordinates with Fortran's
"*"-notation. This is definitely not compliant with the XYZ file
format, so we need to replace the occurrences of this notation
with straight numbers.
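A sketch of such a repair step, assuming the `count*value` repeat notation that Fortran list-directed output produces for runs of identical values (the function name and regex are illustrative, not the code in this PR):

```python
import re

# Fortran list-directed output may collapse repeated values, e.g.
# "2*0.00000000" instead of "0.00000000 0.00000000".
_REPEAT = re.compile(r"(\d+)\*(-?\d+\.?\d*(?:[eEdD][+-]?\d+)?)")

def expand_fortran_repeats(line: str) -> str:
    """Expand Fortran 'count*value' repeat notation into plain numbers."""
    def repl(m):
        count, value = int(m.group(1)), m.group(2)
        return " ".join([value] * count)
    return _REPEAT.sub(repl, line)

print(expand_fortran_repeats("O 2*0.00000000 1.25"))
# → O 0.00000000 0.00000000 1.25
```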
So instead of just trying to activate LMS, we should check whether
it is installed and only activate it if it is.
Note that the atoms in 7CZ4-unfolded.pdb and 7CZ4-folded.pdb are ordered
differently, i.e. the ordering within the residues is different.
I am not sure whether the RMSD calculation handles this correctly.
The structure is the same as in 7cz4_fixedFH_allH_nwc_small.pdb but
now the atoms in the protein residues have been reordered in the NWChem
convention. I think this reordering will help calculating the RMSD.
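If the orderings do differ, one safeguard is to match atoms by (residue, atom name) on both structures before computing the RMSD. A minimal sketch on plain coordinate records (the record layout is made up for illustration; no superposition is performed):

```python
import math

def rmsd_by_name(ref, mov):
    """RMSD between two structures whose atoms may be ordered differently
    within residues. Each atom is a tuple (res_id, atom_name, x, y, z)."""
    key = lambda a: (a[0], a[1])                 # match atoms by residue and name
    ref_s, mov_s = sorted(ref, key=key), sorted(mov, key=key)
    assert [key(a) for a in ref_s] == [key(a) for a in mov_s], "atom sets differ"
    sq = sum((r[i] - m[i]) ** 2
             for r, m in zip(ref_s, mov_s) for i in (2, 3, 4))
    return math.sqrt(sq / len(ref_s))

ref = [(1, "N", 0.0, 0.0, 0.0), (1, "CA", 1.0, 0.0, 0.0)]
mov = [(1, "CA", 1.0, 0.0, 0.0), (1, "N", 0.0, 0.0, 1.0)]   # same atoms, reordered
print(rmsd_by_name(ref, mov))
```

With the sorting in place, the reordered input gives the same pairing as the reference, which is exactly the property the reordered PDB files are meant to guarantee.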
- change run.sh to deal with multiple use cases instead of just one
- nwchem/config.py change the atom selection for the contact map
  generation
- nwchem.py/run_nwchem.py we need to be able to copy additional data
  files to define the nwchem calculation correctly
- 7cz4/config.yaml the contact map is now larger: 179x179
The new atom selection makes sure we select an even number of
atoms in the 7cz4 use case (and also in the bba use case),
and we set the contact map to the corresponding size.
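The even-count requirement can be enforced generically by dropping the last atom of an odd-length selection and sizing the contact map to match (a sketch; the function name is made up and this is not the selection logic in config.py):

```python
def even_selection(atom_indices):
    """Drop the last atom if the selection has an odd length, so the
    contact map (n x n) ends up with even dimensions."""
    n = len(atom_indices) - (len(atom_indices) % 2)
    return atom_indices[:n]

sel = even_selection(list(range(7)))
print(len(sel), "-> contact map", (len(sel), len(sel)))
# → 6 -> contact map (6, 6)
```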
This script pulls data from the HDF5 files generated from the
trajectories. The contact maps are transformed into the latent
space, and selected latent dimensions as well as the RMSD values are
stored in a CSV file. This CSV file should be easy to visualize
using Matplotlib.
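The CSV-writing step can be sketched with the standard library alone. The column names and the row layout are assumptions for illustration; the real script additionally reads the contact maps from HDF5 and runs them through the trained encoder to obtain the latent vectors:

```python
import csv, io

def write_latent_csv(rows, fileobj, dims=(0, 1, 2)):
    """Write selected latent dimensions plus the RMSD for each frame.
    rows: iterable of (frame_index, latent_vector, rmsd)."""
    writer = csv.writer(fileobj)
    writer.writerow(["frame"] + [f"z{d}" for d in dims] + ["rmsd"])
    for frame, z, rmsd in rows:
        writer.writerow([frame] + [z[d] for d in dims] + [rmsd])

buf = io.StringIO()
write_latent_csv([(0, [0.1, 0.2, 0.3, 0.4], 1.5)], buf)
print(buf.getvalue())
```

A file in this shape loads directly with Matplotlib (or pandas) for scatter plots of latent dimensions coloured by RMSD.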
N2P2 insists on using 3-body potentials even if there is only
1 atom of a given element in the training set. Potentially
this is causing major problems in the calculations, with NaNs
all over the place. So I am adding some structures with more
oxygen atoms to see if that fixes it.
N2P2's scaling program will crash if there are any unbound atoms
in the training set.
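A cheap pre-flight check can flag such structures before they reach N2P2. This is a sketch with a hypothetical cutoff and no periodic boundary conditions; production code would use the simulation cell and a proper neighbour list:

```python
import math

def unbound_atoms(coords, cutoff=2.0):
    """Return indices of atoms with no neighbour within `cutoff`.
    Structures containing such atoms should be filtered out of the
    training set before running N2P2's scaling program."""
    return [i for i, a in enumerate(coords)
            if all(math.dist(a, b) > cutoff
                   for j, b in enumerate(coords) if i != j)]

coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(unbound_atoms(coords))  # → [2]
```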
Initial experience suggests that in the N2P2 model we are going
to be significantly impacted by implementation details. I am
surprised by how porous the abstractions are. Examples of issues
that need investigation are:
- the complexity of the descriptors (many Gaussian and
  trigonometric functions)
- the complexity of the activation functions (higher order
  polynomials)
- the fact that the last activation functions are still linear
  functions, so no matter how complicated the descriptors and the
  polynomial activation functions in the hidden layers are, in the
  end you are still fitting a function with straight line
  segments!
- the convergence rate of the training
  - 10 epochs is not normally enough except when you update
    your model with every training point and have a massive
    training set
  - with small training sets (a couple of hundred points) the
    convergence of training seems very slow with errors on the
    order of 10e+08 on the training set after 100 epochs
  - in addition even with a small training set it takes 7
    minutes per epoch no matter how many resources you throw
    at it
To Do:
- With DeePMD LAMMPS produces a model_devi.out file comparing
  the results of different models to assess the model precision.
  With N2P2 LAMMPS uses a single model. So we need to implement
  this comparison ourselves. Part of the infrastructure is there
  now, but the actual comparison and writing of the model_devi.out
  file still needs to be implemented.
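The comparison itself reduces to per-frame statistics over an ensemble of model predictions. A sketch of the numbers a model_devi.out-style file would record per frame (the max/min/avg-of-force-deviation layout is modelled loosely on DeePMD's, which is an assumption here, not N2P2 behaviour):

```python
def force_deviation(forces_per_model):
    """Per-atom standard deviation of predicted forces across an ensemble
    of models, reduced to (max, min, avg) for one frame.
    forces_per_model: list over models of per-atom (fx, fy, fz) tuples."""
    n_models = len(forces_per_model)
    n_atoms = len(forces_per_model[0])
    devs = []
    for a in range(n_atoms):
        # mean force on atom a across the models
        mean = [sum(m[a][k] for m in forces_per_model) / n_models
                for k in range(3)]
        var = sum(sum((m[a][k] - mean[k]) ** 2 for k in range(3))
                  for m in forces_per_model) / n_models
        devs.append(var ** 0.5)
    return max(devs), min(devs), sum(devs) / len(devs)

# two models disagreeing on the force on a single atom
two_models = [[(1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
print(force_deviation(two_models))  # → (0.5, 0.5, 0.5)
```

Running this over every frame of a trajectory, with forces predicted by each trained N2P2 model, yields the per-frame deviation columns the missing model_devi.out writer needs.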
The test case we are studying has only single bonds so let's
not confuse the neural network with data related to things it
doesn't need to know about.
These calculations cannot be used with N2P2 because that code
crashes if there are unbound atoms.