Feature/nwchem #58
base: develop
Conversation
1. Address the flying ice cube syndrome
2. Ensure trajectory files are written
3. Properly deal with pathnames
4. Use different intervals for restart file and trajectory file updates
There still are a few questions about passing data back to DeepDriveMD:
- Does it pick up the DCD trajectory file?
- Does the HDF5 contact map file work?
While you can step through a trajectory time step by time step, pulling out each frame, you cannot use that to write a trajectory directly (in another format, for example). You have to select some atoms to create an AtomGroup, which gets updated every time you read a time step, and pass that AtomGroup to the trajectory writer. If you pass a time step to the trajectory writer, the code crashes complaining that the object you passed is neither an AtomGroup nor a Universe. To make matters worse, the MDAnalysis documentation is full of broken examples that pass time steps to the trajectory writer. This page https://www.mdanalysis.org/MDAnalysisTutorial/writing.html is the only place where I found the correct way of doing this. The other interesting question is how "select_atoms" works. Thankfully the selection "all" seems to work.
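The pattern described above can be sketched as follows. This is a minimal illustration, assuming hypothetical file names ("system.pdb", "system.dcd", "out.xtc"); any topology/trajectory pair MDAnalysis supports would do. It requires MDAnalysis to be installed.

```python
import MDAnalysis as mda

# Hypothetical input files for illustration.
u = mda.Universe("system.pdb", "system.dcd")

# The writer needs an AtomGroup (or Universe), not a Timestep.
# select_atoms("all") returns an AtomGroup whose coordinates are
# updated in place each time a new frame is read from the trajectory.
atoms = u.select_atoms("all")

with mda.Writer("out.xtc", atoms.n_atoms) as w:
    for ts in u.trajectory:   # iterating updates `atoms`
        w.write(atoms)        # passing `ts` here raises the error described above
```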
File locking seems to cause problems on the Crusher compute nodes for no apparent reason.
The run_nwchem.py component seems to be working now. The aggregation step that follows still fails. The reason the aggregation fails is that in |
NWChem's XYZ file writer may write the coordinates with Fortran's "*"-notation. This is definitely not compliant with the XYZ file format, so we need to replace occurrences of this notation with plain numbers.
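Fortran's list-directed output can abbreviate repeated values as `count*value` (e.g. `3*0.000000` for three zero coordinates). A small sketch of the replacement step, written against that assumption:

```python
import re

# Matches Fortran repeat notation: an integer count, '*', then a number
# (optionally signed, optionally with an exponent).
_REPEAT = re.compile(r"\b(\d+)\*(-?\d+\.?\d*(?:[eE][+-]?\d+)?)")

def expand_fortran_repeats(line: str) -> str:
    """Expand Fortran '*'-notation, e.g. '3*0.000000' ->
    '0.000000 0.000000 0.000000', leaving other text untouched."""
    def repl(m):
        count, value = int(m.group(1)), m.group(2)
        return " ".join([value] * count)
    return _REPEAT.sub(repl, line)
```

Applying this line by line to the XYZ file before handing it to downstream tools should restore format compliance.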
So instead of just trying to activate LMS, we should check whether it is installed and only activate it if it is.
Note that the atoms in 7CZ4-unfolded.pdb and 7CZ4-folded.pdb are ordered differently, i.e. the ordering within the residues is different. I am not sure whether the RMSD calculation handles this correctly.
The structure is the same as in 7cz4_fixedFH_allH_nwc_small.pdb, but now the atoms in the protein residues have been reordered into the NWChem convention. I think this reordering will help in calculating the RMSD.
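One way to make the RMSD insensitive to intra-residue atom order is to match atoms by a (residue, atom name) key instead of by position in the file. This is an illustrative sketch, not DeepDriveMD's actual RMSD code, and it assumes a rigid-body alignment has already been done:

```python
import math

def rmsd_by_name(ref, mov):
    """RMSD between two structures whose atoms may be ordered
    differently within each residue.

    Each structure is a list of ((resid, atom_name), (x, y, z)) pairs.
    Atoms are matched by (resid, atom_name), so the order within a
    residue does not matter.
    """
    ref_map = dict(ref)
    assert set(ref_map) == {k for k, _ in mov}, "atom sets must match"
    sq = 0.0
    for key, (x, y, z) in mov:
        rx, ry, rz = ref_map[key]
        sq += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(sq / len(mov))
```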
- run.sh: deal with multiple use cases instead of just one
- nwchem/config.py: change the atom selection for the contact map generation
- nwchem.py/run_nwchem.py: we need to be able to copy additional data files to define the NWChem calculation correctly
- 7cz4/config.yaml: the contact map is now larger, 179x179
The new atom selection makes sure we select an even number of atoms in the 7cz4 use case (and also in the bba use case), and we set the contact map to the corresponding size.
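For reference, a contact map for N selected atoms is an N x N boolean matrix marking atom pairs within some cutoff, which is why the map size tracks the atom selection (179x179 for the 7cz4 selection above). A minimal pure-Python sketch; the 8 Å cutoff is an assumption for illustration, not a value taken from the config:

```python
import math

def contact_map(coords, cutoff=8.0):
    """Boolean contact map: entry [i][j] is True when atoms i and j
    are within `cutoff` of each other. For N atoms the result is N x N."""
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            close = math.dist(coords[i], coords[j]) <= cutoff
            cmap[i][j] = cmap[j][i] = close
    return cmap
```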
This script pulls data from the HDF5 files generated from the trajectories. The contact maps are transformed into the latent space and selected dimensions as well as the RMSD values are stored in a CSV file. This CSV file should be easy to visualize using Matplotlib.
N2P2 insists on using three-body potentials even if there is only one atom of a given element in the training set. Potentially this is causing major problems in the calculations, with NaNs all over the place. So I am adding some structures with more oxygen atoms to see if that fixes it.
N2P2's scaling program will crash if there are any unbound atoms in the training set.
Initial experience suggests that in the N2P2 model we are going to be significantly impacted by implementation details. I am surprised by how porous the abstractions are. Examples of issues that need investigation:
- the complexity of the descriptors (many Gaussian and trigonometric functions)
- the complexity of the activation functions (higher-order polynomials)
- the fact that the last activation functions are still linear functions, so no matter the complicated descriptors and the polynomial activation functions in the hidden layers, in the end you are still fitting a function with straight line segments!
- the convergence rate of the training: 10 epochs is not normally enough, except when you update your model with every training point and have a massive training set
- with small training sets (a couple of hundred points) the convergence of training seems very slow, with errors on the order of 10e+08 on the training set after 100 epochs
- in addition, even with a small training set it takes 7 minutes per epoch no matter how many resources you throw at it
To Do:
- With DeePMD, LAMMPS produces a model_devi.out file comparing the results of different models to assess the model precision. With N2P2, LAMMPS uses a single model, so we need to implement this comparison ourselves. Part of the infrastructure is there now, but the actual comparison and writing of the model_devi.out file still need to be implemented.
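The core of that comparison is an ensemble deviation: for each atom, how much the force predictions of the different models spread around their mean. A minimal sketch of that reduction (the exact column layout of model_devi.out is not reproduced here and would need to match what the rest of the pipeline expects):

```python
def force_deviation(forces_by_model):
    """Per-atom spread of predicted forces across an ensemble of models.

    forces_by_model: one list per model, each a list of (fx, fy, fz)
    tuples for the same atoms. For each atom we take the root mean
    squared distance of each model's force vector from the ensemble
    mean, then summarize over atoms.

    Returns (max, min, avg) over atoms, in the spirit of the
    max/min/avg force-deviation columns DeePMD reports.
    """
    n_atoms = len(forces_by_model[0])
    devis = []
    for a in range(n_atoms):
        vecs = [model[a] for model in forces_by_model]
        mean = [sum(c) / len(vecs) for c in zip(*vecs)]
        msd = sum(
            sum((c - mc) ** 2 for c, mc in zip(v, mean)) for v in vecs
        ) / len(vecs)
        devis.append(msd ** 0.5)
    return max(devis), min(devis), sum(devis) / len(devis)
```

Writing one such (max, min, avg) line per trajectory snapshot would give a model_devi.out-style file for the N2P2 ensemble.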
The test case we are studying has only single bonds so let's not confuse the neural network with data related to things it doesn't need to know about.
These calculations cannot be used with N2P2 because that code crashes if there are unbound atoms.
This pull request wants to add NWChem support to DeepDriveMD.

For this purpose a new directory

DeepDriveMD-pipeline/deepdrivemd/sim/nwchem

has been added. Initially this directory was a copy of DeepDriveMD-pipeline/deepdrivemd/sim/openmm.

This pull request adds the following files:
- nwchem.py - contains input generators for the various NWChem calculations needed for what DeepDriveMD tries to accomplish
- nwchem_test.py - contains a Python script that executes the functionality of nwchem.py on the 1FME example.

TO DO:
- [x] create run_nwchem.py from run_openmm.py
- [x] adapt run_nwchem.py to executing NWChem instead of OpenMM
- [x] adapt config.py to NWChem
- [ ] other