Implementation of equivariant GVP-GNNs as described in [Learning from Protein Structure with Geometric Vector Perceptrons](https://openreview.net/forum?id=1YLJDvSx6J4) by B Jing, S Eismann, P Suriana, RJL Townshend, and RO Dror.

**UPDATE:** Also includes equivariant GNNs with vector gating as described in [Equivariant Graph Neural Networks for 3D Macromolecular Structure](https://arxiv.org/abs/2106.03843) by B Jing, S Eismann, P Soni, and RO Dror.

Scripts for training/testing/sampling on protein design and training/testing on all [ATOM3D](https://arxiv.org/abs/2012.04035) tasks are provided.

**Note:** This implementation is in PyTorch Geometric. The original TensorFlow code, which is not maintained, can be found [here](https://github.com/drorlab/gvp).

## Requirements
* tqdm==4.38.0
* numpy==1.19.4
* sklearn==0.24.1
* atom3d==0.2.1

While we have not tested with other versions, any reasonably recent versions of these requirements should work.
## General usage
We provide classes in four modules:
* `gvp`: core GVP modules and GVP-GNN layers
* `gvp.data`: data pipelines for both general use and protein design
* `gvp.models`: implementations of MQA and CPD models
* `gvp.atom3d`: models and data pipelines for ATOM3D

The core modules in `gvp` are meant to be as general as possible, but you will likely have to modify `gvp.data` and `gvp.models` for your specific application, with the existing classes serving as examples.

To use vector gating, pass in `vector_gate=True` and the appropriate activations.
```
gvp_ = gvp.GVP(in_dims, out_dims,
               activations=(F.relu, None), vector_gate=True)
```
The classes `gvp.Dropout` and `gvp.LayerNorm` implement vector-channel dropout and layer norm, while using normal dropout and layer norm for scalar channels. Both expect inputs and return outputs of form `(s, V)`, but will also behave like their scalar-valued counterparts if passed a single tensor.
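As a minimal sketch (assuming `torch` and `gvp` are imported, and that `gvp.Dropout` takes a drop rate while `gvp.LayerNorm` takes the `(n_scalar, n_vector)` dimensions; the values below are arbitrary examples, so check the class docstrings):
```
dropout = gvp.Dropout(0.1)
layernorm = gvp.LayerNorm((100, 16))  # (n_scalar, n_vector)

s, V = torch.randn(32, 100), torch.randn(32, 16, 3)
s, V = layernorm(dropout((s, V)))     # output shapes match the input shapes
```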
The class `GVPConvLayer` is an `nn.Module` that forms messages using a `GVPConv` and updates the node embeddings as described in the paper. Because the updates are residual, the dimensionality of the embeddings is not changed.
Both `GVPConv` and `GVPConvLayer` accept arguments `activations` and `vector_gate` to use vector gating.
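For instance, a sketch with arbitrary example dimensions, assuming the layer takes the node and edge dimension tuples as its first two arguments (see the docstrings for the full signatures):
```
node_dims, edge_dims = (100, 16), (32, 1)   # (n_scalar, n_vector) for nodes and edges
layer = gvp.GVPConvLayer(node_dims, edge_dims,
                         activations=(F.relu, None), vector_gate=True)
```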
### Loading data
The class `gvp.data.ProteinGraphDataset` transforms protein backbone structures into featurized graphs. Following [Ingraham, et al, NeurIPS 2019](https://github.com/jingraham/neurips19-graph-protein-design), we use a JSON/dictionary format to specify backbone structures:
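The exact specification is documented with the class; as a rough sketch (the key names and per-residue backbone atom ordering here are illustrative), each entry looks like:
```
{
    "name": "example_id",
    "seq": "MKTAYIAKQR...",
    "coords": [[[x, y, z], [x, y, z], [x, y, z], [x, y, z]], ...]
}
```
with one list of backbone atom coordinates per residue.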
The output will be an int tensor, with mappings corresponding to those used when training the model.
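For example, a hypothetical sketch of decoding sampled indices back into one-letter codes, assuming `samples` is the returned int tensor and `letter_to_num` is the residue-to-index mapping used by your dataset (both names here are assumptions):
```
# Hypothetical: invert the residue-to-index mapping used at training time
num_to_letter = {v: k for k, v in letter_to_num.items()}
designs = ["".join(num_to_letter[int(i)] for i in row) for row in samples]
```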
## ATOM3D
We provide models and dataloaders for all ATOM3D tasks in `gvp.atom3d`, as well as a training and testing script in `run_atom3d.py`. This also supports loading pretrained weights for transfer learning experiments.
### Models / data loaders
The GVP-GNNs for ATOM3D are supplied in `gvp.atom3d` and are named after each task: `gvp.atom3d.MSPModel`, `gvp.atom3d.PPIModel`, etc. All of these extend the base class `gvp.atom3d.BaseModel`. These classes take no arguments at initialization, take in a `torch_geometric.data.Batch` representation of a batch of structures, and return an output corresponding to the task. Details vary based on the exact task---see the docstrings.
```
psr_model = gvp.atom3d.PSRModel()
```
`gvp.atom3d` also includes data loaders to produce `torch_geometric.data.Batch` objects from an underlying `atom3d.datasets.LMDBDataset`. In the case of all tasks except PPI and RES, these are in the form of callable transform objects---`gvp.atom3d.SMPTransform`, `gvp.atom3d.RSRTransform`, etc.---which should be passed into the constructor of an `atom3d.datasets.LMDBDataset`:
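For example, a minimal sketch (`path_to_dataset` is a placeholder for a downloaded ATOM3D LMDB directory):
```
import atom3d.datasets

smp_dataset = atom3d.datasets.LMDBDataset(path_to_dataset,
                                           transform=gvp.atom3d.SMPTransform())
```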
The dataloaders can be directly iterated over to yield `torch_geometric.data.Batch` objects, which can then be passed into the models.
```
for batch in psr_dataloader:
    pred = psr_model(batch)  # pred.shape = (batch_size,)
```
### Training / testing
To run training / testing on ATOM3D, download the datasets as described [here](https://www.atom3d.ai/). Modify the function `get_datasets` in `run_atom3d.py` with the paths to the datasets. Then run:
```
  --num-workers N       number of threads for loading data, default=4
  --smp-idx IDX         label index for SMP, in range 0-19
  --lba-split SPLIT     identity cutoff for LBA, 30 (default) or 60
  --batch SIZE          batch size, default=8
  --train-time MINUTES  maximum time between evaluations on valset,
                        default=120 minutes
  --val-time MINUTES    maximum time per evaluation on valset, default=20
                        minutes
  --epochs N            training epochs, default=50
  --test PATH           evaluate a trained model
  --lr RATE             learning rate
  --load PATH           initialize first 2 GNN layers with pretrained weights
```
For example:
```
# train a model
python run_atom3d.py PSR

# train a model with pretrained weights
python run_atom3d.py PSR --load PATH

# evaluate a model
python run_atom3d.py PSR --test PATH
```
## Acknowledgements
Portions of the input data pipeline were adapted from [Ingraham, et al, NeurIPS 2019](https://github.com/jingraham/neurips19-graph-protein-design). We thank Pratham Soni for portions of the implementation in PyTorch.
## Citation

```
    year={2021},
    url={https://openreview.net/forum?id=1YLJDvSx6J4}
}

@article{jing2021equivariant,
    title={Equivariant Graph Neural Networks for 3D Macromolecular Structure},
    author={Jing, Bowen and Eismann, Stephan and Soni, Pratham N and Dror, Ron O},
    journal={arXiv preprint arXiv:2106.03843},
    year={2021}
}
```