README.md (+11 -133)
@@ -11,6 +11,8 @@
 **Update**: Also consider checking out our new diffusion generative model, GCDM, that uses GCPNet to improve equivariant diffusion models for 3D molecule generation in multiple ways. The GCDM [GitHub](https://github.com/BioinfoMachineLearning/bio-diffusion) and [paper](https://arxiv.org/abs/2302.04313).
+**Update**: Also consider checking out the new ProteinWorkshop benchmark which features GCPNet as a state-of-the-art geometric graph neural network for representation learning of 3D protein structures. [GitHub](https://github.com/a-r-j/ProteinWorkshop).
 </div>
@@ -22,10 +24,9 @@ A PyTorch implementation of Geometry-Complete SE(3)-Equivariant Perceptron Netwo
 <details open><summary><b>Table of contents</b></summary>
 - [Creating a Virtual Environment](#virtual-environment-creation)
-- [GCPNet Foundational Tasks and Models](#gcpnet-foundational)
-- [GCPNet for Protein Structure EMA (GCPNet-EMA)](#gcpnet-ema)
+- [GCPNet Tasks and Models](#gcpnet)
+- [Model Training](#gcpnet-training)
+- [Model Evaluation](#gcpnet-evaluation)
 - [Acknowledgements](#acknowledgements)
 - [Citations](#citations)
 </details>
@@ -56,9 +57,9 @@ conda activate gcpnet # note: one still needs to use `conda` to (de)activate en
 pip3 install -e .
 ```
-## GCPNet Foundational Tasks and Models <a name="gcpnet-foundational"></a>
+## GCPNet Tasks and Models <a name="gcpnet"></a>
-Download data for foundational tasks
+Download data for tasks
 ```bash
 # initialize data directory structure
 mkdir -p data
@@ -81,25 +82,7 @@ navigating to https://figshare.com/s/e23be65a884ce7fc8543 and downloading the th
 **Note**: The ATOM3D datasets (i.e., the LBA and PSR datasets) as well as the CATH dataset we use will automatically be downloaded during execution of `src/train.py` or `src/eval.py` if they have not already been downloaded. However, data for the NMS and RS tasks must be downloaded manually.
-**Another Note**: TM-score and MolProbity are required to score protein structures, where one can install them as follows:
-# note: beforehand, if not already installed within the `gcpnet` environment by `mamba`, make sure `svn` is installed locally using e.g., `apt install subversion` or `yum install subversion`
-conda activate gcpnet # ensure the `gcpnet` Conda environment is activated for installation
-bash install_via_bootstrap.sh 4 # note: `4` here indicates the number of processes to run in parallel for faster installation
-bash molprobity/setup.sh # note: this command will likely fail due to not being run inside a GUI, but nonetheless installation should now be completed
-```
-Make sure to update the `tmscore_exec_path` and `molprobity_exec_path` values in e.g., `configs/paths/default.yaml` to reflect where you have placed the TM-score and MolProbity executables on your machine. Also, make sure that `lddt_exec_path` points to the `bin/lddt` path within your `gcpnet` Conda environment, where `lddt` is installed automatically as described in `environment.yaml`.
-## How to train foundational models <a name="gcpnet-foundational-training"></a>
+## How to train models <a name="gcpnet-training"></a>
 Train model with default configuration
@@ -159,7 +142,7 @@ _**New**_: For tasks that may benefit from it, you can now enable E(3) equivaria
**Note**: Make sure the `gcpnet` Mamba environment has previously been created as outlined above in the section [Creating a Virtual Environment](#virtual-environment-creation).
For example, one can predict per-residue and per-model lDDT scores for a batch of tertiary protein structure inputs, `6W6VE.pdb` and `6W77K.pdb` within `data/EQ/examples/decoy_model`, as follows
**Note**: After running the above command, an output CSV containing metadata for the predictions will be located at `logs/predict/runs/YYYY-MM-DD_HH-MM-SS/predict_YYYYMMDD_HHMMSS_rank_0_predictions.csv`, with text substitutions for the time at which the above command was completed. This CSV will contain a column called `predicted_annotated_pdb_filepath` that identifies the temporary location of each input PDB file after annotating it with GCPNet-EMA's predicted lDDT scores for each residue. If a directory containing ground-truth PDB files corresponding one-to-one with the inputs in `datamodule.predict_input_dir` is provided as `datamodule.predict_true_dir`, then metrics and PDB annotation filepaths will also be reported in the output CSV to quantitatively and qualitatively describe how well GCPNet-EMA was able to improve upon AlphaFold's initial per-residue plDDT values.
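The following is a minimal shell sketch for listing the annotated PDB paths recorded in the newest predictions CSV. It makes three assumptions. First, it assumes the default `logs/predict/runs/YYYY-MM-DD_HH-MM-SS/` layout described above. Second, it assumes bash glob expansion sorts those zero-padded timestamp directories chronologically. Third, it assumes no CSV field contains an embedded comma, because the parsing is naive. The helper names `latest_predictions_csv` and `annotated_pdb_paths` are hypothetical and are not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the GCPNet repository) for inspecting
# GCPNet-EMA prediction outputs. Assumes the default output layout:
#   logs/predict/runs/YYYY-MM-DD_HH-MM-SS/predict_YYYYMMDD_HHMMSS_rank_0_predictions.csv

# Print the path of the most recent predictions CSV under a runs directory.
latest_predictions_csv() {
  local runs_dir="${1:-logs/predict/runs}"
  local latest="" f
  # Globs expand in sorted order, and the zero-padded timestamp directory
  # names sort chronologically, so the last existing match is the newest run.
  for f in "$runs_dir"/*/predict_*_rank_0_predictions.csv; do
    [ -e "$f" ] && latest="$f"
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Print every value of the `predicted_annotated_pdb_filepath` column.
# Naive CSV parsing: assumes no field contains a quoted comma.
annotated_pdb_paths() {
  awk -F',' '
    { sub(/\r$/, "") }  # tolerate CRLF line endings
    NR == 1 {
      for (i = 1; i <= NF; i++)
        if ($i == "predicted_annotated_pdb_filepath") col = i
      if (!col) { print "column not found" > "/dev/stderr"; exit 1 }
      next
    }
    { print $col }
  ' "$1"
}

# Example usage:
#   annotated_pdb_paths "$(latest_predictions_csv)"
```

`latest_predictions_csv` returns a non-zero status when no predictions CSV exists. A caller can use that status to detect a missing prediction run before parsing anything.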