<!-- [](https://papers.nips.cc/paper/2020) -->

- [![Data DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14660031.svg)](https://doi.org/10.5281/zenodo.14660031)
+ [![Data DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15066450.svg)](https://doi.org/10.5281/zenodo.15066450)

<img src="./img/FlowDock.png" width="600">

@@ -76,6 +76,7 @@ cd FlowDock
mamba env create -f environments/flowdock_environment.yaml
conda activate FlowDock # NOTE: one still needs to use `conda` to (de)activate environments
pip3 install -e . # install local project as package
+ pip3 install prody==2.4.1 --no-dependencies # install ProDy without NumPy dependency
```

Download checkpoints

```bash
# pretrained FlowDock weights
- wget https://zenodo.org/records/14660031/files/flowdock_checkpoints.tar.gz
+ wget https://zenodo.org/records/15066450/files/flowdock_checkpoints.tar.gz
tar -xzf flowdock_checkpoints.tar.gz
rm flowdock_checkpoints.tar.gz
```
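Every archive in this README is fetched with the same three steps: download it from the Zenodo record, extract it, and delete the tarball. This commit changes only the record ID. Below is a minimal Bash sketch of that pattern as a reusable function. `fetch_zenodo_archive`, `ZENODO_RECORD`, and `DRY_RUN` are hypothetical names introduced for illustration and are not part of the FlowDock repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FlowDock): download one tarball from a Zenodo
# record, extract it into the current directory, then delete the tarball.
ZENODO_RECORD="${ZENODO_RECORD:-15066450}"  # record ID introduced by this commit

fetch_zenodo_archive() {
  local archive="$1"
  local url="https://zenodo.org/records/${ZENODO_RECORD}/files/${archive}"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Preview mode: print the commands instead of running them.
    printf 'wget %s\ntar -xzf %s\nrm %s\n' "$url" "$archive" "$archive"
    return 0
  fi
  # Only extract if the download succeeded, and only delete if extraction succeeded.
  wget "$url" && tar -xzf "$archive" && rm "$archive"
}
```

For example, `fetch_zenodo_archive flowdock_checkpoints.tar.gz` reproduces the checkpoint step. Prefixing the call with `DRY_RUN=1` prints the commands without downloading anything. The commands are chained with `&&` so that a failed download never triggers extraction or deletion.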
@@ -105,19 +106,19 @@ tar -xzf flowdock_data_cache.tar.gz
rm flowdock_data_cache.tar.gz

# cached data for PDBBind, Binding MOAD, DockGen, and the PDB-based van der Mers (vdM) dataset
- wget https://zenodo.org/records/14660031/files/flowdock_pdbbind_data.tar.gz
+ wget https://zenodo.org/records/15066450/files/flowdock_pdbbind_data.tar.gz
tar -xzf flowdock_pdbbind_data.tar.gz
rm flowdock_pdbbind_data.tar.gz

- wget https://zenodo.org/records/14660031/files/flowdock_moad_data.tar.gz
+ wget https://zenodo.org/records/15066450/files/flowdock_moad_data.tar.gz
tar -xzf flowdock_moad_data.tar.gz
rm flowdock_moad_data.tar.gz

- wget https://zenodo.org/records/14660031/files/flowdock_dockgen_data.tar.gz
+ wget https://zenodo.org/records/15066450/files/flowdock_dockgen_data.tar.gz
tar -xzf flowdock_dockgen_data.tar.gz
rm flowdock_dockgen_data.tar.gz

- wget https://zenodo.org/records/14660031/files/flowdock_pdbsidechain_data.tar.gz
+ wget https://zenodo.org/records/15066450/files/flowdock_pdbsidechain_data.tar.gz
tar -xzf flowdock_pdbsidechain_data.tar.gz
rm flowdock_pdbsidechain_data.tar.gz
```
@@ -129,7 +130,7 @@ rm flowdock_pdbsidechain_data.tar.gz
<details>

**NOTE:** The following steps (besides downloading PDBBind and Binding MOAD's PDB files) are only necessary if one wants to fully process each of the following datasets manually.
- Otherwise, preprocessed versions of each dataset can be found on [Zenodo](https://zenodo.org/records/14660031).
+ Otherwise, preprocessed versions of each dataset can be found on [Zenodo](https://zenodo.org/records/15066450).

Download data

@@ -159,6 +160,16 @@ mv pdb_2021aug02/ pdbsidechain/
cd ../
```

+ Lastly, to finetune `FlowDock` using the `PLINDER` dataset, one must first prepare this data for training
+
+ ```bash
+ # fetch PLINDER data (NOTE: requires ~1 hour to download and ~750G of storage)
+ export PLINDER_MOUNT="$(pwd)/data/PLINDER"
+ mkdir -p "$PLINDER_MOUNT" # create the directory if it doesn't exist
+
+ plinder_download -y
+ ```
+

### Generating ESM2 embeddings for each protein (optional, cached input data available on SharePoint)

To generate the ESM2 embeddings for the protein inputs,
@@ -260,10 +271,10 @@ python flowdock/train.py experiment=flowdock_fm
python flowdock/train.py experiment=flowdock_fm trainer.max_epochs=20 data.batch_size=8
```

- For example, override parameters to finetune `FlowDock`'s pretrained weights using a new dataset
+ For example, override parameters to finetune `FlowDock`'s pretrained weights using a new dataset such as [PLINDER](https://www.plinder.sh/)

```bash
- python flowdock/train.py experiment=flowdock_fm data=my_new_datamodule ckpt_path=checkpoints/esmfold_prior_paper_weights.ckpt
+ python flowdock/train.py experiment=flowdock_fm data=plinder ckpt_path=checkpoints/esmfold_prior_paper_weights.ckpt
```

</details>
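The PLINDER fetch step added in this commit notes that the download needs roughly 750 GB of storage. A small pre-flight check like the one below can avoid starting a download that would fill the disk. This is a sketch: `has_free_space_gb` and `check_plinder_space` are hypothetical helpers, and the 750 threshold comes from the README comment, not from PLINDER's own documentation. The check relies only on POSIX `df -P -k`, whose fourth column is the available space in 1024-byte blocks.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of FlowDock or PLINDER): succeed only if
# the filesystem holding directory $1 has at least $2 GiB available.
has_free_space_gb() {
  local dir="$1" required_gb="$2" avail_kb
  # `df -P -k` prints a header line, then: filesystem, 1024-blocks, used,
  # available, capacity, mount point.
  avail_kb="$(df -P -k "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')"
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

check_plinder_space() {
  # Fall back to the current directory if PLINDER_MOUNT has not been exported yet.
  local target="${PLINDER_MOUNT:-.}"
  if has_free_space_gb "$target" 750; then
    echo "OK: at least ~750 GiB available under ${target}"
  else
    echo "Warning: less than ~750 GiB available under ${target} (or it does not exist)" >&2
    return 1
  fi
}
```

After exporting `PLINDER_MOUNT`, running `check_plinder_space && plinder_download -y` starts the download only when space looks sufficient.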
@@ -277,7 +288,7 @@ To reproduce `FlowDock`'s evaluation results for structure prediction, please re
To reproduce `FlowDock`'s evaluation results for binding affinity prediction using the PDBBind dataset

```bash
- python flowdock/eval.py data.test_datasets=[pdbbind] ckpt_path=checkpoints/esmfold_prior_paper_weights_EMA.ckpt trainer=gpu
+ python flowdock/eval.py data.test_datasets=[pdbbind] ckpt_path=checkpoints/esmfold_prior_paper_weights-EMA.ckpt trainer=gpu
... # re-run two more times to gather triplicate results
```

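The evaluation snippet leaves the two additional runs as `...`. One way to script all three runs is sketched below. `run_triplicate_eval` and `DRY_RUN` are hypothetical names. Whether the runs differ from one another depends on FlowDock's own configuration (for example, seeding), and this sketch does not change that.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of FlowDock): run the PDBBind binding affinity
# evaluation three times in a row, stopping at the first failure.
run_triplicate_eval() {
  local ckpt="checkpoints/esmfold_prior_paper_weights-EMA.ckpt"  # checkpoint name after this commit
  # Quote the override so the shell never treats `[pdbbind]` as a glob pattern.
  local cmd=(python flowdock/eval.py "data.test_datasets=[pdbbind]" "ckpt_path=${ckpt}" trainer=gpu)
  local run
  for run in 1 2 3; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "run ${run}: ${cmd[*]}"
    else
      "${cmd[@]}" || { echo "evaluation run ${run} failed" >&2; return 1; }
    fi
  done
}
```

Calling `DRY_RUN=1 run_triplicate_eval` prints the three commands. Calling `run_triplicate_eval` from the repository root executes them.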
@@ -291,47 +302,55 @@ Download baseline method predictions and results

```bash
# cached predictions and evaluation metrics for reproducing structure prediction paper results
- wget https://zenodo.org/records/14660031/files/alphafold3_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/alphafold3_baseline_method_predictions.tar.gz
tar -xzf alphafold3_baseline_method_predictions.tar.gz
rm alphafold3_baseline_method_predictions.tar.gz

- wget https://zenodo.org/records/14660031/files/chai_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/chai_baseline_method_predictions.tar.gz
tar -xzf chai_baseline_method_predictions.tar.gz
rm chai_baseline_method_predictions.tar.gz

- wget https://zenodo.org/records/14660031/files/diffdock_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/diffdock_baseline_method_predictions.tar.gz
tar -xzf diffdock_baseline_method_predictions.tar.gz
rm diffdock_baseline_method_predictions.tar.gz

- wget https://zenodo.org/records/14660031/files/dynamicbind_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/dynamicbind_baseline_method_predictions.tar.gz
tar -xzf dynamicbind_baseline_method_predictions.tar.gz
rm dynamicbind_baseline_method_predictions.tar.gz

- wget https://zenodo.org/records/14660031/files/flowdock_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/flowdock_baseline_method_predictions.tar.gz
tar -xzf flowdock_baseline_method_predictions.tar.gz
rm flowdock_baseline_method_predictions.tar.gz

- wget https://zenodo.org/records/14660031/files/flowdock_aft_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/flowdock_aft_baseline_method_predictions.tar.gz
tar -xzf flowdock_aft_baseline_method_predictions.tar.gz
rm flowdock_aft_baseline_method_predictions.tar.gz

- wget https://zenodo.org/records/14660031/files/flowdock_esmfold_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/flowdock_pft_baseline_method_predictions.tar.gz
+ tar -xzf flowdock_pft_baseline_method_predictions.tar.gz
+ rm flowdock_pft_baseline_method_predictions.tar.gz
+
+ wget https://zenodo.org/records/15066450/files/flowdock_esmfold_baseline_method_predictions.tar.gz
tar -xzf flowdock_esmfold_baseline_method_predictions.tar.gz
rm flowdock_esmfold_baseline_method_predictions.tar.gz

- wget https://zenodo.org/records/14660031/files/flowdock_hp_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/flowdock_chai_baseline_method_predictions.tar.gz
+ tar -xzf flowdock_chai_baseline_method_predictions.tar.gz
+ rm flowdock_chai_baseline_method_predictions.tar.gz
+
+ wget https://zenodo.org/records/15066450/files/flowdock_hp_baseline_method_predictions.tar.gz
tar -xzf flowdock_hp_baseline_method_predictions.tar.gz
rm flowdock_hp_baseline_method_predictions.tar.gz

- wget https://zenodo.org/records/14660031/files/neuralplexer_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/neuralplexer_baseline_method_predictions.tar.gz
tar -xzf neuralplexer_baseline_method_predictions.tar.gz
rm neuralplexer_baseline_method_predictions.tar.gz

- wget https://zenodo.org/records/14660031/files/vina_p2rank_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/vina_p2rank_baseline_method_predictions.tar.gz
tar -xzf vina_p2rank_baseline_method_predictions.tar.gz
rm vina_p2rank_baseline_method_predictions.tar.gz

- wget https://zenodo.org/records/14660031/files/rfaa_baseline_method_predictions.tar.gz
+ wget https://zenodo.org/records/15066450/files/rfaa_baseline_method_predictions.tar.gz
tar -xzf rfaa_baseline_method_predictions.tar.gz
rm rfaa_baseline_method_predictions.tar.gz
```
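The baseline-prediction hunk repeats the same download, extract, and delete block for 13 archives that differ only in the method prefix. A loop-based sketch follows. `BASELINES`, `baseline_urls`, and `download_baselines` are hypothetical names, and the method prefixes are copied from the archive names in this hunk. The sketch uses `wget -i FILE`, which reads the URLs to fetch from a file.

```bash
#!/usr/bin/env bash
# Hypothetical loop version (not part of FlowDock) of the baseline downloads above.
ZENODO_RECORD="15066450"
BASELINES=(alphafold3 chai diffdock dynamicbind flowdock flowdock_aft flowdock_pft
           flowdock_esmfold flowdock_chai flowdock_hp neuralplexer vina_p2rank rfaa)

# Print one archive URL per line.
baseline_urls() {
  local name
  for name in "${BASELINES[@]}"; do
    echo "https://zenodo.org/records/${ZENODO_RECORD}/files/${name}_baseline_method_predictions.tar.gz"
  done
}

download_baselines() {
  local name archive
  baseline_urls > baseline_urls.txt
  wget -i baseline_urls.txt || return 1  # fetch every URL listed in the file
  for name in "${BASELINES[@]}"; do
    archive="${name}_baseline_method_predictions.tar.gz"
    # Extract, then delete, each archive that was actually downloaded.
    if [ -f "$archive" ]; then
      tar -xzf "$archive" && rm "$archive"
    fi
  done
  rm -f baseline_urls.txt
}
```

Calling `download_baselines` from the repository root mirrors the hunk's commands. Keeping the method list in one array means a future record bump, like this commit's change from `14660031` to `15066450`, is a one-line edit.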
@@ -353,13 +372,13 @@ jupyter notebook notebooks/casp16_binding_affinity_prediction_results_plotting.i
For example, generate new protein-ligand complexes for a pair of protein sequence and ligand SMILES strings such as those of the PDBBind 2020 test target `6i67`

```bash
- python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights_EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling input_receptor='YNKIVHLLVAEPEKIYAMPDPTVPDSDIKALTTLCDLADRELVVIIGWAKHIPGFSTLSLADQMSLLQSAWMEILILGVVYRSLFEDELVYADDYIMDEDQSKLAGLLDLNNAILQLVKKYKSMKLEKEEFVTLKAIALANSDSMHIEDVEAVQKLQDVLHEALQDYEAGQHMEDPRRAGKMLMTLPLLRQTSTKAVQHFYNKLEGKVPMHKLFLEMLEAKV' input_ligand='"c1cc2c(cc1O)CCCC2"' input_template=data/pdbbind/pdbbind_holo_aligned_esmfold_structures/6i67_holo_aligned_esmfold_protein.pdb sample_id='6i67' out_path='./6i67_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=true auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
+ python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights-EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling input_receptor='YNKIVHLLVAEPEKIYAMPDPTVPDSDIKALTTLCDLADRELVVIIGWAKHIPGFSTLSLADQMSLLQSAWMEILILGVVYRSLFEDELVYADDYIMDEDQSKLAGLLDLNNAILQLVKKYKSMKLEKEEFVTLKAIALANSDSMHIEDVEAVQKLQDVLHEALQDYEAGQHMEDPRRAGKMLMTLPLLRQTSTKAVQHFYNKLEGKVPMHKLFLEMLEAKV' input_ligand='"c1cc2c(cc1O)CCCC2"' input_template=data/pdbbind/pdbbind_holo_aligned_esmfold_structures/6i67_holo_aligned_esmfold_protein.pdb sample_id='6i67' out_path='./6i67_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=true auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
```

Or, for example, generate new protein-ligand complexes for pairs of protein sequences and (multi-)ligand SMILES strings (delimited via `|`) such as those of the CASP15 target `T1152`

```bash
- python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights_EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling input_receptor='MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIP|MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIP|MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIPN' input_ligand='"CC(=O)NC1C(O)OC(CO)C(OC2OC(CO)C(OC3OC(CO)C(O)C(O)C3NC(C)=O)C(O)C2NC(C)=O)C1O"' input_template=data/test_cases/predicted_structures/T1152.pdb sample_id='T1152' out_path='./T1152_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=true auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
+ python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights-EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling input_receptor='MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIP|MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIP|MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIPN' input_ligand='"CC(=O)NC1C(O)OC(CO)C(OC2OC(CO)C(OC3OC(CO)C(O)C(O)C3NC(C)=O)C(O)C2NC(C)=O)C1O"' input_template=data/test_cases/predicted_structures/T1152.pdb sample_id='T1152' out_path='./T1152_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=true auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
```

If you do not already have a template protein structure available for your target of interest, set `input_template=null` to instead have the sampling script predict the ESMFold structure of your provided `input_receptor` sequence before running the sampling pipeline. For more information regarding the input arguments available for sampling, please refer to the config at `configs/sample.yaml`.
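For multi-chain targets such as `T1152`, `input_receptor` holds one sequence per chain separated by `|`, and the README notes that multiple ligand SMILES strings are delimited the same way. Below is a minimal Bash sketch for assembling such a value from separate chain sequences. `join_by_pipe` is a hypothetical helper, and the three sequences are the `T1152` chains from the command in this hunk.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FlowDock): join arguments with `|`.
join_by_pipe() {
  local IFS='|'  # "$*" joins positional parameters using the first character of IFS
  echo "$*"
}

chain_a="MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIP"
chain_b="MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIP"
chain_c="MYTVKPGDTMWKIAVKYQIGISEIIAANPQIKNPNLIYPGQKINIPN"
receptor="$(join_by_pipe "$chain_a" "$chain_b" "$chain_c")"

# Print the override exactly as it would appear on the sample.py command line.
echo "input_receptor='${receptor}'"
```

Because `IFS` is declared `local`, the change stays inside the function and does not affect word splitting elsewhere in the calling script.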
@@ -369,7 +388,7 @@ If you do not already have a template protein structure available for your targe
For instance, one can perform batched prediction as follows:

```bash
- python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights_EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling csv_path='./data/test_cases/prediction_inputs/flowdock_batched_inputs.csv' out_path='./T1152_batch_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=false auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
+ python flowdock/sample.py ckpt_path=checkpoints/esmfold_prior_paper_weights-EMA.ckpt model.cfg.prior_type=esmfold sampling_task=batched_structure_sampling csv_path='./data/test_cases/prediction_inputs/flowdock_batched_inputs.csv' out_path='./T1152_batch_sampled_structures/' n_samples=5 chunk_size=5 num_steps=40 sampler=VDODE sampler_eta=1.0 start_time='1.0' use_template=true separate_pdb=true visualize_sample_trajectories=false auxiliary_estimation_only=false esmfold_chunk_size=null trainer=gpu
```

</details>
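This commit renames the checkpoint file used in the evaluation and sampling commands from `esmfold_prior_paper_weights_EMA.ckpt` to `esmfold_prior_paper_weights-EMA.ckpt`. Personal scripts that still use the old name could be updated with a substitution like the one below. `rename_ema_checkpoint` is a hypothetical helper, and it assumes the old filename appears verbatim in the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FlowDock): rewrite the old EMA checkpoint
# filename to the new one inside a text file, e.g. a personal launch script.
rename_ema_checkpoint() {
  local file="$1" tmp status=0
  tmp="$(mktemp)" || return 1
  # Escape the dot so only the literal ".ckpt" suffix is matched.
  # Copy back with `cat` (not `mv`) so the original file keeps its permissions.
  sed 's/esmfold_prior_paper_weights_EMA\.ckpt/esmfold_prior_paper_weights-EMA.ckpt/g' "$file" > "$tmp" \
    && cat "$tmp" > "$file" || status=1
  rm -f "$tmp"
  return "$status"
}
```

For example, `rename_ema_checkpoint run_sampling.sh` updates a script named `run_sampling.sh` (a placeholder name) in place.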