Simple Structure-Based Drug Design

This repository is the official implementation of the SimpleSBDD models introduced in "What Ails Generative Structure-based Drug Design: Expressivity is Too Little or Too Much?" (AISTATS'25 oral).

Requirements

To prepare the environment (Python 3.9):

pip install -r requirements.txt

Data

CrossDocked2020 Dataset

Download the CrossDocked2020 dataset following the instructions in https://github.com/pengxingang/Pocket2Mol/blob/main/data/README.md and place it in the data folder.
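After downloading, a quick pre-flight check can confirm that the files sit where the sampling command in the Sampling section expects them. This is a minimal sketch that assumes the layout used by that command (data/crossdocked_pocket10 and data/split_by_name.pt); the helper name missing_data_paths is hypothetical and not part of this repository.

```python
from pathlib import Path

# Paths taken from the sampling command in this README; adjust if your layout differs.
EXPECTED_DATA_PATHS = [
    Path("data") / "crossdocked_pocket10",  # passed via --cross_docked_data_dir
    Path("data") / "split_by_name.pt",      # passed via --train_test_pairs_file_path
]


def missing_data_paths(repo_root="."):
    """Return the expected CrossDocked2020 paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [str(p) for p in EXPECTED_DATA_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_data_paths()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("CrossDocked2020 data found.")
```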

Binding Affinity scoring model training data

We provide a CSV file of the training data for the scoring model described in Section 4.2 of the paper. It has four columns:

protein_path - path to a PDB file in the CrossDocked2020 dataset describing the protein pocket
ligand_smiles - SMILES string of the ligand
vina_score - binding affinity estimation made by Vina
split_name - whether an example comes from TRAINING or VALIDATION subset
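The sketch below shows one way to read this CSV and separate the TRAINING and VALIDATION examples using only the standard library. It assumes the file has a header row with exactly the four column names listed above; the filename scoring_model_data.csv and the function load_scoring_data are hypothetical placeholders, not names taken from the repository.

```python
import csv
from collections import defaultdict


def load_scoring_data(csv_path):
    """Group rows of the scoring-model CSV by their split_name column.

    Returns a dict such as {"TRAINING": [...], "VALIDATION": [...]}, where each
    row is a dict with protein_path, ligand_smiles and vina_score (as a float).
    """
    splits = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # assumes a header row with the four column names
            splits[row["split_name"]].append(
                {
                    "protein_path": row["protein_path"],
                    "ligand_smiles": row["ligand_smiles"],
                    "vina_score": float(row["vina_score"]),
                }
            )
    return dict(splits)


if __name__ == "__main__":
    # Hypothetical filename: replace with the CSV provided in this repository.
    data = load_scoring_data("scoring_model_data.csv")
    for name, rows in data.items():
        print(name, len(rows))
```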

MoFlow model weights

The novel drug generation and property optimization versions of the models require the weights of the generative model MoFlow. We use the weights of the model trained on the ZINC dataset; instructions for downloading them are in the official MoFlow repository: https://github.com/calvin-zcx/moflow/blob/master/README.md. We recommend placing the weights in moflow/mflow/results.
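Since only some model versions need MoFlow, a small check before sampling can catch a missing download early. This is a sketch under two assumptions: that the ND and PO values of --model_type are the ones needing MoFlow (as described above), and that any non-empty moflow/mflow/results directory counts as "weights present". It does not verify specific weight filenames, which this README does not list; moflow_weights_ready is a hypothetical helper.

```python
from pathlib import Path

# Model types (see --model_type) whose generation relies on MoFlow.
MOFLOW_MODEL_TYPES = {"ND", "PO"}  # novel drug generation, property optimization
DEFAULT_MOFLOW_DIR = Path("moflow") / "mflow" / "results"


def moflow_weights_ready(model_type, weights_dir=DEFAULT_MOFLOW_DIR):
    """Return True if model_type does not need MoFlow, or if weights_dir is non-empty."""
    if model_type not in MOFLOW_MODEL_TYPES:
        return True  # assumed: DR (drug repurposing) does not use MoFlow
    weights_dir = Path(weights_dir)
    # Only checks that the directory exists and contains something.
    return weights_dir.is_dir() and any(weights_dir.iterdir())


if __name__ == "__main__":
    for mt in ("DR", "ND", "PO"):
        print(mt, moflow_weights_ready(mt))
```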

Training

To train the scoring model:

python train_scoring_model.py --protein_encoder_n_layers 3 --n_iter 10000 --output_checkpoint_name scoring_model_retrained

To train the center of mass predictor:

python train_com_model.py --hidden_dim 16 --n_layers 4 --n_iter 10000 --output_checkpoint_name com_model_retrained
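If you want to retrain both models in one go, the two commands above can be wrapped in a small driver. This is a minimal sketch, not a script shipped with the repository: it runs the scripts one after the other with the current interpreter and stops at the first failure. This README does not say whether the order matters; the order here simply follows the text. The names TRAINING_COMMANDS and run_training are hypothetical.

```python
import subprocess
import sys

# The two training commands from this README, as argument lists.
TRAINING_COMMANDS = [
    ["train_scoring_model.py", "--protein_encoder_n_layers", "3", "--n_iter", "10000",
     "--output_checkpoint_name", "scoring_model_retrained"],
    ["train_com_model.py", "--hidden_dim", "16", "--n_layers", "4", "--n_iter", "10000",
     "--output_checkpoint_name", "com_model_retrained"],
]


def run_training(commands=TRAINING_COMMANDS, dry_run=False):
    """Run each training script in order with the current Python interpreter.

    With dry_run=False, check=True makes subprocess.run raise CalledProcessError
    on the first failing script. Returns the full command lines.
    """
    full = [[sys.executable] + list(cmd) for cmd in commands]
    for cmd in full:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return full


if __name__ == "__main__":
    run_training(dry_run=True)  # print the commands; set dry_run=False to train
```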

Sampling

To generate ligands, run:

python sampling.py --train_test_pairs_file_path data/split_by_name.pt --cross_docked_data_dir data/crossdocked_pocket10 --n_samples_per_pocket 10 --model_type DR --experiment_name fastsbdd_dr

Possible values for --model_type are DR, ND and PO, standing for drug repurposing, novel drug generation and property optimization, respectively.
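To sample with all three model types, the command above can be parameterized by --model_type. The sketch below only builds the argument lists with the flags shown in this README and validates the model type; the helper sampling_command and the per-type experiment names (fastsbdd_dr, fastsbdd_nd, fastsbdd_po, extrapolated from the example) are hypothetical.

```python
import sys

MODEL_TYPES = {
    "DR": "drug repurposing",
    "ND": "novel drug generation",
    "PO": "property optimization",
}


def sampling_command(model_type, experiment_name, n_samples_per_pocket=10,
                     split_file="data/split_by_name.pt",
                     data_dir="data/crossdocked_pocket10"):
    """Build the sampling.py argument list shown in this README for one model type."""
    if model_type not in MODEL_TYPES:
        raise ValueError(
            "model_type must be one of {}, got {!r}".format(sorted(MODEL_TYPES), model_type))
    return [
        sys.executable, "sampling.py",
        "--train_test_pairs_file_path", split_file,
        "--cross_docked_data_dir", data_dir,
        "--n_samples_per_pocket", str(n_samples_per_pocket),
        "--model_type", model_type,
        "--experiment_name", experiment_name,
    ]


if __name__ == "__main__":
    for mt in MODEL_TYPES:
        # Hypothetical naming scheme: one experiment per model type.
        cmd = sampling_command(mt, "fastsbdd_" + mt.lower())
        # Pass cmd to subprocess.run(cmd, check=True) to actually sample.
        print(" ".join(cmd))
```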

Evaluation

Evaluation requires a separate environment running Python 2.7, set up as in https://github.com/pengxingang/Pocket2Mol. After creating and activating that environment as described in that repository, run:

python evaluation/evaluate.py --results_pairs experiment_name_eval_input_local.pt --out_dir experiment_name_results --exp_name experiment_name

where experiment_name_eval_input_local.pt was created by the sampling script described above.
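The evaluation command can be derived from the experiment name used during sampling. This sketch assumes the naming suggested by the command above: the sampling script writes <experiment_name>_eval_input_local.pt to the current directory, and results go to <experiment_name>_results. The helper evaluation_command is hypothetical. It avoids f-strings so that it also runs inside the Python 2.7 evaluation environment.

```python
import os


def evaluation_command(experiment_name, python="python"):
    """Build the evaluate.py command for one experiment.

    Assumes the sampling script wrote '<experiment_name>_eval_input_local.pt'
    in the current directory, following the naming used in this README.
    """
    results_pairs = "{}_eval_input_local.pt".format(experiment_name)
    return [
        python, os.path.join("evaluation", "evaluate.py"),
        "--results_pairs", results_pairs,
        "--out_dir", "{}_results".format(experiment_name),
        "--exp_name", experiment_name,
    ]


if __name__ == "__main__":
    cmd = evaluation_command("fastsbdd_dr")
    results_pairs = cmd[cmd.index("--results_pairs") + 1]
    if not os.path.exists(results_pairs):
        print("Sampling output not found: " + results_pairs)
    print(" ".join(cmd))
```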

Pre-trained Models

All pre-trained models are provided in the checkpoints folder.

Results

Our model achieves the following performance (see results_table.png in the repository):

Reference

If you find this work useful, please consider citing:

@inproceedings{
karczewski2025what,
title={What Ails Generative Structure-based Drug Design: Expressivity is Too Little or Too Much?},
author={Rafał Karczewski and Samuel Kaski and Markus Heinonen and Vikas K Garg},
booktitle={The 28th International Conference on Artificial Intelligence and Statistics},
year={2025},
url={https://openreview.net/forum?id=GHyBMTpiJg}
}
