
Commit f8b4a07

add autoencoder training details, arxiv link and figures
1 parent 32a9661 commit f8b4a07

File tree: 3 files changed (+45 -1)


README.md (+45 -1)
@@ -1,4 +1,23 @@
# Latent Diffusion Models
[arXiv](https://arxiv.org/abs/2112.10752) | [BibTeX](#bibtex)

<p align="center">
<img src=assets/results.gif />
</p>


[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)<br/>
[Robin Rombach](https://github.com/rromb)\*,
[Andreas Blattmann](https://github.com/ablattmann)\*,
[Dominik Lorenz](https://github.com/qp-qp),
[Patrick Esser](https://github.com/pesser),
[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)<br/>
\* equal contribution

<p align="center">
<img src=assets/modelfigure.png />
</p>

## Requirements
A suitable [conda](https://conda.io/) environment named `ldm` can be created
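The `conda activate ldm` context line in the next hunk suggests the usual setup flow; as a minimal sketch, assuming the environment file is `environment.yaml` at the repository root (the creation step itself is outside this diff):

```shell script
# create and activate the `ldm` environment (environment.yaml assumed, not shown in this diff)
conda env create -f environment.yaml
conda activate ldm
```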
@@ -31,12 +50,24 @@ conda activate ldm
### Get the models

Running the following script downloads and extracts all available pretrained autoencoding models.
```shell script
bash scripts/download_first_stages.sh
```

The first stage models can then be found in `models/first_stage_models/<model_spec>`.
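As a quick sanity check after the download (a sketch; the concrete `<model_spec>` directory names are created by the script and not listed in this diff):

```shell script
# list the extracted first-stage model directories
ls models/first_stage_models/
```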
### Training autoencoder models

Configs for training a KL-regularized autoencoder on ImageNet are provided at `configs/autoencoder`.
Training can be started by running
```
CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/autoencoder/<config_spec> -t --gpus 0,
```
where `config_spec` is one of {`autoencoder_kl_8x8x64.yaml` (f=32, d=64), `autoencoder_kl_16x16x16.yaml` (f=16, d=16),
`autoencoder_kl_32x32x4.yaml` (f=8, d=4), `autoencoder_kl_64x64x3.yaml` (f=4, d=3)}.
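For example, substituting `GPU_ID` and `config_spec` into the template above (GPU 0 and the f=8 config chosen purely for illustration):
```
CUDA_VISIBLE_DEVICES=0 python main.py --base configs/autoencoder/autoencoder_kl_32x32x4.yaml -t --gpus 0,
```
Here f is the spatial downsampling factor and d the number of latent channels; the `NxNxd` in each config name gives the latent shape for 256x256 inputs (e.g. 256/8 = 32, so the f=8 model produces 32x32x4 latents).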

For training VQ-regularized models, see the [taming-transformers](https://github.com/CompVis/taming-transformers) repository.

## Pretrained LDMs
| Dataset | Task | Model | FID | IS | Prec | Recall | Link | Comments
@@ -102,4 +133,17 @@ Thanks for open-sourcing!
- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).

## BibTeX

```
@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models},
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

assets/modelfigure.png (72 KB)

assets/results.gif (9.39 MB)
