| 1 | +# Stable Diffusion 3 Micro-Reference Implementation |
| 2 | + |
| 3 | +Inference-only tiny reference implementation of SD3. |
| 4 | + |
| 5 | +Contains code for the text encoders (OpenAI CLIP-L/14, OpenCLIP bigG, and Google T5-XXL, all of which are public), the VAE decoder (similar to previous SD models, but with 16 channels and no postquantconv step), and the core MM-DiT (entirely new). |
| 6 | + |
| 7 | +Everything you need to run SD3 inference, excluding the weights files. |
| 8 | + |
| 9 | +### Install |
| 10 | + |
| 11 | +```sh |
| 12 | +python3 -s -m venv venv |
| 13 | +source ./venv/bin/activate |
| 14 | +# or on Windows: venv\Scripts\activate |
| 15 | +python3 -s -m pip install -r requirements.txt |
| 16 | +``` |
| 17 | + |
| 18 | +### Test Usage |
| 19 | + |
| 20 | +```sh |
| 21 | +# Generate a cat on ref model with default settings |
| 22 | +python3 -s sd3_infer.py |
| 23 | +# Generate a 1024 cat on SD3-8B |
| 24 | +python3 -s sd3_infer.py --width 1024 --height 1024 --shift 3 --model models/sd3_8b_beta.safetensors --prompt "cute wallpaper art of a cat" |
| 25 | +``` |
| 26 | + |
| 27 | +Images are written to `output.png` by default. |
| 28 | + |
| 29 | +### File Guide |
| 30 | + |
| 31 | +- `sd3_infer.py` - entry point, review this for basic usage of the diffusion model and the triple text-encoder concatenation |
| 32 | +- `sd3_impls.py` - contains the wrapper around the MMDiT and the VAE |
| 33 | +- `other_impls.py` - contains the CLIP model, the T5 model, and some utilities |
| 34 | +- `mmdit.py` - contains the core of the MMDiT itself |
| 35 | +- folder `models` with the following files (download separately): |
| 36 | + - `clip_g.safetensors` (openclip bigG, same as SDXL, can grab a public copy) |
| 37 | + - `clip_l.safetensors` (OpenAI CLIP-L, same as SDXL, can grab a public copy) |
| 38 | + - `t5xxl.safetensors` (google T5-v1.1-XXL, can grab a public copy) |
| 39 | + - `sd3_beta.safetensors` (internal, private) |
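
Since the weights are downloaded separately, it can help to confirm that the `models` folder is complete before running inference. The following is a minimal sketch, not part of the repository: the helper name `missing_model_files` is hypothetical, and the file names are simply the ones listed in the File Guide above.

```python
from pathlib import Path

# File names copied from the File Guide; adjust if your checkpoint is named differently.
EXPECTED_MODEL_FILES = (
    "clip_g.safetensors",
    "clip_l.safetensors",
    "t5xxl.safetensors",
    "sd3_beta.safetensors",
)


def missing_model_files(models_dir="models", expected=EXPECTED_MODEL_FILES):
    """Return the expected file names that are not present in models_dir.

    Hypothetical helper for illustration; it only checks that each file
    exists, not that its contents are valid.
    """
    root = Path(models_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing from models/: " + ", ".join(missing))
    else:
        print("All expected model files are present.")
```

Run it from the repository root so that the relative `models` path resolves to the folder described above.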
| 40 | + |
| 41 | +### Legal |
| 42 | + |
| 43 | +Built by Alex Goodwin for Stability AI and private partners under NDA, heavily based on internal ComfyUI and SGM codebases. Uses some upstream logic from HuggingFace, Google, and PyTorch. |
| 44 | + |
| 45 | +Do not redistribute. |