Commit d7af2c2

feat: load weights from safetensors and ckpt (leejet#101)
1 parent 47dd704 commit d7af2c2

28 files changed (+49183 -2418 lines)

.gitignore (+2 -1)

@@ -8,5 +8,6 @@ test/
 *.bin
 *.exe
 *.gguf
+*.log
 output.png
-models/*
+models/

CMakeLists.txt (+6 -5)

@@ -25,7 +25,7 @@ endif()
 #option(SD_BUILD_TESTS "sd: build tests" ${SD_STANDALONE})
 option(SD_BUILD_EXAMPLES "sd: build examples" ${SD_STANDALONE})
 option(SD_CUBLAS "sd: cuda backend" OFF)
-option(SD_FLASH_ATTN "sd: use flash attention for x4 less memory usage" OFF)
+option(SD_FLASH_ATTN "sd: use flash attention for x4 less memory usage" OFF)
 option(BUILD_SHARED_LIBS "sd: build shared libs" OFF)
 #option(SD_BUILD_SERVER "sd: build server example" ON)
 
@@ -45,14 +45,15 @@ set(CMAKE_POLICY_DEFAULT_CMP0077 NEW)
 # deps
 add_subdirectory(ggml)
 
+add_subdirectory(thirdparty)
+
 set(SD_LIB stable-diffusion)
 
-add_library(${SD_LIB} stable-diffusion.h stable-diffusion.cpp)
-target_link_libraries(${SD_LIB} PUBLIC ggml)
-target_include_directories(${SD_LIB} PUBLIC .)
+add_library(${SD_LIB} stable-diffusion.h stable-diffusion.cpp model.h model.cpp util.h util.cpp)
+target_link_libraries(${SD_LIB} PUBLIC ggml zip)
+target_include_directories(${SD_LIB} PUBLIC . thirdparty)
 target_compile_features(${SD_LIB} PUBLIC cxx_std_11)
 
-add_subdirectory(common)
 
 if (SD_BUILD_EXAMPLES)
 add_subdirectory(examples)

README.md (+27 -36)

@@ -10,13 +10,15 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
 
 - Plain C/C++ implementation based on [ggml](https://github.com/ggerganov/ggml), working in the same way as [llama.cpp](https://github.com/ggerganov/llama.cpp)
 - Super lightweight and without external dependencies.
+- SD1.x and SD2.x support
 - 16-bit, 32-bit float support
 - 4-bit, 5-bit and 8-bit integer quantization support
 - Accelerated memory-efficient CPU inference
 - Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image, enabling Flash Attention just requires ~1.8GB.
 - AVX, AVX2 and AVX512 support for x86 architectures
-- SD1.x and SD2.x support
 - Full CUDA backend for GPU acceleration, for now just for float16 and float32 models. There are some issues with quantized models and CUDA; it will be fixed in the future.
+- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs models.
+- No need to convert to `.ggml` or `.gguf` anymore!
 - Flash Attention for memory usage optimization (only cpu for now).
 - Original `txt2img` and `img2img` mode
 - Negative prompt
@@ -68,7 +70,7 @@ git submodule init
 git submodule update
 ```
 
-### Convert weights
+### Download weights
 
 - download original weights(.ckpt or .safetensors). For example
 - Stable Diffusion v1.4 from https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
@@ -81,22 +83,6 @@ git submodule update
 # curl -L -O https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/v2-1_768-nonema-pruned.safetensors
 ```
 
-- convert weights to gguf model format
-
-```shell
-./bin/convert sd-v1-4.ckpt -t f16
-```
-
-### Quantization
-
-You can specify the output model format using the `--type` or `-t` parameter
-
-- `f16` for 16-bit floating-point
-- `f32` for 32-bit floating-point
-- `q8_0` for 8-bit integer quantization
-- `q5_0` or `q5_1` for 5-bit integer quantization
-- `q4_0` or `q4_1` for 4-bit integer quantization
-
 ### Build
 
 #### Build from scratch
@@ -144,9 +130,11 @@ arguments:
 -t, --threads N number of threads to use during computation (default: -1).
 If threads <= 0, then threads will be set to the number of CPU physical cores
 -m, --model [MODEL] path to model
---lora-model-dir [DIR] lora model directory
+--vae [VAE] path to vae
+--type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
+If not specified, the default is the type of the weight file. --lora-model-dir [DIR] lora model directory
 -i, --init-img [IMAGE] path to the input image, required by img2img
--o, --output OUTPUT path to write result image to (default: .\output.png)
+-o, --output OUTPUT path to write result image to (default: ./output.png)
 -p, --prompt [PROMPT] the prompt to render
 -n, --negative-prompt PROMPT the negative prompt (default: "")
 --cfg-scale SCALE unconditional guidance scale: (default: 7.0)
@@ -164,10 +152,21 @@ arguments:
 -v, --verbose print extra info
 ```
 
+#### Quantization
+
+You can specify the model weight type using the `--type` parameter. The weights are automatically converted when loading the model.
+
+- `f16` for 16-bit floating-point
+- `f32` for 32-bit floating-point
+- `q8_0` for 8-bit integer quantization
+- `q5_0` or `q5_1` for 5-bit integer quantization
+- `q4_0` or `q4_1` for 4-bit integer quantization
+
 #### txt2img example
 
-```
-./bin/sd -m ../sd-v1-4-f16.gguf -p "a lovely cat"
+```sh
+./bin/sd -m ../models/sd-v1-4.ckpt -p "a lovely cat"
+# ./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"
 ```
 
 Using formats of different precisions will yield results of varying quality.
@@ -182,7 +181,7 @@ Using formats of different precisions will yield results of varying quality.
 
 
 ```
-./bin/sd --mode img2img -m ../models/sd-v1-4-f16.gguf -p "cat with blue eyes" -i ./output.png -o ./img2img_output.png --strength 0.4
+./bin/sd --mode img2img -m ../models/sd-v1-4.ckpt -p "cat with blue eyes" -i ./output.png -o ./img2img_output.png --strength 0.4
 ```
 
 <p align="center">
@@ -191,24 +190,17 @@ Using formats of different precisions will yield results of varying quality.
 
 #### with LoRA
 
-- convert lora weights to gguf model format
-
-```shell
-bin/convert [lora path] -t f16
-# For example, bin/convert marblesh.safetensors -t f16
-```
-
 - You can specify the directory where the lora weights are stored via `--lora-model-dir`. If not specified, the default is the current working directory.
 
 - LoRA is specified via prompt, just like [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#lora).
 
 Here's a simple example:
 
 ```
-./bin/sd -m ../models/v1-5-pruned-emaonly-f16.gguf -p "a lovely cat<lora:marblesh:1>" --lora-model-dir ../models
+./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat<lora:marblesh:1>" --lora-model-dir ../models
 ```
 
-`../models/marblesh.gguf` will be applied to the model
+`../models/marblesh.safetensors` or `../models/marblesh.ckpt` will be applied to the model
 
 #### LCM/LCM-LoRA
 
@@ -219,7 +211,7 @@ Here's a simple example:
 Here's a simple example:
 
 ```
-./bin/sd -m ../models/v1-5-pruned-emaonly-f16.gguf -p "a lovely cat<lora:lcm-lora-sdv1-5:1>" --steps 4 --lora-model-dir ../models -v --cfg-scale 1
+./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat<lora:lcm-lora-sdv1-5:1>" --steps 4 --lora-model-dir ../models -v --cfg-scale 1
 ```
 
 | without LCM-LoRA (--cfg-scale 7) | with LCM-LoRA (--cfg-scale 1) |
@@ -240,14 +232,13 @@ docker build -t sd .
 ```shell
 docker run -v /path/to/models:/models -v /path/to/output/:/output sd [args...]
 # For example
-# docker run -v ./models:/models -v ./build:/output sd -m /models/sd-v1-4-f16.gguf -p "a lovely cat" -v -o /output/output.png
+# docker run -v ./models:/models -v ./build:/output sd -m /models/sd-v1-4.ckpt -p "a lovely cat" -v -o /output/output.png
 ```
 
-## Memory/Disk Requirements
+## Memory Requirements
 
 | precision | f32 | f16 |q8_0 |q5_0 |q5_1 |q4_0 |q4_1 |
 | ---- | ---- |---- |---- |---- |---- |---- |---- |
-| **Disk** | 2.7G | 2.0G | 1.7G | 1.6G | 1.6G | 1.5G | 1.5G |
 | **Memory** (txt2img - 512 x 512) | ~2.8G | ~2.3G | ~2.1G | ~2.0G | ~2.0G | ~2.0G | ~2.0G |
 | **Memory** (txt2img - 512 x 512) *with Flash Attention* | ~2.4G | ~1.9G | ~1.6G | ~1.5G | ~1.5G | ~1.5G | ~1.5G |
 

common/CMakeLists.txt (-15)

This file was deleted.
