Commit 43a1f89

update readme and glove
moved to thirdparty
1 parent 69f2035 commit 43a1f89

7 files changed: +1748 -883 lines changed

License.txt

Lines changed: 675 additions & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 38 additions & 316 deletions
@@ -1,321 +1,43 @@
1-
<p align="center">
2-
<img src="./assets/cat_with_sd_cpp_42.png" width="360x">
3-
</p>
4-
5-
# stable-diffusion.cpp
6-
7-
Inference of Stable Diffusion and Flux in pure C/C++
8-
9-
## Features
10-
11-
- Plain C/C++ implementation based on [ggml](https://github.com/ggerganov/ggml), working in the same way as [llama.cpp](https://github.com/ggerganov/llama.cpp)
12-
- Super lightweight and without external dependencies
13-
- SD1.x, SD2.x, SDXL and SD3 support
14-
- !!!The VAE in SDXL encounters NaN issues under FP16, but unfortunately, the ggml_conv_2d only operates under FP16. Hence, a parameter is needed to specify the VAE that has fixed the FP16 NaN issue. You can find it here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
15-
- [Flux-dev/Flux-schnell Support](./docs/flux.md)
16-
17-
- [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) and [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) support
18-
- [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support.
19-
- 16-bit, 32-bit float support
20-
- 2-bit, 3-bit, 4-bit, 5-bit and 8-bit integer quantization support
21-
- Accelerated memory-efficient CPU inference
22-
- Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image, enabling Flash Attention just requires ~1.8GB.
23-
- AVX, AVX2 and AVX512 support for x86 architectures
24-
- Full CUDA, Metal, Vulkan and SYCL backend for GPU acceleration.
25-
- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs models
26-
- No need to convert to `.ggml` or `.gguf` anymore!
27-
- Flash Attention for memory usage optimization (only cpu for now)
28-
- Original `txt2img` and `img2img` mode
29-
- Negative prompt
30-
- [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) style tokenizer (not all the features, only token weighting for now)
31-
- LoRA support, same as [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#lora)
32-
- Latent Consistency Models support (LCM/LCM-LoRA)
33-
- Faster and memory efficient latent decoding with [TAESD](https://github.com/madebyollin/taesd)
34-
- Upscale images generated with [ESRGAN](https://github.com/xinntao/Real-ESRGAN)
35-
- VAE tiling processing for reduce memory usage
36-
- Control Net support with SD 1.5
37-
- Sampling method
38-
- `Euler A`
39-
- `Euler`
40-
- `Heun`
41-
- `DPM2`
42-
- `DPM++ 2M`
43-
- [`DPM++ 2M v2`](https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/8457)
44-
- `DPM++ 2S a`
45-
- [`LCM`](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/13952)
46-
- Cross-platform reproducibility (`--rng cuda`, consistent with the `stable-diffusion-webui GPU RNG`)
47-
- Embedds generation parameters into png output as webui-compatible text string
48-
- Supported platforms
49-
- Linux
50-
- Mac OS
51-
- Windows
52-
- Android (via Termux)
53-
54-
### TODO
55-
56-
- [ ] More sampling methods
57-
- [ ] Make inference faster
58-
- The current implementation of ggml_conv_2d is slow and has high memory usage
59-
- [ ] Continuing to reduce memory usage (quantizing the weights of ggml_conv_2d)
60-
- [ ] Implement Inpainting support
61-
62-
## Usage
63-
64-
For most users, you can download the built executable program from the latest [release](https://github.com/leejet/stable-diffusion.cpp/releases/latest).
65-
If the built product does not meet your requirements, you can choose to build it manually.
66-
67-
### Get the Code
68-
69-
```
70-
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
71-
cd stable-diffusion.cpp
72-
```
73-
74-
- If you have already cloned the repository, you can use the following command to update the repository to the latest code.
75-
76-
```
77-
cd stable-diffusion.cpp
78-
git pull origin master
79-
git submodule init
80-
git submodule update
81-
```
82-
83-
### Download weights
84-
85-
- download original weights(.ckpt or .safetensors). For example
86-
- Stable Diffusion v1.4 from https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
87-
- Stable Diffusion v1.5 from https://huggingface.co/runwayml/stable-diffusion-v1-5
88-
- Stable Diffuison v2.1 from https://huggingface.co/stabilityai/stable-diffusion-2-1
89-
- Stable Diffusion 3 2B from https://huggingface.co/stabilityai/stable-diffusion-3-medium
90-
91-
```shell
92-
curl -L -O https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
93-
# curl -L -O https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
94-
# curl -L -O https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-nonema-pruned.safetensors
95-
# curl -L -O https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors
96-
```
97-
98-
### Build
99-
100-
#### Build from scratch
101-
102-
```shell
103-
mkdir build
104-
cd build
105-
cmake ..
106-
cmake --build . --config Release
107-
```
108-
109-
##### Using OpenBLAS
110-
111-
```
112-
cmake .. -DGGML_OPENBLAS=ON
113-
cmake --build . --config Release
114-
```
115-
116-
##### Using CUBLAS
117-
118-
This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager (e.g. `apt install nvidia-cuda-toolkit`) or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads). Recommended to have at least 4 GB of VRAM.
119-
120-
```
121-
cmake .. -DSD_CUBLAS=ON
122-
cmake --build . --config Release
123-
```
124-
125-
##### Using HipBLAS
126-
This provides BLAS acceleration using the ROCm cores of your AMD GPU. Make sure to have the ROCm toolkit installed.
127-
128-
Windows User Refer to [docs/hipBLAS_on_Windows.md](docs%2FhipBLAS_on_Windows.md) for a comprehensive guide.
129-
130-
```
131-
cmake .. -G "Ninja" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx1100
132-
cmake --build . --config Release
133-
```
134-
135-
136-
##### Using Metal
137-
138-
Using Metal makes the computation run on the GPU. Currently, there are some issues with Metal when performing operations on very large matrices, making it highly inefficient at the moment. Performance improvements are expected in the near future.
139-
140-
```
141-
cmake .. -DSD_METAL=ON
142-
cmake --build . --config Release
143-
```
144-
145-
##### Using Vulkan
146-
147-
Install Vulkan SDK from https://www.lunarg.com/vulkan-sdk/.
148-
149-
```
150-
cmake .. -DSD_VULKAN=ON
151-
cmake --build . --config Release
152-
```
153-
154-
##### Using SYCL
155-
156-
Using SYCL makes the computation run on the Intel GPU. Please make sure you have installed the related driver and [Intel® oneAPI Base toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html) before start. More details and steps can refer to [llama.cpp SYCL backend](https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md#linux).
157-
158-
```
159-
# Export relevant ENV variables
160-
source /opt/intel/oneapi/setvars.sh
1+
# stable-diffusion.cpp : GUI of command line interface
1612

162-
# Option 1: Use FP32 (recommended for better performance in most cases)
163-
cmake .. -DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
164-
165-
# Option 2: Use FP16
166-
cmake .. -DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
167-
168-
cmake --build . --config Release
169-
```
170-
171-
Example of text2img by using SYCL backend:
172-
173-
- download `stable-diffusion` model weight, refer to [download-weight](#download-weights).
174-
175-
- run `./bin/sd -m ../models/sd3_medium_incl_clips_t5xxlfp16.safetensors --cfg-scale 5 --steps 30 --sampling-method euler -H 1024 -W 1024 --seed 42 -p "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution"`
176-
177-
<p align="center">
178-
<img src="./assets/sycl_sd3_output.png" width="360x">
179-
</p>
180-
181-
182-
183-
##### Using Flash Attention
184-
185-
Enabling flash attention reduces memory usage by at least 400 MB. At the moment, it is not supported when CUBLAS is enabled because the kernel implementation is missing.
186-
187-
```
188-
cmake .. -DSD_FLASH_ATTN=ON
189-
cmake --build . --config Release
190-
```
191-
192-
### Run
193-
194-
```
195-
usage: ./bin/sd [arguments]
196-
197-
arguments:
198-
-h, --help show this help message and exit
199-
-M, --mode [MODEL] run mode (txt2img or img2img or convert, default: txt2img)
200-
-t, --threads N number of threads to use during computation (default: -1).
201-
If threads <= 0, then threads will be set to the number of CPU physical cores
202-
-m, --model [MODEL] path to full model
203-
--diffusion-model path to the standalone diffusion model
204-
--clip_l path to the clip-l text encoder
205-
--t5xxl path to the the t5xxl text encoder.
206-
--vae [VAE] path to vae
207-
--taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
208-
--control-net [CONTROL_PATH] path to control net model
209-
--embd-dir [EMBEDDING_PATH] path to embeddings.
210-
--stacked-id-embd-dir [DIR] path to PHOTOMAKER stacked id embeddings.
211-
--input-id-images-dir [DIR] path to PHOTOMAKER input id images dir.
212-
--normalize-input normalize PHOTOMAKER input id images
213-
--upscale-model [ESRGAN_PATH] path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now.
214-
--upscale-repeats Run the ESRGAN upscaler this many times (default 1)
215-
--type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_k, q3_k, q4_k)
216-
If not specified, the default is the type of the weight file.
217-
--lora-model-dir [DIR] lora model directory
218-
-i, --init-img [IMAGE] path to the input image, required by img2img
219-
--control-image [IMAGE] path to image condition, control net
220-
-o, --output OUTPUT path to write result image to (default: ./output.png)
221-
-p, --prompt [PROMPT] the prompt to render
222-
-n, --negative-prompt PROMPT the negative prompt (default: "")
223-
--cfg-scale SCALE unconditional guidance scale: (default: 7.0)
224-
--strength STRENGTH strength for noising/unnoising (default: 0.75)
225-
--style-ratio STYLE-RATIO strength for keeping input identity (default: 20%)
226-
--control-strength STRENGTH strength to apply Control Net (default: 0.9)
227-
1.0 corresponds to full destruction of information in init image
228-
-H, --height H image height, in pixel space (default: 512)
229-
-W, --width W image width, in pixel space (default: 512)
230-
--sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm}
231-
sampling method (default: "euler_a")
232-
--steps STEPS number of sample steps (default: 20)
233-
--rng {std_default, cuda} RNG (default: cuda)
234-
-s SEED, --seed SEED RNG seed (default: 42, use random seed for < 0)
235-
-b, --batch-count COUNT number of images to generate.
236-
--schedule {discrete, karras, exponential, ays, gits} Denoiser sigma schedule (default: discrete)
237-
--clip-skip N ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1)
238-
<= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x
239-
--vae-tiling process vae in tiles to reduce memory usage
240-
--vae-on-cpu keep vae in cpu (for low vram)
241-
--clip-on-cpu keep clip in cpu (for low vram).
242-
--control-net-cpu keep controlnet in cpu (for low vram)
243-
--canny apply canny preprocessor (edge detection)
244-
--color Colors the logging tags according to level
245-
-v, --verbose print extra info
246-
```
247-
248-
#### txt2img example
249-
250-
```sh
251-
./bin/sd -m ../models/sd-v1-4.ckpt -p "a lovely cat"
252-
# ./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"
253-
# ./bin/sd -m ../models/sd_xl_base_1.0.safetensors --vae ../models/sdxl_vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
254-
# ./bin/sd -m ../models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 1024 -W 1024 -p 'a lovely cat holding a sign says \"Stable Diffusion CPP\"' --cfg-scale 4.5 --sampling-method euler -v
255-
# ./bin/sd --diffusion-model ../models/flux1-dev-q3_k.gguf --vae ../models/ae.sft --clip_l ../models/clip_l.safetensors --t5xxl ../models/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v
256-
```
257-
258-
Using formats of different precisions will yield results of varying quality.
259-
260-
| f32 | f16 |q8_0 |q5_0 |q5_1 |q4_0 |q4_1 |
261-
| ---- |---- |---- |---- |---- |---- |---- |
262-
| ![](./assets/f32.png) |![](./assets/f16.png) |![](./assets/q8_0.png) |![](./assets/q5_0.png) |![](./assets/q5_1.png) |![](./assets/q4_0.png) |![](./assets/q4_1.png) |
263-
264-
#### img2img example
265-
266-
- `./output.png` is the image generated from the above txt2img pipeline
267-
268-
269-
```
270-
./bin/sd --mode img2img -m ../models/sd-v1-4.ckpt -p "cat with blue eyes" -i ./output.png -o ./img2img_output.png --strength 0.4
271-
```
3+
This repository is a fork of [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp). It only adds a GUI interface to the executable generating examples. It will be updated according to the master repository developments (evolution of the parameters).
2724

2735
<p align="center">
274-
<img src="./assets/img2img_output.png" width="256x">
6+
<img src="./assets/sd-example.png" width="360x">
2757
</p>
2768

277-
## More Guides
278-
279-
- [LoRA](./docs/lora.md)
280-
- [LCM/LCM-LoRA](./docs/lcm.md)
281-
- [Using PhotoMaker to personalize image generation](./docs/photo_maker.md)
282-
- [Using ESRGAN to upscale results](./docs/esrgan.md)
283-
- [Using TAESD to faster decoding](./docs/taesd.md)
284-
- [Docker](./docs/docker.md)
285-
- [Quantization and GGUF](./docs/quantization_and_gguf.md)
286-
287-
## Bindings
288-
289-
These projects wrap `stable-diffusion.cpp` for easier use in other languages/frameworks.
290-
291-
* Golang: [seasonjs/stable-diffusion](https://github.com/seasonjs/stable-diffusion)
292-
* C#: [DarthAffe/StableDiffusion.NET](https://github.com/DarthAffe/StableDiffusion.NET)
293-
294-
## UIs
295-
296-
These projects use `stable-diffusion.cpp` as a backend for their image generation.
297-
298-
- [Jellybox](https://jellybox.com)
299-
300-
## Contributors
301-
302-
Thank you to all the people who have already contributed to stable-diffusion.cpp!
303-
304-
[![Contributors](https://contrib.rocks/image?repo=leejet/stable-diffusion.cpp)](https://github.com/leejet/stable-diffusion.cpp/graphs/contributors)
305-
306-
## Star History
307-
308-
[![Star History Chart](https://api.star-history.com/svg?repos=leejet/stable-diffusion.cpp&type=Date)](https://star-history.com/#leejet/stable-diffusion.cpp&Date)
309-
310-
## References
311-
312-
- [ggml](https://github.com/ggerganov/ggml)
313-
- [stable-diffusion](https://github.com/CompVis/stable-diffusion)
314-
- [sd3-ref](https://github.com/Stability-AI/sd3-ref)
315-
- [stable-diffusion-stability-ai](https://github.com/Stability-AI/stablediffusion)
316-
- [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
317-
- [ComfyUI](https://github.com/comfyanonymous/ComfyUI)
318-
- [k-diffusion](https://github.com/crowsonkb/k-diffusion)
319-
- [latent-consistency-model](https://github.com/luosiallen/latent-consistency-model)
320-
- [generative-models](https://github.com/Stability-AI/generative-models/)
321-
- [PhotoMaker](https://github.com/TencentARC/PhotoMaker)
9+
- Only requires installation of Qt (5 or 6)
10+
11+
- To enable, use CMake options:
12+
13+
```cmake
14+
-DSD_BUILD_EXAMPLES=ON -DSD_EXAMPLES_GLOVE_GUI=ON
15+
```
16+
17+
then instead of CLI arguments use:
18+
19+
```sh
20+
sd -glove
21+
```
22+
23+
- Parameters saving as *json*, upon acceptance (*Ok* button):
24+
25+
- The parameters are saved automatically at the execution location
26+
- The parameters are also copied automatically to the directory of the <code>--output</code> path
27+
28+
- Parameters loading:
29+
30+
- The last used parameters are automatically loaded at launch
31+
- Parameters can be loaded
32+
- by using the *Load* button
33+
- or: <code>sd -glove 'path-to-parameters-file'</code>
34+
35+
- On Windows, if DLLs are missing, go to the <code>sd</code> executable directory and do:
36+
37+
```sh
38+
windeployqt.exe sd.exe
39+
```
40+
41+
## License
42+
43+
The interface is licensed under GPL-3.0, when using <code>-DSD_EXAMPLES_GLOVE_GUI=ON</code>. Otherwise, the MIT license of the master repository applies.
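
The GUI behaviour described in the new README (`sd -glove`, an optional parameters file after `-glove`, and saving the JSON both at the execution location and next to the `--output` path) can be sketched as follows. This is a minimal illustration and not the code from `main_glove.h`: the names `GloveRequest`, `parse_glove`, `params_save_targets` and the file name `sd_glove_params.json` are all assumptions made for the example, since the diff does not show how the fork implements these steps.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical file name: the README does not say what the JSON file is called.
const char* const kParamsFileName = "sd_glove_params.json";

// Hypothetical result type for the "-glove" option.
struct GloveRequest {
    bool enabled = false;     // true when "-glove" appears on the command line
    std::string params_file;  // path given after "-glove"; empty if none
};

// Scan argv for "-glove" and an optional parameters file right after it.
GloveRequest parse_glove(int argc, const char* const argv[]) {
    assert(argc >= 0 && argv != nullptr);
    GloveRequest req;
    for (int i = 1; i < argc; ++i) {
        if (std::string(argv[i]) == "-glove") {
            req.enabled = true;
            // Treat the next argument as a file unless it looks like another option.
            if (i + 1 < argc && argv[i + 1][0] != '-') {
                req.params_file = argv[i + 1];
            }
            break;
        }
    }
    return req;
}

// List where the parameters would be written on acceptance: the execution
// directory, plus the directory of the output image when that is a different one.
std::vector<fs::path> params_save_targets(const fs::path& cwd, const fs::path& output) {
    const fs::path base = cwd.lexically_normal();
    // Resolve a relative output path against the execution directory.
    const fs::path abs_out =
        (output.is_absolute() ? output : base / output).lexically_normal();

    std::vector<fs::path> targets{base / kParamsFileName};
    const fs::path copy = abs_out.parent_path() / kParamsFileName;
    if (copy != targets.front()) {
        targets.push_back(copy);  // avoid listing the same file twice
    }
    return targets;
}
```

A caller could pass `argc` and `argv` from `main` straight to `parse_glove`, then write the JSON to every path returned by `params_save_targets(fs::current_path(), output_path)`. Paths are compared lexically here, so symlinked directories would be treated as distinct locations.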

assets/sd-example.png

35.9 KB

examples/cli/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
set(TARGET sd)
22

33
if(SD_EXAMPLES_GLOVE_GUI)
4-
add_executable(${TARGET} main.cpp gui/main_glove.h gui/glove.h)
4+
add_executable(${TARGET} main.cpp main_glove.h ${CMAKE_SOURCE_DIR}/thirdparty/glove.h)
55
target_compile_definitions(${TARGET} PUBLIC -D SD_EXAMPLES_GLOVE_GUI)
66
target_link_libraries(${TARGET} PRIVATE stable-diffusion ${CMAKE_THREAD_LIBS_INIT} Qt${QT_VERSION}::Widgets)
77
if(MSVC)

examples/cli/main.cpp

Lines changed: 1 addition & 1 deletion
@@ -674,7 +674,7 @@ void sd_log_cb(enum sd_log_level_t level, const char* log, void* data) {
674674
}
675675

676676
#ifdef SD_EXAMPLES_GLOVE_GUI
677-
#include "gui/main_glove.h"
677+
#include "main_glove.h"
678678
#endif
679679

680680
int main(int argc, char* argv[]) {
File renamed without changes.
