piallai
diff --git a/‎License.txt
Lines changed: 675 additions & 0 deletions b/‎License.txt
Lines changed: 675 additions & 0 deletions
diff --git a/‎README.md
Lines changed: 38 additions & 316 deletions b/‎README.md
Lines changed: 38 additions & 316 deletions
diff --git a/‎assets/sd-example.png
35.9 KB b/‎assets/sd-example.png
35.9 KB
diff --git a/‎examples/cli/CMakeLists.txt
Lines changed: 1 addition & 1 deletion b/‎examples/cli/CMakeLists.txt
Lines changed: 1 addition & 1 deletion
diff --git a/‎examples/cli/main.cpp
Lines changed: 1 addition & 1 deletion b/‎examples/cli/main.cpp
Lines changed: 1 addition & 1 deletion
diff --git a/‎examples/cli/gui/main_glove.h renamed to ‎examples/cli/main_glove.h b/‎examples/cli/gui/main_glove.h renamed to ‎examples/cli/main_glove.h
@@ -1,321 +1,43 @@
-<p align="center">
-  <img src="./assets/cat_with_sd_cpp_42.png" width="360x">
-</p>
-
-# stable-diffusion.cpp
-
-Inference of Stable Diffusion and Flux in pure C/C++
-
-## Features
-
-- Plain C/C++ implementation based on [ggml](https://github.com/ggerganov/ggml), working in the same way as [llama.cpp](https://github.com/ggerganov/llama.cpp)
-- Super lightweight and without external dependencies
-- SD1.x, SD2.x, SDXL and SD3 support
-    - !!!The VAE in SDXL encounters NaN issues under FP16, but unfortunately, the ggml_conv_2d only operates under FP16. Hence, a parameter is needed to specify the VAE that has fixed the FP16 NaN issue. You can find it here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).
-- [Flux-dev/Flux-schnell Support](./docs/flux.md)
-
-- [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) and [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) support
-- [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support.
-- 16-bit, 32-bit float support
-- 2-bit, 3-bit, 4-bit, 5-bit and 8-bit integer quantization support
-- Accelerated memory-efficient CPU inference
-    - Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image, enabling Flash Attention just requires ~1.8GB.
-- AVX, AVX2 and AVX512 support for x86 architectures
-- Full CUDA, Metal, Vulkan and SYCL backend for GPU acceleration.
-- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs models
-    - No need to convert to `.ggml` or `.gguf` anymore!
-- Flash Attention for memory usage optimization (only cpu for now)
-- Original `txt2img` and `img2img` mode
-- Negative prompt
-- [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) style tokenizer (not all the features, only token weighting for now)
-- LoRA support, same as [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#lora)
-- Latent Consistency Models support (LCM/LCM-LoRA)
-- Faster and memory efficient latent decoding with [TAESD](https://github.com/madebyollin/taesd)
-- Upscale images generated with [ESRGAN](https://github.com/xinntao/Real-ESRGAN)
-- VAE tiling processing for reduce memory usage
-- Control Net support with SD 1.5
-- Sampling method
-    - `Euler A`
-    - `Euler`
-    - `Heun`
-    - `DPM2`
-    - `DPM++ 2M`
-    - [`DPM++ 2M v2`](https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/8457)
-    - `DPM++ 2S a`
-    - [`LCM`](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/13952)
-- Cross-platform reproducibility (`--rng cuda`, consistent with the `stable-diffusion-webui GPU RNG`)
-- Embedds generation parameters into png output as webui-compatible text string
-- Supported platforms
-    - Linux
-    - Mac OS
-    - Windows
-    - Android (via Termux)
-
-### TODO
-
-- [ ] More sampling methods
-- [ ] Make inference faster
-    - The current implementation of ggml_conv_2d is slow and has high memory usage
-- [ ] Continuing to reduce memory usage (quantizing the weights of ggml_conv_2d)
-- [ ] Implement Inpainting support
-
-## Usage
-
-For most users, you can download the built executable program from the latest [release](https://github.com/leejet/stable-diffusion.cpp/releases/latest).
-If the built product does not meet your requirements, you can choose to build it manually.
-
-### Get the Code
-
-```
-git clone --recursive https://github.com/leejet/stable-diffusion.cpp
-cd stable-diffusion.cpp
-```
-
-- If you have already cloned the repository, you can use the following command to update the repository to the latest code.
-
-```
-cd stable-diffusion.cpp
-git pull origin master
-git submodule init
-git submodule update
-```
-
-### Download weights
-
-- download original weights(.ckpt or .safetensors). For example
-    - Stable Diffusion v1.4 from https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
-    - Stable Diffusion v1.5 from https://huggingface.co/runwayml/stable-diffusion-v1-5
-    - Stable Diffuison v2.1 from https://huggingface.co/stabilityai/stable-diffusion-2-1
-    - Stable Diffusion 3 2B from https://huggingface.co/stabilityai/stable-diffusion-3-medium
-
-    ```shell
-    curl -L -O https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
-    # curl -L -O https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
-    # curl -L -O https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-nonema-pruned.safetensors
-    # curl -L -O https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium_incl_clips_t5xxlfp16.safetensors
-    ```
-
-### Build
-
-#### Build from scratch
-
-```shell
-mkdir build
-cd build
-cmake ..
-cmake --build . --config Release
-```
-
-##### Using OpenBLAS
-
-```
-cmake .. -DGGML_OPENBLAS=ON
-cmake --build . --config Release
-```
-
-##### Using CUBLAS
-
-This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager (e.g. `apt install nvidia-cuda-toolkit`) or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads). Recommended to have at least 4 GB of VRAM.
-
-```
-cmake .. -DSD_CUBLAS=ON
-cmake --build . --config Release
-```
-
-##### Using HipBLAS
-This provides BLAS acceleration using the ROCm cores of your AMD GPU. Make sure to have the ROCm toolkit installed.
-
-Windows User Refer to [docs/hipBLAS_on_Windows.md](docs%2FhipBLAS_on_Windows.md) for a comprehensive guide.
-
-```
-cmake .. -G "Ninja" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DSD_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=gfx1100
-cmake --build . --config Release
-```
-
-
-##### Using Metal
-
-Using Metal makes the computation run on the GPU. Currently, there are some issues with Metal when performing operations on very large matrices, making it highly inefficient at the moment. Performance improvements are expected in the near future.
-
-```
-cmake .. -DSD_METAL=ON
-cmake --build . --config Release
-```
-
-##### Using Vulkan
-
-Install Vulkan SDK from https://www.lunarg.com/vulkan-sdk/.
-
-```
-cmake .. -DSD_VULKAN=ON
-cmake --build . --config Release
-```
-
-##### Using SYCL
-
-Using SYCL makes the computation run on the Intel GPU. Please make sure you have installed the related driver and [Intel® oneAPI Base toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html) before start. More details and steps can refer to [llama.cpp SYCL backend](https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md#linux).
-
-```
-# Export relevant ENV variables
-source /opt/intel/oneapi/setvars.sh
+# stable-diffusion.cpp : GUI of command line interface
 
-# Option 1: Use FP32 (recommended for better performance in most cases)
-cmake .. -DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
-
-# Option 2: Use FP16
-cmake .. -DSD_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
-
-cmake --build . --config Release
-```
-
-Example of text2img by using SYCL backend:
-
-- download `stable-diffusion` model weight, refer to [download-weight](#download-weights).
-
-- run `./bin/sd -m ../models/sd3_medium_incl_clips_t5xxlfp16.safetensors --cfg-scale 5 --steps 30 --sampling-method euler  -H 1024 -W 1024 --seed 42 -p "fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details, 4k resolution"`
-
-<p align="center">
-  <img src="./assets/sycl_sd3_output.png" width="360x">
-</p>
-
-
-
-##### Using Flash Attention
-
-Enabling flash attention reduces memory usage by at least 400 MB. At the moment, it is not supported when CUBLAS is enabled because the kernel implementation is missing.
-
-```
-cmake .. -DSD_FLASH_ATTN=ON
-cmake --build . --config Release
-```
-
-### Run
-
-```
-usage: ./bin/sd [arguments]
-
-arguments:
-  -h, --help                         show this help message and exit
-  -M, --mode [MODEL]                 run mode (txt2img or img2img or convert, default: txt2img)
-  -t, --threads N                    number of threads to use during computation (default: -1).
-                                     If threads <= 0, then threads will be set to the number of CPU physical cores
-  -m, --model [MODEL]                path to full model
-  --diffusion-model                  path to the standalone diffusion model
-  --clip_l                           path to the clip-l text encoder
-  --t5xxl                            path to the the t5xxl text encoder.
-  --vae [VAE]                        path to vae
-  --taesd [TAESD_PATH]               path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
-  --control-net [CONTROL_PATH]       path to control net model
-  --embd-dir [EMBEDDING_PATH]        path to embeddings.
-  --stacked-id-embd-dir [DIR]        path to PHOTOMAKER stacked id embeddings.
-  --input-id-images-dir [DIR]        path to PHOTOMAKER input id images dir.
-  --normalize-input                  normalize PHOTOMAKER input id images
-  --upscale-model [ESRGAN_PATH]      path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now.
-  --upscale-repeats                  Run the ESRGAN upscaler this many times (default 1)
-  --type [TYPE]                      weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_k, q3_k, q4_k)
-                                     If not specified, the default is the type of the weight file.
-  --lora-model-dir [DIR]             lora model directory
-  -i, --init-img [IMAGE]             path to the input image, required by img2img
-  --control-image [IMAGE]            path to image condition, control net
-  -o, --output OUTPUT                path to write result image to (default: ./output.png)
-  -p, --prompt [PROMPT]              the prompt to render
-  -n, --negative-prompt PROMPT       the negative prompt (default: "")
-  --cfg-scale SCALE                  unconditional guidance scale: (default: 7.0)
-  --strength STRENGTH                strength for noising/unnoising (default: 0.75)
-  --style-ratio STYLE-RATIO          strength for keeping input identity (default: 20%)
-  --control-strength STRENGTH        strength to apply Control Net (default: 0.9)
-                                     1.0 corresponds to full destruction of information in init image
-  -H, --height H                     image height, in pixel space (default: 512)
-  -W, --width W                      image width, in pixel space (default: 512)
-  --sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm}
-                                     sampling method (default: "euler_a")
-  --steps  STEPS                     number of sample steps (default: 20)
-  --rng {std_default, cuda}          RNG (default: cuda)
-  -s SEED, --seed SEED               RNG seed (default: 42, use random seed for < 0)
-  -b, --batch-count COUNT            number of images to generate.
-  --schedule {discrete, karras, exponential, ays, gits} Denoiser sigma schedule (default: discrete)
-  --clip-skip N                      ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1)
-                                     <= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x
-  --vae-tiling                       process vae in tiles to reduce memory usage
-  --vae-on-cpu                       keep vae in cpu (for low vram)
-  --clip-on-cpu                      keep clip in cpu (for low vram).
-  --control-net-cpu                  keep controlnet in cpu (for low vram)
-  --canny                            apply canny preprocessor (edge detection)
-  --color                            Colors the logging tags according to level
-  -v, --verbose                      print extra info
-```
-
-#### txt2img example
-
-```sh
-./bin/sd -m ../models/sd-v1-4.ckpt -p "a lovely cat"
-# ./bin/sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"
-# ./bin/sd -m ../models/sd_xl_base_1.0.safetensors --vae ../models/sdxl_vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
-# ./bin/sd -m ../models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 1024 -W 1024 -p 'a lovely cat holding a sign says \"Stable Diffusion CPP\"' --cfg-scale 4.5 --sampling-method euler -v
-# ./bin/sd --diffusion-model  ../models/flux1-dev-q3_k.gguf --vae ../models/ae.sft --clip_l ../models/clip_l.safetensors --t5xxl ../models/t5xxl_fp16.safetensors  -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v
-```
-
-Using formats of different precisions will yield results of varying quality.
-
-| f32  | f16  |q8_0  |q5_0  |q5_1  |q4_0  |q4_1  |
-| ----  |----  |----  |----  |----  |----  |----  |
-| ![](./assets/f32.png) |![](./assets/f16.png) |![](./assets/q8_0.png) |![](./assets/q5_0.png) |![](./assets/q5_1.png) |![](./assets/q4_0.png) |![](./assets/q4_1.png) |
-
-#### img2img example
-
-- `./output.png` is the image generated from the above txt2img pipeline
-
-
-```
-./bin/sd --mode img2img -m ../models/sd-v1-4.ckpt -p "cat with blue eyes" -i ./output.png -o ./img2img_output.png --strength 0.4
-```
+This repository is a fork of [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp). It only adds a GUI interface to the executable generating examples. It will be updated according to the master repository developments (evolution of the parameters).
 
 <p align="center">
-  <img src="./assets/img2img_output.png" width="256x">
+  <img src="./assets/sd-example.png" width="360x">
 </p>
 
-## More Guides
-
-- [LoRA](./docs/lora.md)
-- [LCM/LCM-LoRA](./docs/lcm.md)
-- [Using PhotoMaker to personalize image generation](./docs/photo_maker.md)
-- [Using ESRGAN to upscale results](./docs/esrgan.md)
-- [Using TAESD to faster decoding](./docs/taesd.md)
-- [Docker](./docs/docker.md)
-- [Quantization and GGUF](./docs/quantization_and_gguf.md)
-
-## Bindings
-
-These projects wrap `stable-diffusion.cpp` for easier use in other languages/frameworks.
-
-* Golang: [seasonjs/stable-diffusion](https://github.com/seasonjs/stable-diffusion)
-* C#: [DarthAffe/StableDiffusion.NET](https://github.com/DarthAffe/StableDiffusion.NET)
-
-## UIs
-
-These projects use `stable-diffusion.cpp` as a backend for their image generation.
-
-- [Jellybox](https://jellybox.com)
-
-## Contributors
-
-Thank you to all the people who have already contributed to stable-diffusion.cpp!
-
-[![Contributors](https://contrib.rocks/image?repo=leejet/stable-diffusion.cpp)](https://github.com/leejet/stable-diffusion.cpp/graphs/contributors)
-
-## Star History
-
-[![Star History Chart](https://api.star-history.com/svg?repos=leejet/stable-diffusion.cpp&type=Date)](https://star-history.com/#leejet/stable-diffusion.cpp&Date)
-
-## References
-
-- [ggml](https://github.com/ggerganov/ggml)
-- [stable-diffusion](https://github.com/CompVis/stable-diffusion)
-- [sd3-ref](https://github.com/Stability-AI/sd3-ref)
-- [stable-diffusion-stability-ai](https://github.com/Stability-AI/stablediffusion)
-- [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
-- [ComfyUI](https://github.com/comfyanonymous/ComfyUI)
-- [k-diffusion](https://github.com/crowsonkb/k-diffusion)
-- [latent-consistency-model](https://github.com/luosiallen/latent-consistency-model)
-- [generative-models](https://github.com/Stability-AI/generative-models/)
-- [PhotoMaker](https://github.com/TencentARC/PhotoMaker)
+- Only requires installation of Qt (5 or 6)
+
+- To enable, use CMake options:
+  
+  ```cmake
+  -DSD_BUILD_EXAMPLES=ON -DSD_EXAMPLES_GLOVE_GUI=ON
+  ```
+  
+   then instead of CLI arguments use:
+  
+  ```sh
+  sd -glove
+  ```
+
+- Parameters saving as *json*, upon acceptance (*Ok* button):
+  
+  - The parameters are saved automatically at the execution location
+  - The parameters are also copied automatically to the directory of the <code>--output</code> path
+
+- Parameters loading:
+  
+  - The last used parameters are automatically loaded at launch
+  - Parameters can be loaded
+    - by using the *Load* button
+    - or: <code>sd -glove 'path-to-parameters-file'</code>
+
+- On Windows, if DLLs are missing, go to the <code>sd</code> executable directory and do:
+  
+  ```sh
+  windeployqt.exe sd.exe
+  ```
+
+## License
+
+The interface is licensed under GPL-3.0, when using <code>-DSD_EXAMPLES_GLOVE_GUI=ON</code>. Otherwise, the MIT license of the master repository applies.
@@ -1,7 +1,7 @@
 set(TARGET sd)
 
 if(SD_EXAMPLES_GLOVE_GUI)
-add_executable(${TARGET} main.cpp gui/main_glove.h gui/glove.h)
+add_executable(${TARGET} main.cpp main_glove.h ${CMAKE_SOURCE_DIR}/thirdparty/glove.h)
 target_compile_definitions(${TARGET} PUBLIC -D SD_EXAMPLES_GLOVE_GUI)
 target_link_libraries(${TARGET} PRIVATE stable-diffusion ${CMAKE_THREAD_LIBS_INIT} Qt${QT_VERSION}::Widgets)
 if(MSVC)
 
@@ -674,7 +674,7 @@ void sd_log_cb(enum sd_log_level_t level, const char* log, void* data) {
 }
 
 #ifdef SD_EXAMPLES_GLOVE_GUI
-#include "gui/main_glove.h"
+#include "main_glove.h"
 #endif
 
 int main(int argc, char* argv[]) {
Original file line number	Diff line number	Diff line change
`@@ -674,7 +674,7 @@ void sd_log_cb(enum sd_log_level_t level, const char* log, void* data) {`
`674`	`674`	`}`
`675`	`675`
`676`	`676`	`#ifdef SD_EXAMPLES_GLOVE_GUI`
`677`		`-#include "gui/main_glove.h"`
	`677`	`+#include "main_glove.h"`
`678`	`678`	`#endif`
`679`	`679`
`680`	`680`	`int main(int argc, char* argv[]) {`