feat: add load lora weights implementation for 'lora_' prefix LoRA weights format to LoraLoaderMixin #3294
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@showxu will this work inside prompts? Also, how does LoRA differ from TIs (textual inversions)?
It will work with prompts, but not with A1111's format as used in sd-webui. This PR adds a way to load LoRA weights in the 'lora_' prefix format. Textual Inversion should be loaded directly via pipe.load_textual_inversion on the pipeline, AFAIK.
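For readers following along, a minimal sketch of that distinction (the model IDs and file path are placeholders, and pipe.load_lora_weights is shown as the diffusers-format entry point for illustration, not this PR's exact API):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Textual inversions are loaded once and then triggered by their token
# inside the prompt text.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# LoRA weights are attached to the pipeline itself; they are not referenced
# from the prompt. (Placeholder path; A1111-format files are what this PR
# is trying to support.)
pipe.load_lora_weights("path/to/lora")

image = pipe("a <cat-toy> figurine on a desk").images[0]
```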
I don't think this is a format that you get from diffusers. Could you share more details on how you obtained the LoRA weights?
It's the LoRA weights trained with the A1111 sd-webui and most commonly distributed on Civitai, as discussed in #3064 (comment); #3064 (comment) is also helpful.
Thank you for your answer. I understand what you are saying, but I am deeply confused by the fact that A1111 includes LoRAs in the prompt. I am wondering about the mechanisms behind this, because I've seen many people use LoRAs as if they were TIs inside the prompt. Could you perhaps shed some light on that whole thing?
Would this work with the ControlNet pipelines like StableDiffusionControlNetInpaintImg2ImgPipeline?
I am not an expert in the A1111 codebase, but looking at https://github.com/AUTOMATIC1111/stable-diffusion-webui/tree/5ab7f213bec2f816f9c5644becb32eb72c8ffb89/extensions-builtin/Lora it appears that specifying LoRAs in the prompt is just a convenient way to load the LoRAs and then call the .loadLora functions to replace the weights. I wonder if this is how it could be done here too? The prompt would need to be parsed for the LoRAs in a special syntax, and then …
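For illustration, a rough sketch of what stripping A1111-style `<lora:name:weight>` tags out of a prompt could look like (the function and regex are hypothetical; how the extracted LoRAs then get applied is exactly the open question in this thread):

```python
import re

LORA_TAG = re.compile(r"<lora:([^:>]+)(?::([\d.]+))?>")

def split_prompt_and_loras(prompt: str):
    """Strip A1111-style <lora:name:weight> tags from a prompt and return
    the cleaned prompt plus the (name, weight) pairs that were found."""
    loras = [(name, float(weight) if weight else 1.0)
             for name, weight in LORA_TAG.findall(prompt)]
    cleaned = LORA_TAG.sub("", prompt).strip()
    return cleaned, loras

prompt, loras = split_prompt_and_loras(
    "a portrait photo <lora:film_grain:0.6> <lora:detail_tweaker>"
)
# prompt -> "a portrait photo"
# loras  -> [("film_grain", 0.6), ("detail_tweaker", 1.0)]
```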
@BEpresent I doubt that's all, because there's something called latent couples, which applies multiple masks and prompts at the same time; you can apply LoRAs in some masks and not others, and it works.
I see, do you mean multiple LoRAs at the same time? Do you have an example where this is being used in the prompt? I could imagine that, in the short term, some custom solutions would need to be written to have something like this in diffusers.
@BEpresent I am actually working on latent couples as we speak; here's what I am using as starting points:
Thanks, will have a look at this - naively I would have just loaded a LoRA when a parser finds it in the prompt. There might be more to it. Edit: I remember that vid, yeah - this seems to be one level up in terms of complexity, having different masked regions in an image. For the moment I would be happy to load a single LoRA or multiple LoRAs from the prompt, which should be doable just by inserting them into the pipeline. This particular extension could be ported too, of course.
Sorry for the delay here. Will get back to this next week, for sure. Also seeking reviews from @patrickvonplaten and @pdoane. Meanwhile, if you could provide additional details on how you obtained the LoRA weights, the weights themselves, and a code snippet to test the results, that would be genuinely helpful.
@BEpresent I think I found how they load the LoRAs in A1111: https://github.com/opparco/stable-diffusion-webui-composable-lora. Edit: this one is even better: https://github.com/a2569875/stable-diffusion-webui-composable-lora
@alexblattner - we've answered the prompt question elsewhere. Unlike textual inversions, LoRAs are not related to the prompt at all - #3064. Compel provides parsing support - damian0815/compel#30. It is purely a UI/design decision that A1111 uses the prompt.
@BEpresent - I don't think that is a good direction for the diffusers library. Complex prompt weighting is already handled externally (e.g. with Compel), and LoRA/textual-inversion management is another layer higher than that.
@sayakpaul - I haven't tried the code here yet, but it looks like a reasonable start from a functionality perspective. There is a design issue to sort out, though. This PR directly modifies the weights of the layers rather than creating attention processors like the existing LoRA implementation. The attention processor framework provides a way to change what is currently active, but also has increased memory usage and slightly slower performance from the forwarding overhead. Directly modifying the weights, as in this PR, is simpler and has no extra memory or performance penalty, but currently there is no way to undo it. Continuing with the strategy in the PR, there could be an optional dictionary parameter to save the original layer weights and another function to reapply them. Another option would be to update the PR to create attention processors and follow the existing design. Either way, I think the LoRA implementations should align on one approach eventually.
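As a rough sketch of the "optional dictionary to save the original weights" idea (names and shapes are illustrative, assuming plain torch Linear layers and kohya-style down/up factors; this is not the PR's actual code):

```python
import torch

def merge_lora_into_linear(layer, lora_down, lora_up, scale=1.0, backup=None):
    """Merge a LoRA delta (up @ down) directly into a Linear layer's weight.

    If `backup` is a dict, the original weight is stashed there so the merge
    can be undone later.
    """
    if backup is not None and id(layer) not in backup:
        backup[id(layer)] = layer.weight.detach().clone()
    # lora_up: (out_features, rank), lora_down: (rank, in_features)
    delta = scale * (lora_up.float() @ lora_down.float())
    with torch.no_grad():
        layer.weight += delta.to(device=layer.weight.device, dtype=layer.weight.dtype)

def restore_original_weights(layers, backup):
    """Undo an earlier in-place merge using the saved originals."""
    with torch.no_grad():
        for layer in layers:
            if id(layer) in backup:
                layer.weight.copy_(backup[id(layer)])
```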
With all due respect @pdoane, it seems to me like there's a knowledge gap regarding LoRAs. It seems like there's a way to apply LoRA weights to specific areas of the image and not others (which is not currently possible in diffusers), as I pointed out here:
It seems to me like there's a place for LoRAs in prompts, as they can be key to more complex and interesting implementations. By being too strict you may be dismissing a larger picture.
Attention processors are a better fit for more dynamic use of LoRAs and put things on a path towards extensions like that. It's a big refactor of the PR, though.
I see. I would also be fine with excluding LoRAs from prompts; those who want to use A1111 prompts with LoRAs in diffusers (I'm one of those) would need to come up with a conversion script, e.g. to parse them manually and then use the diffusers library to load the LoRA weights with some kind of … For textual inversion I would use …
Sorry, a bit lost here on the context - is this needed for A1111?
@patrickvonplaten - I did not open the PR but will try my best to summarize. This PR adds support for LoRAs often found in community usage (e.g. CivitAI). I suspect this is A1111 format but hard for me to say as none of these formats are well documented. The implementation is based on work done on an issue thread that is functional, but does not match the rest of the design of the diffusers library. It directly modifies the layer weights rather than creating attention processors. Most of the discussion is about the API and how LoRAs should be exposed. Tools like InvokeAI and A1111 support LoRAs and their parameters in the prompt so some users/developers will expect that even if it's an implementation detail. FYI - Compel provides parsing support for the scope of this PR. The composable LoRA extension discussed allows a LoRA to operate on a subset of a prompt, but that would impact an attention processor implementation and not the prompt embeds. My recommendations and thoughts:
Thanks @pdoane! Ok, I would suggest we first tackle loading LoRA in A1111 format to …
Agreed with Patrick here. @showxu could we maybe try to amend this PR to follow this design instead?
I think this part should be separated from this PR. Also, I am not too sure if it's something we want to allow from …
Not yet, I think. Let's discuss that in a separate issue.
Also, this is a very important consideration:
When directly operating on a pipeline, users can load a particular LoRA which, under the hood, initializes a LoRA attention processor for the UNet (this is what is done in diffusers/src/diffusers/loaders.py, lines 233 to 291 at 0ffac97).
But users also have the flexibility to quickly set the attention processor to something else by doing:

```python
from diffusers.models.attention_processor import AttnProcessor

pipeline.unet.set_attn_processor(AttnProcessor())
```

Modifying weights on the fly (the way it's done in this PR) removes this flexibility.
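Conceptually, the attention-processor route keeps the base projection untouched and adds a low-rank branch at forward time, which is why it can be swapped out again. A purely illustrative sketch of that idea (not the actual diffusers implementation):

```python
import torch
import torch.nn as nn

class LoRABranch(nn.Module):
    """Low-rank residual applied on top of a frozen base projection,
    rather than being merged into the base weights."""
    def __init__(self, in_features, out_features, rank=4, scale=1.0):
        super().__init__()
        self.down = nn.Linear(in_features, rank, bias=False)
        self.up = nn.Linear(rank, out_features, bias=False)
        self.scale = scale
        nn.init.normal_(self.down.weight, std=1.0 / rank)
        nn.init.zeros_(self.up.weight)

    def forward(self, hidden_states):
        return self.scale * self.up(self.down(hidden_states))

# The base projection stays untouched; the LoRA branch is only added at
# forward time, so "disabling" it is just a matter of not calling it
# (or swapping the attention processor back).
base_proj = nn.Linear(320, 320)
lora = LoRABranch(320, 320, rank=4)
x = torch.randn(1, 77, 320)
out = base_proj(x) + lora(x)
```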
As a point of comparison, A1111 has switched to directly modifying weights to improve performance:
Related to: #3064
Just ignore this comment; I found the error in the PR that fixed my CUDA issue. Just curious: I'm trying this PR out with version 0.16.1 and just merged the changes with it. I noticed that there is extra code floating in main, so I decided not to use main currently. So the issue I have may or may not actually be an issue, but I'm just sharing my observation. Here is my loaders.py I've merged this PR with. Here is the code I was running that caused the issue.
Found addmm_impl_cpu_ in PyTorch. Not sure if it helps; thought I'd link it. Not sure if this matters, but my system is Win 10; I'm using CUDA, with an AMD 5950X & RTX 4090.
How does one undo/unload an …?
I have a strong hunch that this approach misses some of the keys in the models. The diffusers format uses slightly different key names for the same things, and the ckpt-to-diffusers script does this mapping. But this one doesn't do any mapping, so I suspect it's missing a lot of keys. I think supporting the A1111 format will require looking at the ckpt-to-diffusers script and ensuring that all the keys are accounted for. It's definitely possible that I'm wrong, but it's worth confirming that all the keys are mapped correctly. PS: I agree that we should just convert the state dict to diffusers format, and then load it via the usual function. No changes would be needed to the LoRA mixin. This conversion function could be in …
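To illustrate the kind of renaming such a conversion function would have to do, here is a hypothetical sketch that only handles a UNet attention key; real A1111/kohya checkpoints also carry text-encoder keys, alpha tensors, and LyCORIS-specific layers, so this is far from exhaustive:

```python
UNET_COMPOUND_NAMES = [
    "down_blocks", "mid_block", "up_blocks", "transformer_blocks",
    "proj_in", "proj_out", "to_q", "to_k", "to_v", "to_out",
]

def a1111_unet_key_to_diffusers(key: str) -> str:
    """Rough sketch of the renaming step: A1111/kohya flatten module paths
    with underscores, while diffusers addresses modules with dots."""
    prefix, suffix = key.split(".", 1)  # e.g. 'lora_unet_...to_q', 'lora_down.weight'
    if not prefix.startswith("lora_unet_"):
        raise ValueError("only UNet keys are handled in this sketch")
    path = prefix[len("lora_unet_"):]
    # Protect compound names so their internal underscores survive,
    # then turn the remaining separators into dots.
    for i, name in enumerate(UNET_COMPOUND_NAMES):
        path = path.replace(name, f"@{i}@")
    path = path.replace("_", ".")
    for i, name in enumerate(UNET_COMPOUND_NAMES):
        path = path.replace(f"@{i}@", name)
    return f"{path}.{suffix}"

# 'lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_q.lora_down.weight'
# -> 'down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.lora_down.weight'
```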
I tested this branch and I'm having issues with … It works with float32, but of course that's very slow.

File "/root/app/models.py", line 203, in loadModel
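For context, the addmm_impl_cpu_ error mentioned above is PyTorch's way of saying a float16 matmul ran on the CPU. Two common workarounds, sketched under the assumption of a local safetensors file (placeholder name):

```python
import torch
from safetensors.torch import load_file

# Workaround 1: move the pipeline to the GPU *before* loading/applying the
# LoRA, so any fp16 matmuls happen on CUDA instead of the CPU:
# pipe = pipe.to("cuda")

# Workaround 2: upcast the LoRA tensors to float32 while they still live on
# the CPU (placeholder file name).
state_dict = load_file("my_lora.safetensors")
state_dict = {k: v.to(torch.float32) for k, v in state_dict.items()}
```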
We recently added limited support for A1111 LoRAs, so this PR is superseded by that. Cc: @takuma104. It might be worth closing this PR and discussing other options, maybe.
Think it'd be OK to close this PR here.
While the PR is closed, this method still worked best for my use cases with the modified bit of code I provided. The current implementation doesn't load LyCORIS models correctly, and I keep having this error: … I have to move my pipeline.to() command in order to get rid of the cuda:0 and cpu error; however, no LoRA is ever applied.
As stated in the earlier comments, we cannot go with this implementation because it directly merges the LoRA params into the UNet params. This is something we want to avoid.
add load lora weights implementation for 'lora_' prefix LoRA weights format to LoraLoaderMixin; this should fix #3064. Use case: