Docs for LoRA are confusing #3219

Closed
oxysoft opened this issue Apr 25, 2023 · 4 comments
Labels
stale Issues that haven't received updates

Comments

@oxysoft
oxysoft commented Apr 25, 2023

Apologies if API Design is not the right tag,

I was reading the docs for this page https://huggingface.co/docs/diffusers/training/lora and for a number of reasons it seemed very confusing to me as an end-user:

  1. At the top of the page "Currently, LoRA is only supported for the attention layers of the UNet2DConditionalModel." I don't understand what the implications of this are or if I should care
  2. The docs do not mention anything about creating my own dataset, only how to make pokemons...
  3. It does not specify whether or not we need captions for each image. With DreamBooth, each fine-tune was a single concept and hijacked an existing word. I see mentions of BLIP captions so I believe LoRA has this capability of adjusting of nuanced prompt and if so it should be stated on the page.
  4. After the training command before moving onto inference, the docs should let me know around how long it will take so I can plan accordingly, and exactly what will happen which files will be created, etc.

Cheers, this library is definitely the best API for using deep learning models of any kind

@sayakpaul
Member

Valid concerns, and thanks for being comprehensive about them. It helps us improve the docs tremendously. Cc: @stevhliu @yiyixuxu.

At the top of the page "Currently, LoRA is only supported for the attention layers of the UNet2DConditionalModel." I don't understand what the implications of this are or if I should care

Usually, LoRA fine-tuning of the text encoder along with the UNet leads to better results than LoRA fine-tuning the UNet alone. A reference PR that might be relevant here: #3180.
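To make the "LoRA on attention layers" point concrete, here is a minimal sketch of the LoRA idea itself (a frozen weight plus a trainable low-rank update), not the actual diffusers implementation. All names and shapes below are illustrative:

```python
import numpy as np

# Minimal sketch of the LoRA idea: a frozen weight W is augmented with a
# trainable low-rank update B @ A, so the effective weight becomes
# W + (alpha / r) * B @ A. In diffusers, updates like this are attached to
# the attention projections of the UNet (and optionally the text encoder).
rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 8, 8, 2, 4       # r << d is the low-rank bottleneck

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init -> no-op at start

def lora_forward(x):
    # Base (frozen) path plus the scaled low-rank path.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d_in))
# With B initialized to zero, the LoRA branch contributes nothing, so the
# output matches the frozen model exactly; training then only updates A and B.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only `A` and `B` are trained, the number of trainable parameters is tiny compared to full fine-tuning, which is why LoRA checkpoints are so small.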

The docs do not mention anything about creating my own dataset, only how to make pokemons...

The usual datasets for text-to-image tasks are the ones that have image-caption pairs. Existing datasets include MS-COCO, LAION-5B, etc. You can explore some existing datasets here: https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads. Additionally, https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions discusses how the dataset was created in its README. I guess that's helpful?
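For building your own image-caption dataset, one common layout 🤗 Datasets supports is "imagefolder": a directory of images plus a `metadata.jsonl` mapping each `file_name` to a caption column. The file names and captions below are made-up placeholders, just to show the format:

```python
import json
import os
import tempfile

# Sketch of the "imagefolder" layout: images live in a directory alongside a
# metadata.jsonl, where each line pairs a file_name with extra columns
# (here, a "text" caption). Paths and captions are placeholders.
data_dir = tempfile.mkdtemp()
examples = [
    {"file_name": "0001.png", "text": "a photo of a red bicycle"},
    {"file_name": "0002.png", "text": "a watercolor painting of a fox"},
]

with open(os.path.join(data_dir, "metadata.jsonl"), "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# With the images placed next to metadata.jsonl, the dataset can then be
# loaded (requires the `datasets` library, not run here):
#   from datasets import load_dataset
#   ds = load_dataset("imagefolder", data_dir=data_dir)

with open(os.path.join(data_dir, "metadata.jsonl")) as f:
    lines = [json.loads(line) for line in f]
assert lines[0]["text"] == "a photo of a red bicycle"
```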

It does not specify whether or not we need captions for each image. With DreamBooth, each fine-tune was a single concept and hijacked an existing word. I see mentions of BLIP captions so I believe LoRA has this capability of adjusting of nuanced prompt and if so it should be stated on the page.

I think it's clear in the code, though. For each index of the dataset we repeat the instance prompt:

`example["instance_prompt_ids"] = self.tokenizer(`

As we state here:

Beginner-friendly: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand diffusion models and how to use them with the diffusers library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners.

We strive to make the training examples as readable as possible. So, there is an expectation that users read the code a bit. @stevhliu @yiyixuxu what can we do here to improve on this aspect? Perhaps we can briefly mention this at the beginning of the doc so that users are more aware?

After the training command before moving onto inference, the docs should let me know around how long it will take so I can plan accordingly, and exactly what will happen which files will be created, etc.

This is something we can definitely improve on. Cc: @stevhliu @yiyixuxu.

I hope these pointers are helpful.

@stevhliu
Member

Thanks for the feedback! 🤗

At the top of the page "Currently, LoRA is only supported for the attention layers of the UNet2DConditionalModel." I don't understand what the implications of this are or if I should care

This is kind of buried at the end of the DreamBooth Training section, so we can move it up to the warning at the top of the page to provide more context.

The docs do not mention anything about creating my own dataset, only how to make pokemons...

We can spin this section about training with your own dataset out and then add links to it from each training doc. I think it'll be more visible this way.

It does not specify whether or not we need captions for each image. With DreamBooth, each fine-tune was a single concept and hijacked an existing word. I see mentions of BLIP captions so I believe LoRA has this capability of adjusting of nuanced prompt and if so it should be stated on the page.

This depends on the task you're working on (unconditional, text-to-image, DreamBooth, etc.). LoRA is a way to make training these tasks faster and more efficient, so the dataset format/task is slightly out of scope since it assumes you're familiar with the task. We can improve the DreamBooth doc here to briefly explain what the instance_prompt is.
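To illustrate the `instance_prompt` point: in DreamBooth-style training every image shares one prompt, so no per-image captions are needed. The class and file names below are illustrative, not the actual diffusers classes:

```python
# Sketch of what a DreamBooth-style dataset does with instance_prompt:
# the same prompt is repeated for every image, so the dataset needs no
# per-image captions. Names here are hypothetical, for illustration only.
class InstancePromptDataset:
    def __init__(self, image_paths, instance_prompt):
        self.image_paths = image_paths
        self.instance_prompt = instance_prompt

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        # Every index returns the same instance_prompt; in the real training
        # script the prompt is tokenized here rather than returned as a string.
        return {"image": self.image_paths[index],
                "prompt": self.instance_prompt}

ds = InstancePromptDataset(["dog_1.png", "dog_2.png"], "a photo of sks dog")
assert ds[0]["prompt"] == ds[1]["prompt"] == "a photo of sks dog"
```

By contrast, text-to-image fine-tuning (pokemon-blip-captions style) reads a distinct caption per image from the dataset.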

After the training command before moving onto inference, the docs should let me know around how long it will take so I can plan accordingly, and exactly what will happen which files will be created, etc.

Great idea! I think we can pull some info from these blog posts.

what can we do here to improve on this aspect? Perhaps we can briefly mention this at the beginning of the doc so that users are more aware?

Maybe we can highlight and draw attention to certain parts of the script that are important similar to this?

@sayakpaul
Member

sayakpaul commented Apr 28, 2023

Thanks for your brilliant suggestions, Steven! Would you mind opening a PR to address some of these?

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label May 25, 2023
@github-actions github-actions bot closed this as completed Jun 2, 2023