Docs for LoRA are confusing #3219

Closed
oxysoft opened this issue Apr 25, 2023 · 4 comments
Labels
stale Issues that haven't received updates

Comments

@oxysoft
oxysoft commented Apr 25, 2023

Apologies if API Design is not the right tag,

I was reading the docs for this page https://huggingface.co/docs/diffusers/training/lora and for a number of reasons it seemed very confusing to me as an end-user:

  1. At the top of the page "Currently, LoRA is only supported for the attention layers of the UNet2DConditionalModel." I don't understand what the implications of this are or if I should care
  2. The docs do not mention anything about creating my own dataset, only how to make pokemons...
  3. It does not specify whether or not we need captions for each image. With DreamBooth, each fine-tune was a single concept and hijacked an existing word. I see mentions of BLIP captions so I believe LoRA has this capability of adjusting of nuanced prompt and if so it should be stated on the page.
  4. After the training command before moving onto inference, the docs should let me know around how long it will take so I can plan accordingly, and exactly what will happen which files will be created, etc.

Cheers, this library is definitely the best API for using deep learning models of any kind

@sayakpaul
Member

Valid concerns, and thanks for being comprehensive about them. It helps us improve the docs tremendously. Cc: @stevhliu @yiyixuxu.

At the top of the page "Currently, LoRA is only supported for the attention layers of the UNet2DConditionalModel." I don't understand what the implications of this are or if I should care

Usually, LoRA fine-tuning of the text encoder along with the UNet leads to better results than LoRA fine-tuning the UNet alone. A reference PR that might be relevant here: #3180.
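To make the "LoRA on attention layers" point concrete, here is a minimal sketch of the LoRA idea itself (a frozen weight plus a trainable low-rank update), not the actual diffusers implementation. All names and shapes below are illustrative:

```python
import numpy as np

# Minimal sketch of the LoRA idea: a frozen weight W is augmented with a
# trainable low-rank update B @ A, so the effective weight becomes
# W + (alpha / r) * B @ A. In diffusers, updates like this are attached to
# the attention projections of the UNet (and optionally the text encoder).
rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 8, 8, 2, 4       # r << d is the low-rank bottleneck

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init -> no-op at start

def lora_forward(x):
    # Base (frozen) path plus the scaled low-rank path.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d_in))
# With B initialized to zero, the LoRA branch contributes nothing, so the
# output matches the frozen model exactly; training then only updates A and B.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only `A` and `B` are trained, the number of trainable parameters is tiny compared to full fine-tuning, which is why LoRA checkpoints are so small.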

The docs do not mention anything about creating my own dataset, only how to make pokemons...

The usual datasets for text-to-image tasks are the ones that have image-caption pairs. Existing datasets include MS-COCO, LAION-5B, etc. You can explore some existing datasets here: https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads. Additionally, https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions discusses how the dataset was created in its README. I guess that's helpful?
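For building your own image-caption dataset, one common layout 🤗 Datasets supports is "imagefolder": a directory of images plus a `metadata.jsonl` mapping each `file_name` to a caption column. The file names and captions below are made-up placeholders, just to show the format:

```python
import json
import os
import tempfile

# Sketch of the "imagefolder" layout: images live in a directory alongside a
# metadata.jsonl, where each line pairs a file_name with extra columns
# (here, a "text" caption). Paths and captions are placeholders.
data_dir = tempfile.mkdtemp()
examples = [
    {"file_name": "0001.png", "text": "a photo of a red bicycle"},
    {"file_name": "0002.png", "text": "a watercolor painting of a fox"},
]

with open(os.path.join(data_dir, "metadata.jsonl"), "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# With the images placed next to metadata.jsonl, the dataset can then be
# loaded (requires the `datasets` library, not run here):
#   from datasets import load_dataset
#   ds = load_dataset("imagefolder", data_dir=data_dir)

with open(os.path.join(data_dir, "metadata.jsonl")) as f:
    lines = [json.loads(line) for line in f]
assert lines[0]["text"] == "a photo of a red bicycle"
```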

It does not specify whether or not we need captions for each image. With DreamBooth, each fine-tune was a single concept and hijacked an existing word. I see mentions of BLIP captions so I believe LoRA has this capability of adjusting of nuanced prompt and if so it should be stated on the page.

I think it's clear in the code, though. For each index of the dataset we repeat the instance prompt:

`example["instance_prompt_ids"] = self.tokenizer(`

As we state here:

Beginner-friendly: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand diffusion models and how to use them with the diffusers library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners.

We strive to make the training examples as readable as possible. So, there is an expectation that users read the code a bit. @stevhliu @yiyixuxu what can we do here to improve on this aspect? Perhaps we can briefly mention this at the beginning of the doc so that users are more aware?

After the training command before moving onto inference, the docs should let me know around how long it will take so I can plan accordingly, and exactly what will happen which files will be created, etc.

This is something we can definitely improve on. Cc: @stevhliu @yiyixuxu.

I hope these pointers are helpful.

@stevhliu
Member

Thanks for the feedback! 🤗

At the top of the page "Currently, LoRA is only supported for the attention layers of the UNet2DConditionalModel." I don't understand what the implications of this are or if I should care

This is kind of buried at the end of the DreamBooth Training section, so we can move it up to the warning at the top of the page to provide more context.

The docs do not mention anything about creating my own dataset, only how to make pokemons...

We can spin this section about training with your own dataset out and then add links to it from each training doc. I think it'll be more visible this way.

It does not specify whether or not we need captions for each image. With DreamBooth, each fine-tune was a single concept and hijacked an existing word. I see mentions of BLIP captions so I believe LoRA has this capability of adjusting of nuanced prompt and if so it should be stated on the page.

This depends on the task you're working on (unconditional, text-to-image, DreamBooth, etc.). LoRA is a way to make training these tasks faster and more efficient, so the dataset format/task is slightly out of scope since it assumes you're familiar with the task. We can improve the DreamBooth doc here to briefly explain what the instance_prompt is.
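To illustrate the `instance_prompt` point: in DreamBooth-style training every image shares one prompt, so no per-image captions are needed. The class and file names below are illustrative, not the actual diffusers classes:

```python
# Sketch of what a DreamBooth-style dataset does with instance_prompt:
# the same prompt is repeated for every image, so the dataset needs no
# per-image captions. Names here are hypothetical, for illustration only.
class InstancePromptDataset:
    def __init__(self, image_paths, instance_prompt):
        self.image_paths = image_paths
        self.instance_prompt = instance_prompt

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        # Every index returns the same instance_prompt; in the real training
        # script the prompt is tokenized here rather than returned as a string.
        return {"image": self.image_paths[index],
                "prompt": self.instance_prompt}

ds = InstancePromptDataset(["dog_1.png", "dog_2.png"], "a photo of sks dog")
assert ds[0]["prompt"] == ds[1]["prompt"] == "a photo of sks dog"
```

By contrast, text-to-image fine-tuning (pokemon-blip-captions style) reads a distinct caption per image from the dataset.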

After the training command before moving onto inference, the docs should let me know around how long it will take so I can plan accordingly, and exactly what will happen which files will be created, etc.

Great idea! I think we can pull some info from these blog posts.

what can we do here to improve on this aspect? Perhaps we can briefly mention this at the beginning of the doc so that users are more aware?

Maybe we can highlight and draw attention to certain parts of the script that are important similar to this?

@sayakpaul
Member

sayakpaul commented Apr 28, 2023

Thanks for your brilliant suggestions, Steven! Would you mind opening a PR to address some of these?

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label May 25, 2023
@github-actions github-actions bot closed this as completed Jun 2, 2023