Refactor OmniGen #10771

Merged: 62 commits into main on Feb 12, 2025

Conversation

a-r-r-o-w (Member) commented on Feb 11, 2025

Refactors OmniGen added in #10148.

Example results (images omitted): Image Editing, Text-to-Image

I haven't tested all supported use cases extensively, but I believe the changes should be okay since they don't alter any existing logic (except for RoPE).

A summary of the changes:

  • Instead of the custom apply_rotary_emb function, we now use the Diffusers one (see the first sketch after this list)
  • Removed set_attn_processors and related functions. We're thinking of a better way to support this, since these methods bloat the modeling implementation; for any new models, let's not add them. Hopefully this will be handled better by an upcoming refactor before the release.
  • Ensures guidance and the scheduler step are performed in float32, so we don't do an unnecessary bfloat16 -> float32 -> bfloat16 dtype conversion at each step. Instead, latents stay in float32 throughout and are cast to the required dtype only just before being fed to the transformer (see the second sketch after this list)
  • Removes everything related to attention kwargs. This will be added back by @sayakpaul, or whoever takes up adding PEFT support.
  • Marks non-forward methods in the transformer as private, since they may change in future refactors
  • Cleans up the docs a bit and removes any intermediate-layer docs. The parameters are self-explanatory or already documented in the OmniGenTransformer2DModel docstring
  • Enables layerwise upcasting test
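
To illustrate the RoPE change, here is a rough sketch of the shared Diffusers helper used in place of a model-specific one. The shapes and call site are illustrative only, not the actual OmniGen attention processor code, and the helper's exact signature may vary across diffusers versions:

```python
import torch
from diffusers.models.embeddings import apply_rotary_emb

batch, heads, seq_len, head_dim = 2, 8, 16, 64
query = torch.randn(batch, heads, seq_len, head_dim)
key = torch.randn(batch, heads, seq_len, head_dim)

# Rotary embeddings are typically precomputed per position as a (cos, sin) pair.
cos = torch.randn(seq_len, head_dim)
sin = torch.randn(seq_len, head_dim)
image_rotary_emb = (cos, sin)

# The shared helper replaces the previous model-specific implementation.
query = apply_rotary_emb(query, image_rotary_emb)
key = apply_rotary_emb(key, image_rotary_emb)
```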
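
And a minimal, self-contained sketch of the dtype handling: the transformer and scheduler below are stand-ins rather than the real OmniGen components, but the casting pattern matches the description above.

```python
import torch

model_dtype = torch.bfloat16           # e.g. transformer.dtype
latents = torch.randn(1, 4, 32, 32)    # stays float32 for the whole loop

def fake_transformer(x, t):
    # Stand-in for the real transformer; returns a prediction in model_dtype.
    return x * 0.1

def fake_scheduler_step(noise_pred, t, sample):
    # Toy Euler-like update, performed in float32.
    return sample - 0.05 * noise_pred

for t in torch.linspace(1.0, 0.0, steps=4):
    latent_model_input = latents.to(model_dtype)   # cast only the model input
    noise_pred = fake_transformer(latent_model_input, t)
    noise_pred = noise_pred.to(torch.float32)      # guidance/scheduler math in float32
    latents = fake_scheduler_step(noise_pred, t, latents)

assert latents.dtype == torch.float32
```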

a-r-r-o-w requested a review from yiyixuxu on February 11, 2025 at 20:49
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yiyixuxu (Collaborator) left a comment

very nice! thank you!

Review thread on these pipeline arguments:

```python
callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
callback_on_step_end_tensor_inputs: List[str] = ["latents"],
max_sequence_length: int = 120000,
```
Collaborator:

I see this is not used, but let's confirm with the author if it's intended to be used or not

Contributor:

This argument is simply copied from other pipelines, but is not used in OmniGen.

yiyixuxu (Collaborator)

@staoxiao can you take a look to see if the changes are ok?

sayakpaul (Member)

Thanks for the heads-up, Aryan! Appreciate it.

I will wait for this to get merged before starting the LoRA support.

staoxiao (Contributor)

@a-r-r-o-w @yiyixuxu, thanks for refactoring. I think these changes are ok.

a-r-r-o-w (Member, Author)

Thanks for the review @staoxiao! Will merge once CI is green

a-r-r-o-w merged commit 57ac673 into main on Feb 12, 2025
15 checks passed
a-r-r-o-w deleted the refactor/omnigen branch on February 12, 2025 at 08:36