conditional Unet1D #3044

Closed
lucala opened this issue Apr 10, 2023 · 14 comments
Labels
stale Issues that haven't received updates

Comments

@lucala

lucala commented Apr 10, 2023

Is your feature request related to a problem? Please describe.
I would like to work on text-conditioned diffusion of 1D signals. There is a variety of building blocks for the conditional Unet2D but none for Unet1D.

Describe the solution you'd like
Support for conditional Unet1D (unet_1d_condition.py and respective conditional building blocks)

Describe alternatives you've considered
Other repos, but they do not offer the amazing ecosystem Huggingface has.

@sayakpaul
Member

Hi @lucala! Thanks for being willing to do this :)

Could you help us understand the impact of this feature e.g., some works that could benefit deeply from this?

@shivammehta25

Hi, I would like this feature as well, and I would be happy to help add it to the library. My use case is speech synthesis: it would be helpful to synthesise audio representations conditioned on text input.

@sayakpaul
Member

We have an AudioDiffusionPipeline that leverages the UNet2DConditionModel. Could that be repurposed for your use-case? I am no expert in the domain, though. So, tagging @sanchit-gandhi.

@shivammehta25

Treating the mel spectrogram as an image might not be the best inductive bias for audio synthesis: there is no temporal relation along the frequency axis (the y-axis). So, at least for experimentation purposes, it would be nice to try a 1D model.
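
As a rough illustration of the 1D alternative, here is a minimal NumPy sketch (not Diffusers code) that treats the mel bins as input channels and convolves along the time axis only. The helper name `conv1d_over_time`, the 80-bin by 100-frame spectrogram, and the random weights are all assumptions made for this example.

```python
import numpy as np

def conv1d_over_time(x, weight):
    """Hypothetical helper: 1D convolution along time, stride 1, no padding.

    x:      (channels_in, n_frames), e.g. mel bins as channels
    weight: (channels_out, channels_in, kernel_size)
    """
    c_out, c_in, k = weight.shape
    assert x.shape[0] == c_in, "channel mismatch"
    n_out = x.shape[1] - k + 1
    out = np.zeros((c_out, n_out))
    for t in range(n_out):
        # Each output frame mixes ALL mel bins over k neighbouring frames,
        # so no locality is assumed along the frequency axis.
        out[:, t] = np.einsum("oik,ik->o", weight, x[:, t:t + k])
    return out

rng = np.random.default_rng(0)
n_mels, n_frames = 80, 100                      # assumed sizes
mel = rng.standard_normal((n_mels, n_frames))   # stand-in for a real spectrogram
weight = rng.standard_normal((16, n_mels, 3))   # 16 output channels, kernel size 3
features = conv1d_over_time(mel, weight)
print(features.shape)  # (16, 98)
```

A 2D convolution would instead slide a small kernel over both axes and share weights across frequency, which is the image-style bias questioned above.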

@lucala
Author

lucala commented Apr 12, 2023

Yes, AudioDiffusionPipeline works on 2D spectrograms, which can also work, but in my case I'm trying to experiment on raw 1D audio signals. This approach has been shown to work in Dance Diffusion, but it only exists for the unconditional case.

@sanchit-gandhi
Contributor

Cool discussion, sorry to not have replied earlier! I definitely agree that working directly in the audio domain is better than log-mel space (e.g. NaturalSpeech2 > DiffGAN-TTS)

I think if there's a new pipeline/model that we want to add to diffusers that leverages a 1-d conditional UNet model it'd make sense to add the class. I'm not sure how useful a standalone 1-d conditional UNet model would be though?

@shivammehta25

I feel it would be useful for research purposes. Stronger conditioning through cross-attention would definitely perform better than concatenating the conditioning as extra channels at the input, which is what I am doing right now.
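
For readers unfamiliar with the two options, here is a minimal NumPy sketch contrasting them on a 1D signal. It is not taken from Diffusers or from any commenter's code: the function names, the single-head attention, the residual connection, and all shapes and random weights are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def concat_conditioning(signal, cond_vec):
    """Tile one conditioning vector along time and stack it as extra channels.

    signal: (C, T), cond_vec: (D,)  ->  (C + D, T)
    """
    tiled = np.repeat(cond_vec[:, None], signal.shape[1], axis=1)
    return np.concatenate([signal, tiled], axis=0)

def cross_attention(signal, tokens, w_q, w_k, w_v):
    """Single-head cross-attention: each time step attends over the text tokens.

    signal: (C, T), tokens: (N, D)  ->  output (C, T), attention weights (T, N)
    """
    q = signal.T @ w_q                      # (T, d) queries from the signal
    k = tokens @ w_k                        # (N, d) keys from the text
    v = tokens @ w_v                        # (N, C) values from the text
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return signal + (attn @ v).T, attn      # residual connection

rng = np.random.default_rng(0)
C, T, N, D, d = 8, 64, 5, 12, 16            # assumed sizes
signal = rng.standard_normal((C, T))
tokens = rng.standard_normal((N, D))        # stand-in for text-encoder outputs
w_q = rng.standard_normal((C, d))
w_k = rng.standard_normal((D, d))
w_v = rng.standard_normal((D, C))

concat_out = concat_conditioning(signal, tokens.mean(axis=0))
attn_out, attn = cross_attention(signal, tokens, w_q, w_k, w_v)
print(concat_out.shape, attn_out.shape, attn.shape)
```

With concatenation, every time step sees the same pooled vector. With cross-attention, each time step gets its own weighting over the individual tokens, which is the sense in which the conditioning is stronger.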

@leng-yue
Contributor

One potential audio application is using a conditional UNet1D model to predict pitch from MIDI notes and phones. This approach could prove beneficial in developing pitch prediction models.

@sanchit-gandhi
Contributor

I'll leave this to @sayakpaul to comment on whether adding individual components for research purposes is part of the current design philosophy in Diffusers. From what I gather, the library is typically developed the other way round, where new research is done in standalone repositories and then merged into Diffusers when released.

@sayakpaul
Member

sayakpaul commented Apr 28, 2023

I think for now, we will just keep the UNet1D model as is and later we can revisit any modification if there's any model/pipeline that would benefit from it. This is in line with what we have been doing for some of our recent stuff such as ControlNet, T2I adapter (being worked on here: #2555).

@williamberman @patrickvonplaten WDYT?

@patrickvonplaten
Contributor

I don't think there is a need for the conditional UNet1D

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale label on May 22, 2023

@jxmorris12

If anyone has a conditional 1D implementation, please share it here!

@andleb

andleb commented Jan 6, 2025

> If anyone has a conditional 1D implementation, please share it here!

I built one on top of lucidrains' Karras ("elucidated") diffusion implementation. It's still at a bit of a "research code" stage right now, though.

8 participants