conditional Unet1D #3044
Comments
Hi @lucala! Thanks for being willing to do this :) Could you help us understand the impact of this feature, e.g., some works that would benefit deeply from it?
Hi, I would like this feature as well, and I would also like to help put it into the library. My use case is speech synthesis; it would be helpful to synthesise audio representations conditioned on text input.
We have an AudioDiffusionPipeline that leverages the existing 2D UNet by treating mel spectrograms as images.
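For reference, a minimal usage sketch of that pipeline. The checkpoint name is an assumption (one of the community audio-diffusion models); substitute any compatible one:

```python
import torch
from diffusers import AudioDiffusionPipeline

# Load a community audio-diffusion checkpoint (assumed name; any
# compatible model from the Hub works the same way).
pipe = AudioDiffusionPipeline.from_pretrained("teticio/audio-diffusion-256")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# The pipeline denoises a mel-spectrogram image and converts it back to audio.
output = pipe()
spectrogram = output.images[0]  # generated mel spectrogram (PIL image)
waveform = output.audios[0]     # reconstructed raw audio array
```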
Treating the mel spectrogram as images might not be the best inductive bias for audio synthesis: there is no temporal relation in the frequency domain, i.e. on the y-axis. So, just for experimentation purposes, it would be nice to try it with a 1D model.
Yes, AudioDiffusionPipeline works on 2D spectrograms, which can also work, but in my case I'm trying to experiment on raw 1D audio signals. This approach has been shown to work in Dance Diffusion, but it only exists for the unconditional case.
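For context, the unconditional case is already covered by diffusers' UNet1DModel, whose defaults follow the Dance Diffusion architecture. A minimal sketch (the sample length and batch size are illustrative):

```python
import torch
from diffusers import UNet1DModel

# Unconditional 1D UNet over raw waveforms, as used by Dance Diffusion:
# the model takes (batch, channels, length) samples plus a timestep.
model = UNet1DModel(sample_size=65536, in_channels=2, out_channels=2)

noisy_audio = torch.randn(1, 2, 65536)        # a stereo waveform
timestep = torch.tensor([10])
output = model(noisy_audio, timestep).sample  # same shape as the input
```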
Cool discussion, sorry not to have replied earlier! I definitely agree that working directly in the audio domain is better than log-mel space (e.g. NaturalSpeech2 > DiffGAN-TTS). I think if there's a new pipeline/model that we want to add to diffusers that leverages a 1D conditional UNet, it'd make sense to add the class. I'm not sure how useful a standalone 1D conditional UNet model would be, though.
I feel it would be useful for research purposes: stronger conditioning via cross-attention would definitely perform better than concatenating conditioning channels at the input (which is what I am doing right now).
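To make that workaround concrete, here is a minimal sketch of conditioning by channel concatenation with the existing UNet1DModel; all channel counts, lengths, and tensors are illustrative assumptions:

```python
import torch
from diffusers import UNet1DModel

# Conditioning by concatenation: one audio channel plus one conditioning
# channel go in, and only the audio channel comes out.
model = UNet1DModel(sample_size=32768, in_channels=2, out_channels=1)

audio = torch.randn(4, 1, 32768)  # noisy mono waveform
cond = torch.randn(4, 1, 32768)   # conditioning signal, resampled to the same length
sample = torch.cat([audio, cond], dim=1)
output = model(sample, torch.tensor([50])).sample  # -> (4, 1, 32768)
```

Cross-attention would instead let the model attend to a variable-length conditioning sequence (e.g. text embeddings) at every resolution, rather than forcing the condition into the waveform's shape.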
One potential audio application is using a conditional UNet1D to forecast pitch from MIDI notes and phones, which could be useful for developing pitch prediction models.
I'll leave this to @sayakpaul to comment on whether adding individual components for research purposes is part of the current design philosophy in Diffusers. From what I gather, the library is typically developed the other way round, where new research is done in standalone repositories and then merged into Diffusers when released.
I think for now we will just keep the UNet1D model as is, and later we can revisit any modification if there's a model/pipeline that would benefit from it. This is in line with what we have been doing for some of our recent work such as ControlNet and the T2I adapter (being worked on here: #2555).
I don't think there is a need for the conditional UNet1D.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
If anyone has a conditional 1D implementation, please share it here!
I built one on top of lucidrains' Karras ("elucidated") diffusion implementation; it's in a bit of a "research code" stage right now, though.
Is your feature request related to a problem? Please describe.
I would like to work on text-conditioned diffusion of 1D signals. There exists a variety of building blocks for a conditional UNet2D but not for UNet1D.
Describe the solution you'd like
Support for a conditional UNet1D (unet_1d_condition.py and the respective conditional building blocks); see the sketch below for what such a block could look like.
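As an illustration only, here is a hypothetical sketch in plain PyTorch of a 1D cross-attention building block, mirroring how the 2D conditional UNet injects encoder hidden states. The class name, shapes, and hyperparameters are invented for this sketch and are not an existing diffusers API:

```python
import torch
from torch import nn

class CrossAttnBlock1D(nn.Module):
    """Hypothetical conditional 1D block: residual conv + cross-attention."""

    def __init__(self, channels: int, cond_dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.GroupNorm(8, channels)
        self.attn = nn.MultiheadAttention(
            embed_dim=channels, kdim=cond_dim, vdim=cond_dim,
            num_heads=num_heads, batch_first=True,
        )
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, hidden_states, encoder_hidden_states):
        # hidden_states: (batch, channels, length)
        # encoder_hidden_states: (batch, seq_len, cond_dim), e.g. text embeddings
        residual = hidden_states
        x = self.norm(hidden_states).transpose(1, 2)  # -> (batch, length, channels)
        x, _ = self.attn(x, encoder_hidden_states, encoder_hidden_states)
        return self.conv(x.transpose(1, 2)) + residual

block = CrossAttnBlock1D(channels=64, cond_dim=768)
x = torch.randn(2, 64, 1024)        # 1D feature map
text_emb = torch.randn(2, 77, 768)  # e.g. a text encoder's output
out = block(x, text_emb)            # -> (2, 64, 1024)
```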
Describe alternatives you've considered
Other repos, but they do not offer the amazing ecosystem Hugging Face has.