1D diffusers (UNet1DModel) not behaving as expected, what am I missing?

This is my first post on the forum, as I'm usually able to solve my issues on my own. But in this very simple diffusion training setup on a trivial 1D dataset, the model just can't seem to figure it out. This leads me to think that I'm fundamentally misunderstanding something in the 1D diffusers implementation, or perhaps it just isn't supported the way I imagined.

For reference, I have trained 2D models without issues using close to identical code; a rough sketch of that working setup is below.
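This is approximately what the working 2D version looks like (a minimal sketch from memory, so the exact channel counts are an assumption, but the structure mirrors the 1D model below):

from diffusers import UNet2DModel

# 2D analogue that trains fine for me (channel counts approximate)
model_2d = UNet2DModel(
    sample_size=64,
    in_channels=1,
    out_channels=1,
    layers_per_block=1,
    block_out_channels=(32, 64),
    down_block_types=("DownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "UpBlock2D"),
)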

The following code is the entire program, which plots a generated sample every 50 epochs (or whatever you set it to). I set the epoch count fairly high to verify that the problem isn't just training time. The generated samples range between roughly [-400, 400] and don't improve as training goes on. The loss starts around 0.7 in the first epoch, drops to about 0.51, and stays there, which I believe is also wrong.
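For scale: against unit-variance Gaussian noise, a model that predicts all zeros should sit at an MSE of about 1.0 (since E[z²] = 1 for z ~ N(0, 1)), so a plateau at 0.51 means the model learns something and then stalls. Quick check of that baseline:

import torch
noise = torch.randn(32, 1, 64)
# Baseline loss for a model that always predicts zero noise: ~1.0
print(torch.nn.functional.mse_loss(torch.zeros_like(noise), noise).item())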

code:

import torch
from diffusers import DDPMScheduler, UNet1DModel
from torch.utils.data import Dataset, DataLoader
from matplotlib import pyplot as plt
import numpy as np

class Synthetic1DData(Dataset):
    def __init__(self, num_samples=1000, seq_len=64):
        # Step signal: first half ones, second half zeros
        self.data = torch.ones(num_samples, 1, seq_len)
        self.data[:, :, seq_len // 2:] = 0
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        return self.data[idx]

model = UNet1DModel(
    sample_size=64,
    in_channels=1,
    out_channels=1,
    layers_per_block=1,
    block_out_channels=(32, 64),
    down_block_types=("DownBlock1D", "DownBlock1D"),
    up_block_types=("UpBlock1D", "UpBlock1D"),
)
noise_scheduler = DDPMScheduler(
    num_train_timesteps=1000,
    clip_sample=False
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

dataset = Synthetic1DData(seq_len=64)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

model.train()
for epoch in range(1000):
    total_loss = 0
    num_batches = 0
    for batch in dataloader:
        batch = batch.to(device)
        noise = torch.randn_like(batch)

        # Sample a random timestep per example and add the corresponding noise
        timesteps = torch.randint(
            0, noise_scheduler.config.num_train_timesteps, (batch.shape[0],), device=device
        )
        noisy = noise_scheduler.add_noise(batch, noise, timesteps)
        pred = model(noisy, timesteps).sample

        loss = torch.nn.functional.mse_loss(pred, noise)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        num_batches += 1

    # Periodic logging and sampling
    if epoch % 50 == 0:
        avg_loss = total_loss / num_batches
        print(f"Epoch {epoch}, Average Loss: {avg_loss:.4f}")

        model.eval()
        with torch.no_grad():
            # Start from pure noise and denoise step by step
            sample = torch.randn(1, 1, 64, device=device)
            for t in noise_scheduler.timesteps:
                residual = model(sample, t).sample
                sample = noise_scheduler.step(residual, t, sample).prev_sample
            
            plt.figure(figsize=(10, 4))
            plt.plot(np.arange(64), sample[0].cpu().squeeze().numpy())
            plt.title(f'Epoch {epoch}')
            plt.show()
        model.train()


I think something is wrong in UNet1DModel.
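A quick way to test that (a sketch, reusing the model and device from your script): run the same input through the model at two very different timesteps. If the outputs are identical, the plain DownBlock1D/UpBlock1D blocks are ignoring the timestep entirely, and the model can't learn a timestep-dependent denoiser.

model.eval()
with torch.no_grad():
    x = torch.randn(1, 1, 64, device=device)
    out_early = model(x, torch.tensor([10], device=device)).sample
    out_late = model(x, torch.tensor([900], device=device)).sample
    # ~0.0 here would mean the timestep has no effect on the prediction
    print((out_early - out_late).abs().max().item())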