I followed the Train a diffusion model tutorial, and it all finished without errors, yet the final repo does not look like it should (like anton-l/ddpm-butterflies-128/tree/main).
Instead, it looks like this: TukuToi/ddpm-butterflies-128/tree/main
As you can see, there is no unet, no model_index, etc.
I also checked the output folder in Colaboratory, and it does not contain the unet either.
What is wrong? Where has the trained model gone?
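A quick way to confirm which pipeline files are missing from the local output folder is something like the sketch below. It is only a sketch: the folder name `ddpm-butterflies-128` is assumed to match the repo name, the helper `missing_pipeline_files` is a hypothetical name, and the expected entries are limited to the two mentioned above (`unet` and `model_index.json`).

```python
from pathlib import Path

# Entries I would expect in a saved pipeline folder, going by the reference
# repo mentioned above. This list is an assumption and is not exhaustive.
EXPECTED_ENTRIES = ("model_index.json", "unet")


def missing_pipeline_files(output_dir, expected=EXPECTED_ENTRIES):
    """Return the expected entries that do not exist inside output_dir."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Assumed output folder name; adjust to whatever output_dir the config uses.
    print(missing_pipeline_files("ddpm-butterflies-128"))
```

If this prints both names, the model was never written to disk locally, so the problem is in the saving step rather than in the upload.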
The cell finished with the lines below, and the progress indicator shows it as done:
Adding files tracked by Git LFS: ['samples/0039.png', 'samples/0049.png']. This may take a bit of time if the files are large.
WARNING:huggingface_hub.repository:Adding files tracked by Git LFS: ['samples/0039.png', 'samples/0049.png']. This may take a bit of time if the files are large.
To https://huggingface.co/TukuToi/ddpm-butterflies-128
2cd00c5..246df76 main -> main
WARNING:huggingface_hub.repository:To https://huggingface.co/TukuToi/ddpm-butterflies-128
2cd00c5..246df76 main -> main
From the logs, I gather it never even attempts to save the model:
100% 1000/1000 [01:37<00:00, 10.20it/s]
Adding files tracked by Git LFS: ['samples/0000.png']. This may take a bit of time if the files are large.
Upload file samples/0000.png: 100% 526k/526k [00:01<00:00, 491kB/s]
Upload file logs/train_example/events.out.tfevents.1686132774.2b487d1c587c.748.1: 100% 7.71k/7.71k [00:01<?, ?B/s]
To https://huggingface.co/TukuToi/ddpm-butterflies-129
722846e..b4db147 main -> main
Yet in the config, I set save_model_epochs to 1 to be sure it saves the model every epoch.
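As a sanity check on that setting, here is a minimal sketch of which epochs would trigger a save. It assumes the training loop gates saving on a modulo check against `save_model_epochs` plus a final-epoch fallback; that condition is an assumption about the loop, not something confirmed by the logs. The `TrainingConfig` below is a hypothetical two-field stand-in, and `epochs_that_save` is a made-up helper name.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Hypothetical subset of the config; only save_model_epochs comes from
    # the post, and num_epochs=50 is an illustrative value.
    num_epochs: int = 50
    save_model_epochs: int = 1


def epochs_that_save(config):
    """List the epoch indices where the assumed save condition is true."""
    return [
        epoch
        for epoch in range(config.num_epochs)
        if (epoch + 1) % config.save_model_epochs == 0
        or epoch == config.num_epochs - 1
    ]


if __name__ == "__main__":
    config = TrainingConfig()
    # With save_model_epochs=1 every epoch satisfies the condition.
    print(len(epochs_that_save(config)) == config.num_epochs)
```

Under that assumption, a value of 1 makes every epoch qualify, so the setting itself should not be what prevents saving. Whatever runs inside that branch when pushing to the Hub is probably the place to look.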
This is really frustrating!