Hello, I’m experimenting with training different versions of Stable Diffusion from scratch.
Currently I’m training a version with CLIP huge as the text encoder (based on the Imagen paper, a bigger text encoder should improve the final results).
Next I will try to train a version with a bigger UNet. My question is: which parameters should I increase in the UNet?
I’m thinking about doubling “layers_per_block”, but maybe changing “block_out_channels” could also be beneficial.
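
To make the comparison concrete, here is a rough sketch of how the two options scale the 3x3 convolution weights in the down-path ResNet blocks. It is only a back-of-the-envelope estimate: the helper names are hypothetical, the baseline values (320, 640, 1280, 1280) and layers_per_block=2 are assumed from the usual Stable Diffusion UNet config, and attention layers, norms, biases, time-embedding projections, skip convolutions and the whole up path are ignored.

```python
# Rough estimate of 3x3 conv weights in the down-path ResNet blocks only.
# Hypothetical helpers for illustration; not part of any library.

def resnet_conv_params(c_in, c_out):
    # Two 3x3 convs per ResNet block: c_in -> c_out, then c_out -> c_out.
    return 9 * c_in * c_out + 9 * c_out * c_out

def down_path_conv_params(block_out_channels, layers_per_block):
    total = 0
    c_prev = block_out_channels[0]  # channels after the input conv
    for c in block_out_channels:
        for _ in range(layers_per_block):
            total += resnet_conv_params(c_prev, c)
            c_prev = c
    return total

# Assumed baseline (typical Stable Diffusion UNet values).
base_channels = (320, 640, 1280, 1280)
base = down_path_conv_params(base_channels, 2)

# Option A: double "layers_per_block" (deeper).
deeper = down_path_conv_params(base_channels, 4)

# Option B: double every entry of "block_out_channels" (wider).
wide_channels = tuple(2 * c for c in base_channels)
wider = down_path_conv_params(wide_channels, 2)

print(f"baseline: {base / 1e6:.1f}M conv weights")
print(f"deeper:   {deeper / 1e6:.1f}M ({deeper / base:.2f}x)")
print(f"wider:    {wider / 1e6:.1f}M ({wider / base:.2f}x)")
```

In this estimate, conv weights grow roughly linearly with “layers_per_block” and quadratically with the channel widths, so doubling the channels is the much more expensive change. It says nothing about which option gives better image quality.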