I have been trying to train adapters from scratch. I am using my own custom diffusion implementation, but I followed the adapter implementation from Diffusers: the adapter's residual features are passed to the UNet, only the adapter parameters are passed to the optimizer, and everything else is frozen.
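To make the setup concrete, here is a minimal PyTorch sketch of the idea. It is not my actual code and not the Diffusers classes: `TinyAdapter` and `TinyUNet` are hypothetical toy stand-ins, and the tensor shapes, learning rate, and single residual injection point are assumptions chosen only for illustration. It also checks that one optimizer step changes the adapter weights and leaves the UNet weights untouched.

```python
import torch
import torch.nn as nn


class TinyAdapter(nn.Module):
    """Hypothetical stand-in: maps a conditioning image to one residual feature map."""

    def __init__(self, cond_channels=3, feat_channels=8):
        super().__init__()
        self.net = nn.Conv2d(cond_channels, feat_channels, kernel_size=3, padding=1)

    def forward(self, cond):
        return self.net(cond)


class TinyUNet(nn.Module):
    """Hypothetical stand-in for the denoiser; adds the residual to its hidden features."""

    def __init__(self, in_channels=4, feat_channels=8):
        super().__init__()
        self.encode = nn.Conv2d(in_channels, feat_channels, kernel_size=3, padding=1)
        self.decode = nn.Conv2d(feat_channels, in_channels, kernel_size=3, padding=1)

    def forward(self, x, residual=None):
        h = self.encode(x)
        if residual is not None:
            h = h + residual  # inject the adapter features
        return self.decode(torch.relu(h))


torch.manual_seed(0)
unet = TinyUNet()
adapter = TinyAdapter()

# Freeze everything except the adapter.
unet.requires_grad_(False)
unet.eval()

# Only the adapter parameters go to the optimizer (learning rate is an assumption).
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

# Random placeholder batch: noisy latents, conditioning image, and noise target.
noisy_latents = torch.randn(2, 4, 16, 16)
cond = torch.randn(2, 3, 16, 16)
target_noise = torch.randn(2, 4, 16, 16)

# Snapshot the weights so the effect of one step can be checked.
adapter_before = [p.detach().clone() for p in adapter.parameters()]
unet_before = [p.detach().clone() for p in unet.parameters()]

# One training step: gradients flow through the frozen UNet back into the adapter.
pred = unet(noisy_latents, residual=adapter(cond))
loss = nn.functional.mse_loss(pred, target_noise)
optimizer.zero_grad()
loss.backward()
adapter_grad_norm = sum(p.grad.norm().item() for p in adapter.parameters())
optimizer.step()

adapter_changed = any(
    not torch.equal(old, new) for old, new in zip(adapter_before, adapter.parameters())
)
unet_changed = any(
    not torch.equal(old, new) for old, new in zip(unet_before, unet.parameters())
)
print(f"adapter grad norm: {adapter_grad_norm:.6f}")
print(f"adapter changed: {adapter_changed}, unet changed: {unet_changed}")
```

In this sketch a nonzero adapter gradient norm together with unchanged UNet weights means the wiring matches the description above; it says nothing about how many steps a real adapter needs.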
I was wondering whether this general approach is correct, and how much training is needed before any change becomes visible. So far I have trained my model for 15k steps with no effect. Also, does the adapter's effect appear gradually, or does it show up suddenly? Finally, does anyone have guidance or experience with training adapters to share?