I use

    for p in model.parameters():
        p.requires_grad = False

to freeze a T5 model (t5-small), but when I print the parameters that still require grad, there is one remaining parameter of size 32121x512. What is this? Is it the embedding matrix? Should I freeze it too? It also looks like gradients from the backward pass still update this one remaining parameter.
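For reference, this is roughly what I am running to freeze the model and then list what still requires grad (a minimal sketch assuming the Hugging Face transformers T5ForConditionalGeneration class; my actual script may load the model slightly differently):

    # Sketch of my setup, not the exact script
    from transformers import T5ForConditionalGeneration

    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # Freeze every parameter of the model
    for p in model.parameters():
        p.requires_grad = False

    # Print anything that still reports requires_grad=True
    for name, p in model.named_parameters():
        if p.requires_grad:
            print(name, tuple(p.shape))

After this, I would expect nothing to be printed, but I still see the single 32121x512 parameter showing up as trainable.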