The Transformer uses Xavier initialization.
So Kaiming init is preferred for the embedding matrix in an RNN, and Xavier is preferred in a Transformer? Am I correct to say this?
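To make the comparison concrete, here is a minimal sketch of how the two schemes scale an embedding matrix. It uses the standard formulas (Xavier uniform bound `sqrt(6 / (fan_in + fan_out))`, Kaiming normal std `sqrt(2 / fan_in)` for ReLU). The function names, the example sizes, and the choice of fan_in = embedding dimension and fan_out = vocabulary size are my own assumptions for illustration, not something taken from a particular codebase.

```python
import math
import random

def xavier_uniform(rows, cols, rng):
    # Assumption: for a (rows, cols) matrix, fan_in = cols and fan_out = rows.
    bound = math.sqrt(6.0 / (cols + rows))
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

def kaiming_normal(rows, cols, rng):
    # Assumption: fan_in mode with the ReLU gain sqrt(2), so std = sqrt(2 / fan_in).
    std = math.sqrt(2.0 / cols)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
vocab_size, embed_dim = 100, 16  # hypothetical sizes
emb_xavier = xavier_uniform(vocab_size, embed_dim, rng)
emb_kaiming = kaiming_normal(vocab_size, embed_dim, rng)
```

Note that Xavier depends on both dimensions, so a large vocabulary shrinks the initial embedding values, while Kaiming in fan_in mode depends only on the embedding dimension.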