Hello,
I have a question regarding the intermediate_size parameter in the RecurrentGemmaConfig. I noticed in the source code that intermediate_size is divided by 2.
Here’s the snippet of the code:
I observed that intermediate_size seems to correspond to mlp_expanded_width in RecurrentGemma’s official implementation. In the official 2B model, the model width is 2560 and the MLP expansion factor is 3, resulting in an mlp_expanded_width of 7680. However, in the Hugging Face implementation, the default intermediate_size is 15360.
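To make the arithmetic concrete, here is a minimal sketch using only the numbers quoted above. The variable names are illustrative and are not taken from either codebase; the halving step mirrors the division by 2 described in this post, not a verified excerpt of the library.

```python
# Values quoted in this post; variable names are hypothetical.
model_width = 2560                     # width of the official 2B model
mlp_expansion_factor = 3               # MLP expansion factor in the official implementation
hf_default_intermediate_size = 15360   # default intermediate_size in the Hugging Face config

# Official implementation: expanded width is width times expansion factor.
mlp_expanded_width = model_width * mlp_expansion_factor

# Hugging Face implementation (as described above): the configured value is halved.
hf_effective_width = hf_default_intermediate_size // 2

print(mlp_expanded_width)
print(hf_effective_width)
print(mlp_expanded_width == hf_effective_width)
```

Both computations land on the same width, so the two conventions appear to differ only in whether the stored value is the halved or the unhalved number.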
Given that dividing by 2 yields 7680, which aligns with the official implementation, I’m curious why the division is necessary. Would it not be simpler and clearer to set intermediate_size directly to 7680, thus avoiding the division?
Thank you for your time and assistance.