Small LMs to prototype architecture experiments on

Hello!

Thank you so much for the comprehensive reply! I totally forgot about OPT and GPT-Neo; they’re also well cited in the literature, so I will definitely be testing on them too. I also found this paper that surveys small LMs, so there are bound to be some hidden nuggets in there: https://arxiv.org/pdf/2501.05465

Kinda disappointing that the SmolLM code isn’t there. There’s a similar situation with IBM’s Granite models, where there are open weights but the implementations aren’t available online, which is really weird.

Anyway, I think that’s enough for me to work from.
Again, thank you very much!
