Small LMs to prototype architecture experiments on

zohaib-khan5040 · January 27, 2025, 5:25pm

Hello!

Thank you so much for the comprehensive reply! I totally forgot about OPT and GPT-Neo; they’re also well cited in the literature, so I will definitely be testing on them too. I also found this paper that surveys small LMs, so there are bound to be some hidden nuggets in there: https://arxiv.org/pdf/2501.05465

It’s kind of disappointing that the SmolLM code isn’t there. There’s a similar situation with IBM’s Granite models: the weights are open, but the implementations aren’t available online, which is really weird.

Anyway, I think that’s enough for me to work from.
Again, thank you very much!
