Best LLM to pretrain?

Hi, I am working on a project where I will pre-train an LLM on a constrained, non-language domain (which is why pre-training is necessary) that has plenty of data available, and then fine-tune it with DPO on preference pairs constructed from a supervised task. My question is: is there a particular LLM architecture that would be the best choice? I plan for the model to have somewhere around 20 million parameters, if that makes a difference.
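For context, here is roughly the DPO stage I have in mind. This is just a sketch assuming trl's DPOTrainer and a dataset with prompt/chosen/rejected columns; the checkpoint path is a placeholder for my own pre-trained model, and the exact argument names shift a bit between trl versions.

```python
# Rough sketch of the DPO stage (assumes trl's DPOTrainer; argument names
# differ slightly across trl versions, so treat this as illustrative).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Preference pairs built from my supervised task (placeholder strings).
pairs = Dataset.from_dict({
    "prompt":   ["<domain input 1>", "<domain input 2>"],
    "chosen":   ["<preferred output 1>", "<preferred output 2>"],
    "rejected": ["<worse output 1>", "<worse output 2>"],
})

# Hypothetical path to my own ~20M-parameter pre-trained checkpoint.
model = AutoModelForCausalLM.from_pretrained("path/to/my-pretrained-20m")
tokenizer = AutoTokenizer.from_pretrained("path/to/my-pretrained-20m")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-out", beta=0.1),
    train_dataset=pairs,
    processing_class=tokenizer,  # older trl versions call this `tokenizer`
)
trainer.train()
```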

It seems the default architecture to reach for is GPT-2, but given its age I wonder whether there are better choices: more efficient to train, more parameter-efficient, "smarter," etc. At a minimum I was thinking of picking an architecture with rotary positional embeddings.
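For reference, the kind of thing I'm imagining is a small Llama-style config (which uses RoPE by default). All the sizes below are guesses on my part to land somewhere near 20M parameters, and the small vocab is an assumption about my non-language domain:

```python
# Sketch of a small Llama-style model at roughly the 20M-parameter scale;
# every size here is a guess to be tuned, not a recommendation.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=4096,              # assumed small vocab for the non-language domain
    hidden_size=320,
    intermediate_size=1280,
    num_hidden_layers=12,
    num_attention_heads=8,
    num_key_value_heads=8,
    max_position_embeddings=1024,
)
model = LlamaForCausalLM(config)
print(f"{model.num_parameters():,} parameters")  # sanity-check it lands near 20M
```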

Thanks