I’m looking to experiment with building LLMs for downstream tasks. I’m not interested in inventing new architectures, but the models must be pre-trained and fine-tuned on an internal dataset, so I can’t use pre-existing weights or tokenizers. The models must be built from scratch on my data.
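To make “from scratch” concrete on the tokenizer side, this is roughly what I mean: a new byte-level BPE trained only on my corpus. This is just a sketch; the dataset path and the `text` column name are placeholders for my internal data:

```python
from datasets import load_from_disk
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Hypothetical path and column name standing in for my internal data.
ds = load_from_disk("path/to/arrow_dataset")

# A fresh byte-level BPE tokenizer, trained only on my corpus.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
trainer = trainers.BpeTrainer(vocab_size=32000, special_tokens=["[UNK]", "[PAD]"])

def batch_iterator(batch_size=1000):
    # Stream text out of the Arrow dataset in chunks.
    for i in range(0, len(ds), batch_size):
        yield ds[i : i + batch_size]["text"]

tokenizer.train_from_iterator(batch_iterator(), trainer=trainer)
tokenizer.save("tokenizer.json")
```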
Priorities (from highest to lowest):
- Ease of experimentation with different models. I want to rapidly experiment with training different existing architectures on my dataset without getting bogged down rewriting them from scratch and then debugging implementation issues. Higher-level is better.
- Scalability. Right now my dataset is ~100GB in Arrow format. I need something that can handle this scale, and ideally more, running efficiently on multiple GPUs without being significantly slower than the alternatives (see the sketch after this list).
- Ease of productionization. While right now I care more about experimentation than productionization, ideally I’d like something that could be maintained in a work setting: code that isn’t a mess, is reliable, and minimizes the risk of cryptic non-deterministic errors.
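On the scalability point, here’s the kind of loading pattern I’m assuming would work (path hypothetical). My understanding is that Hugging Face datasets memory-maps the underlying Arrow files, so the full ~100GB never has to sit in RAM:

```python
from datasets import load_from_disk

# datasets memory-maps the Arrow files on disk, so the ~100GB corpus
# doesn't have to fit in RAM; only the batches you touch get paged in.
ds = load_from_disk("path/to/arrow_dataset")  # hypothetical path
ds = ds.shuffle(seed=42).with_format("torch")

# Multi-GPU then comes from whatever launcher the framework expects, e.g.:
#   torchrun --nproc_per_node=4 pretrain.py
```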
It seems like when it comes to sharing models and using pre-built models off the shelf, Hugging Face is now the standard. However, I can’t tell if that’s also true for training models from scratch. What do most of you recommend for this? Hugging Face transformers, PyTorch Lightning, LitGPT, or something else?
For context, I’m using the Hugging Face datasets library for saving/loading my data and the transformers library for pre-existing model definitions, and I’ve been happy with that. However, I ran into issues running the Trainer within my development environment, so I built a simple LLM with PyTorch Lightning instead. I got reasonable results, but it was a lot of work coding and debugging everything. Given that I’m not trying to reinvent the wheel, I’m trying to understand whether there’s something better I could be doing. That could just be investing more time getting the Hugging Face Trainer working, if that’s what others recommend.
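For reference, if the answer is “push through the Trainer issues,” the from-scratch path I’d expect looks something like the sketch below: initialize a model from a config instead of `from_pretrained()`, so no pre-existing weights are involved. The model sizes, paths, and the assumption that the dataset is already tokenized into an `input_ids` column are all placeholders on my part, not something I’ve validated:

```python
from datasets import load_from_disk
from transformers import (
    DataCollatorForLanguageModeling,
    LlamaConfig,
    LlamaForCausalLM,
    PreTrainedTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Fresh weights: instantiate from a config instead of from_pretrained().
# The sizes here are placeholders, not a recommendation.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=512,
    intermediate_size=2048,
    num_hidden_layers=8,
    num_attention_heads=8,
)
model = LlamaForCausalLM(config)

# The tokenizer trained earlier on my own corpus.
tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
tokenizer.pad_token = "[PAD]"

# Assumes the dataset was already tokenized into an `input_ids` column.
ds = load_from_disk("path/to/tokenized_dataset")

args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=8,
    max_steps=10_000,
    logging_steps=50,
    bf16=True,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # multi-GPU via e.g. `torchrun --nproc_per_node=4 pretrain.py`
```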