I’m looking to experiment with building LLMs for downstream tasks. I’m not interested in inventing new architectures, but the models must be pre-trained and fine-tuned on an internal dataset, so I can’t use pre-existing weights or tokenizers. The models must be built from scratch on my data.
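To make “from scratch” concrete on the tokenizer side, this is roughly what I mean: a new byte-level BPE trained only on my corpus. This is just a sketch; the dataset path and the `text` column name are placeholders for my internal data:

```python
from datasets import load_from_disk
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Hypothetical path and column name standing in for my internal data.
ds = load_from_disk("path/to/arrow_dataset")

# A fresh byte-level BPE tokenizer, trained only on my corpus.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
trainer = trainers.BpeTrainer(vocab_size=32000, special_tokens=["[UNK]", "[PAD]"])

def batch_iterator(batch_size=1000):
    # Stream text out of the Arrow dataset in chunks.
    for i in range(0, len(ds), batch_size):
        yield ds[i : i + batch_size]["text"]

tokenizer.train_from_iterator(batch_iterator(), trainer=trainer)
tokenizer.save("tokenizer.json")
```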
Priorities (from highest to lowest):
- Ease of experimentation with different models. I want to rapidly experiment with training different existing architectures on my dataset without getting bogged down rewriting them from scratch and then debugging implementation issues. Higher-level is better.
- Scalability. Right now my dataset is ~100GB in Arrow format. I need something that can handle this scale, and ideally more, running efficiently on multiple GPUs without being significantly slower than the alternatives (see the sketch after this list).
- Ease of productionization. While right now I care more about experimentation than productionization, ideally I’d like something that could be maintained in a work setting: code that isn’t a mess, is reliable, and minimizes the risk of cryptic non-deterministic errors.
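On the scalability point, here’s the kind of loading pattern I’m assuming would work (path hypothetical). My understanding is that Hugging Face datasets memory-maps the underlying Arrow files, so the full ~100GB never has to sit in RAM:

```python
from datasets import load_from_disk

# datasets memory-maps the Arrow files on disk, so the ~100GB corpus
# doesn't have to fit in RAM; only the batches you touch get paged in.
ds = load_from_disk("path/to/arrow_dataset")  # hypothetical path
ds = ds.shuffle(seed=42).with_format("torch")

# Multi-GPU then comes from whatever launcher the framework expects, e.g.:
#   torchrun --nproc_per_node=4 pretrain.py
```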
It seems like when it comes to sharing models and using pre-built models off the shelf, Hugging Face is now the standard. However, I can’t tell if that’s also true for training models from scratch. What do most of you recommend for this? Hugging Face transformers, PyTorch Lightning, LitGPT, or something else?
For context, I’m using the Hugging Face datasets library for saving/loading my data and the transformers library for pre-existing model definitions, and I’ve been happy with that. However, I ran into issues running the Trainer within my development environment, so I built a simple LLM with PyTorch Lightning instead. I got reasonable results, but it was a lot of work coding and debugging everything. Given that I’m not trying to reinvent the wheel, I’m trying to understand whether there’s something better I could be doing. That could just be investing more time getting the Hugging Face Trainer working, if that’s what others recommend.
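For reference, if the answer is “push through the Trainer issues,” the from-scratch path I’d expect looks something like the sketch below: initialize a model from a config instead of `from_pretrained()`, so no pre-existing weights are involved. The model sizes, paths, and the assumption that the dataset is already tokenized into an `input_ids` column are all placeholders on my part, not something I’ve validated:

```python
from datasets import load_from_disk
from transformers import (
    DataCollatorForLanguageModeling,
    LlamaConfig,
    LlamaForCausalLM,
    PreTrainedTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Fresh weights: instantiate from a config instead of from_pretrained().
# The sizes here are placeholders, not a recommendation.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=512,
    intermediate_size=2048,
    num_hidden_layers=8,
    num_attention_heads=8,
)
model = LlamaForCausalLM(config)

# The tokenizer trained earlier on my own corpus.
tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
tokenizer.pad_token = "[PAD]"

# Assumes the dataset was already tokenized into an `input_ids` column.
ds = load_from_disk("path/to/tokenized_dataset")

args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=8,
    max_steps=10_000,
    logging_steps=50,
    bf16=True,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # multi-GPU via e.g. `torchrun --nproc_per_node=4 pretrain.py`
```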