Hi,
I already posted this in the Beginners category, but I am looking for more help here.
I am currently trying to do full fine-tuning of the ai-forever/mGPT model (1.3B parameters) on a single A100 GPU (40 GB VRAM) on Google Colab. However, training is very slow: ~0.06 it/s.
Here is my code:
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# model and tokenizer are ai-forever/mGPT, already loaded onto the GPU

# Lithuanian subset of C4, trimmed to a small slice for this run
dataset = load_dataset("allenai/c4", "lt")
train_dataset = dataset["train"].take(10000)
eval_dataset = dataset["validation"].take(1000)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        gradient_accumulation_steps=4,
        gradient_checkpointing=True,
        num_train_epochs=3,
        learning_rate=2e-4,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        seed=99,
        output_dir="./checkpoints",
        save_strategy="steps",
        eval_strategy="steps",
        save_steps=0.1,      # fractions of total training steps
        eval_steps=0.1,
        logging_steps=0.1,
        load_best_model_at_end=True,
    ),
)

trainer_stats = trainer.train()
And the trainer output:
It says the run will take ~10 hours to get through the 10k examples from the C4 dataset.
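That estimate roughly matches the step count: with per_device_train_batch_size = 4 and gradient_accumulation_steps = 4 the effective batch size is 16, so 10,000 examples give about 625 optimizer steps per epoch, i.e. 1,875 steps over 3 epochs; at ~0.06 it/s that is 1875 / 0.06 ≈ 31,000 s ≈ 8.7 hours (assuming the progress bar counts optimizer steps).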
These are the relevant package versions and a screenshot of GPU usage:
Package Version
---------------------------------- -------------------
accelerate 0.34.2
bitsandbytes 0.44.1
datasets 3.1.0
peft 0.13.2
torch 2.5.0+cu121
trl 0.12.0
The model does appear to be loaded onto the GPU, but for some reason training is still very slow.
I tried to use keep_in_memory=True when loading the dataset, but it did not help.
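For completeness, this is roughly what that attempt looked like (a minimal sketch; keep_in_memory is just passed through to load_dataset):

dataset = load_dataset("allenai/c4", "lt", keep_in_memory=True)  # no noticeable speedup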
I also tried pre-tokenizing the dataset and using Trainer instead of SFTTrainer (sketch below), but the performance was similar.
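This is roughly the pre-tokenized variant (a sketch from memory; the tokenize function name and the collator choice are my approximations, and training_args stands for the same TrainingArguments as above):

from transformers import Trainer, DataCollatorForLanguageModeling

def tokenize_fn(batch):
    # truncate to the same 2048-token context used with SFTTrainer
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized_train = train_dataset.map(
    tokenize_fn, batched=True, remove_columns=train_dataset.column_names
)
tokenized_eval = eval_dataset.map(
    tokenize_fn, batched=True, remove_columns=eval_dataset.column_names
)

# dynamic padding needs a pad token on GPT-style tokenizers
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# causal LM collator: builds labels from input_ids (mlm=False)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,  # same TrainingArguments as in the SFTTrainer setup
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=collator,
)
trainer_stats = trainer.train()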
I was wondering whether this is the expected training speed, or whether there is some issue with my code? And if there is an issue, what could a possible fix be?