SFTTrainer training very slow on GPU. Is this training speed expected?


I already posted this in the Beginners category, but I am seeking more help here.

I am trying to perform full fine-tuning of the ai-forever/mGPT model (1.3B parameters) on a single A100 GPU (40GB VRAM) in Google Colab. However, training is very slow: ~0.06 it/s.

Here is my code:

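from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# model and tokenizer are loaded earlier (not shown here)
dataset = load_dataset("allenai/c4", "lt")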

train_dataset = dataset["train"]
eval_dataset = dataset["validation"]

train_dataset = train_dataset.take(10000)
eval_dataset = eval_dataset.take(1000)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset = eval_dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,

    args = TrainingArguments(
        gradient_accumulation_steps = 4,
        gradient_checkpointing = True,

        num_train_epochs = 3,
        learning_rate = 2e-4,
        per_device_train_batch_size = 4,
        per_device_eval_batch_size = 4,

        seed = 99,
        output_dir = "./checkpoints",

        save_strategy = "steps",
        eval_strategy = "steps",

        save_steps = 0.1,
        eval_steps = 0.1,
        logging_steps = 0.1,
        load_best_model_at_end = True,
    ),
)

trainer_stats = trainer.train()

The trainer output says it will take ~10 hrs to process 10k examples from the c4 dataset.
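As a sanity check, the ~10 hr estimate can be reproduced from the settings above. This is only a rough sketch: it assumes the it/s figure counts optimizer steps (one step = batch size × gradient accumulation examples), that no packing is used, and it ignores time spent on evaluation and checkpointing. The token throughput is an upper bound that assumes every example fills all 2048 tokens.

```python
import math

# Values taken from the question
num_examples = 10_000      # train_dataset.take(10000)
per_device_batch = 4       # per_device_train_batch_size
grad_accum = 4             # gradient_accumulation_steps
epochs = 3                 # num_train_epochs
its_per_sec = 0.06         # speed reported by the progress bar
max_seq_length = 2048

# Assumption: one "it" is one optimizer step on a single GPU
examples_per_step = per_device_batch * grad_accum
steps_per_epoch = math.ceil(num_examples / examples_per_step)
total_steps = steps_per_epoch * epochs

seconds_per_step = 1 / its_per_sec
total_hours = total_steps * seconds_per_step / 3600

# Upper bound: assumes every example is a full 2048 tokens long
max_tokens_per_sec = examples_per_step * max_seq_length * its_per_sec

print(f"examples per optimizer step: {examples_per_step}")
print(f"total optimizer steps:       {total_steps}")
print(f"seconds per step:            {seconds_per_step:.1f}")
print(f"estimated training time:     {total_hours:.1f} h")
print(f"token throughput (at most):  {max_tokens_per_sec:.0f} tokens/s")
```

Under these assumptions the run is 1875 optimizer steps at about 16.7 s each, or roughly 8.7 hours of pure training, which is in line with the ~10 hr estimate once evaluation is included.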

These are the relevant package versions and a screenshot of GPU usage:

Package                            Version
---------------------------------- -------------------
accelerate                         0.34.2
bitsandbytes                       0.44.1
datasets                           3.1.0
peft                               0.13.2
torch                              2.5.0+cu121
trl                                0.12.0

It does seem to load the model to the GPU, but for some reason it’s still very slow.

I tried to use keep_in_memory=True when loading the dataset, but it did not help.

I also tried pre-tokenizing the dataset and using Trainer instead of SFTTrainer but the performance was similar.

Is this the expected training speed, or is there an issue with my code? If it is an issue, what could a possible fix be?


There are several similar cases posted on the forum where training is slow even though the GPU is being used.
I have no experience using Trainer, so I can’t say whether this is a bug or a feature. I suspect it’s a bug in some library or a hard-to-spot configuration error, because if training were normally this slow, everyone would be having problems.

As a fellow beginner, my best guess is that you didn’t specify

model = model.to("cuda")

or something similar. That said, I don’t see why it would be using the GPU so heavily if the model weren’t already on it.
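One way to test this guess is to look at where the model's parameters actually live before calling trainer.train(). Below is a minimal sketch; parameter_devices is a hypothetical helper name, and it relies only on the model exposing .parameters() with a .device on each item, which a PyTorch module does. As far as I know, Trainer normally moves the model to the GPU by itself, so this is a diagnostic rather than a fix.

```python
from collections import Counter


def parameter_devices(model):
    """Count how many parameter tensors live on each device.

    Works with any object exposing .parameters() whose items have a
    .device attribute, e.g. a torch.nn.Module.
    """
    return Counter(str(p.device) for p in model.parameters())


# Hypothetical usage with the model from the question:
# print(parameter_devices(model))
# If every parameter is reported on a "cuda" device, model placement
# is not the cause of the slowdown.
```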


I provided an answer here: SFTTrainer training very slow. Is this training speed expected? - #4 by domce20

