Hi,
I already posted this in the Beginners category, but I am looking for more help here.
I am currently trying to do full fine-tuning of the ai-forever/mGPT model (1.3B parameters) on a single A100 GPU (40 GB VRAM) on Google Colab. However, training is very slow: ~0.06 it/s.
Here is my code:
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# model and tokenizer are ai-forever/mGPT, already loaded onto the GPU

# Lithuanian subset of C4, trimmed to a small slice for this run
dataset = load_dataset("allenai/c4", "lt")
train_dataset = dataset["train"].take(10000)
eval_dataset = dataset["validation"].take(1000)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        gradient_accumulation_steps=4,
        gradient_checkpointing=True,
        num_train_epochs=3,
        learning_rate=2e-4,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        seed=99,
        output_dir="./checkpoints",
        save_strategy="steps",
        eval_strategy="steps",
        save_steps=0.1,      # fractions of total training steps
        eval_steps=0.1,
        logging_steps=0.1,
        load_best_model_at_end=True,
    ),
)

trainer_stats = trainer.train()
And the trainer output:
It says the run will take ~10 hours to get through the 10k examples from the C4 dataset.
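That estimate roughly matches the step count: with per_device_train_batch_size = 4 and gradient_accumulation_steps = 4 the effective batch size is 16, so 10,000 examples give about 625 optimizer steps per epoch, i.e. 1,875 steps over 3 epochs; at ~0.06 it/s that is 1875 / 0.06 ≈ 31,000 s ≈ 8.7 hours (assuming the progress bar counts optimizer steps).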
These are the relevant package versions and a screenshot of GPU usage:
Package Version
---------------------------------- -------------------
accelerate 0.34.2
bitsandbytes 0.44.1
datasets 3.1.0
peft 0.13.2
torch 2.5.0+cu121
trl 0.12.0
The model does appear to be loaded onto the GPU, but for some reason training is still very slow.
I tried to use keep_in_memory=True when loading the dataset, but it did not help.
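For completeness, this is roughly what that attempt looked like (a minimal sketch; keep_in_memory is just passed through to load_dataset):

dataset = load_dataset("allenai/c4", "lt", keep_in_memory=True)  # no noticeable speedup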
I also tried pre-tokenizing the dataset and using Trainer instead of SFTTrainer (sketch below), but the performance was similar.
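This is roughly the pre-tokenized variant (a sketch from memory; the tokenize function name and the collator choice are my approximations, and training_args stands for the same TrainingArguments as above):

from transformers import Trainer, DataCollatorForLanguageModeling

def tokenize_fn(batch):
    # truncate to the same 2048-token context used with SFTTrainer
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized_train = train_dataset.map(
    tokenize_fn, batched=True, remove_columns=train_dataset.column_names
)
tokenized_eval = eval_dataset.map(
    tokenize_fn, batched=True, remove_columns=eval_dataset.column_names
)

# dynamic padding needs a pad token on GPT-style tokenizers
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# causal LM collator: builds labels from input_ids (mlm=False)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,  # same TrainingArguments as in the SFTTrainer setup
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=collator,
)
trainer_stats = trainer.train()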
I was wondering whether this is the expected training speed, or whether there is some issue with my code? And if there is an issue, what could a possible fix be?