I am trying to fine-tune Llama 2 7B with QLoRA on 2 GPUs. From what I've read, SFTTrainer should support multiple GPUs just fine, but when I run this I see one GPU with high utilization and the other sitting almost idle.

The expected behaviour would be that both GPUs get used during training and that it runs roughly 2x as fast as single-GPU training. I'm running this with `python train.py`, which I think means Trainer uses DP? Launching with `python -m torch.distributed.launch train.py` gives me an error (`RuntimeError: Expected to mark a variable ready only once...`), which makes me think DDP would need a bit more work.
This is an older machine without any fast interconnect, but I saw similar usage on a cloud machine with 2x A5000s, so I don't think that's the cause. Anyway, maybe someone can help by explaining why DP might be so slow in this case and/or how to test DDP instead (I've sketched below what I was planning to try) :)
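For reference, this is roughly the DDP setup I was planning to test, launching with `torchrun` rather than the deprecated `torch.distributed.launch`. It's only a sketch with my own guesses, nothing verified; in particular `ddp_find_unused_parameters=False` is just a hunch about the "mark a variable ready only once" error, since with LoRA most of the model's parameters are frozen:

```python
from transformers import TrainingArguments

# Launch with:  torchrun --nproc_per_node=2 train.py
# (torchrun is the current replacement for python -m torch.distributed.launch)
ddp_training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=8,   # per GPU, so the effective batch size doubles vs. single GPU
    gradient_accumulation_steps=2,
    fp16=True,
    gradient_checkpointing=True,
    # Hunch, not a known fix: most parameters are frozen under LoRA, so DDP's
    # unused-parameter detection might be what trips the
    # "Expected to mark a variable ready only once" error.
    ddp_find_unused_parameters=False,
)
```

If someone can confirm whether that's on the right track (or whether gradient checkpointing itself is the problem under DDP), that would already help a lot.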
Script:
```python
from datasets import load_dataset
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer
# Load the dataset
dataset_name = "timdettmers/openassistant-guanaco"
dataset = load_dataset(dataset_name, split="train")
# Load the model + tokenizer
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    use_cache=False,
)
# PEFT config
lora_alpha = 16
lora_dropout = 0.1
lora_r = 64
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "q_proj", "v_proj", "up_proj", "down_proj", "gate_proj"],
)
# Args
max_seq_length = 512
output_dir = "./results"
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
optim = "adamw_hf"
save_steps = 10
logging_steps = 1
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 300 # Approx the size of guanaco at bs 8, ga 2, 2 GPUs.
warmup_ratio = 0.1
lr_scheduler_type = "cosine"
training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
    gradient_checkpointing=True,
    report_to="wandb",
)
# Trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)
# Not sure if needed but noticed this in https://colab.research.google.com/drive/1t3exfAVLQo4oKIopQT1SKxK4UcYg7rC1#scrollTo=7OyIvEx7b1GT
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module = module.to(torch.float32)
# Train :)
trainer.train()
```
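In case it's useful, this is the quick check I ran right after constructing the trainer to see what the process can see and where the weights ended up (debugging only, not part of the training script; `trainer` is the object from the script above, and `hf_device_map` is read defensively since I'm not sure it gets set here):

```python
import torch

# Run after `trainer` has been created (see script above).
print("visible GPUs:", torch.cuda.device_count())                         # 2 GPUs are visible on this machine
print("param devices:", {p.device for p in trainer.model.parameters()})   # which device(s) the weights are on
print("device map:", getattr(trainer.model, "hf_device_map", None))       # set by accelerate if it dispatched the model
```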