Training CodeLlama2 using LoRA doesn't save any memory

I'm trying to fine-tune a LLaMA model (7B) using LoRA. From what I have read, I should be able to train it on consumer hardware (using between 10 and 15 GB of VRAM, if I'm not mistaken), but when I try to train the model it uses at least 120 GB of VRAM.
This makes me believe I am not using LoRA correctly, but I can't seem to find my mistake.
I am following these steps:

  • I'm loading the LLaMA 2 model using quant_config (4-bit quantization).
  • I'm creating the PEFT parameters using LoraConfig, targeting some specific layers.
  • I'm wrapping the previously loaded model in a PEFT model, overwriting the original variable (I also tried keeping both models loaded simultaneously). The trainable parameters are 0.24% of the original model, which makes sense.
  • I'm creating a TrainingArguments object and training the PEFT model with it.

Any help is welcome 🙂
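
As a sanity check on the 0.24% figure mentioned in the steps above, the adapter size can be worked out by hand. This is a minimal sketch: the rank and target modules come from the LoraConfig in the code below, while the hidden size, layer count and total parameter count are assumptions for a 7B LLaMA-style model and are not stated in the post.

```python
# Assumed 7B LLaMA-style dimensions (not stated in the post).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
BASE_PARAMS = 6_738_415_616  # assumed total parameter count of the base model

# Values taken from the LoraConfig in the post.
LORA_RANK = 16
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]


def lora_param_count(hidden_size, num_layers, rank, num_targets):
    """Count LoRA parameters, assuming every target is a square
    hidden_size x hidden_size projection.

    Each adapted projection gets two matrices:
    A (rank x hidden_size) and B (hidden_size x rank).
    """
    per_module = rank * (hidden_size + hidden_size)
    return per_module * num_targets * num_layers


lora_params = lora_param_count(
    HIDDEN_SIZE, NUM_LAYERS, LORA_RANK, len(TARGET_MODULES)
)
trainable_pct = 100 * lora_params / (BASE_PARAMS + lora_params)

print(f"LoRA parameters: {lora_params:,}")
print(f"Trainable share: {trainable_pct:.4f}%")
```

Under these assumptions the share comes out just under 0.25%, which is consistent with the reported 0.24% and suggests the adapters themselves are attached as intended.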

Code:

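import datetime

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

# base_model, compute_dtype, tokenizer, tokenized_train_dataset and
# tokenized_val_dataset are defined earlier (not shown).

quant_config = BitsAndBytesConfig(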
   load_in_4bit=True,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_compute_dtype=compute_dtype,
   bnb_4bit_use_double_quant=False,
)

model = AutoModelForCausalLM.from_pretrained(
   base_model,
   quantization_config=quant_config,
   torch_dtype=torch.float16,  # this literally halved the model's memory footprint
   device_map={"": 0}
)
model.config.use_cache = False
model.config.pretraining_tp = 1

peft_params = LoraConfig(
   lora_alpha=16,
   lora_dropout=0.05,
   r=16,
   bias="none",
   task_type="CAUSAL_LM",
   target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, peft_params)

training_params = TrainingArguments(
   output_dir="./results_llama",
   num_train_epochs=2,
   per_device_train_batch_size=3,
   gradient_accumulation_steps=1,
   optim="adamw_torch",
   eval_steps=20,
   save_strategy="no",  # Not saving for now. For some reason it saves everything, not just the PEFT files.
   #save_steps=25,
   logging_steps=20,
   learning_rate=2e-4,
   weight_decay=0.001,
   fp16=False,  # True in one example, False in another; not sure which is right
   bf16=False,
   max_grad_norm=0.3,
   max_steps=-1,
   warmup_ratio=0.03,
   group_by_length=True,
   lr_scheduler_type="constant",
   report_to="wandb",
   run_name=f"codellama-{datetime.datetime.now().strftime('%Y-%m-%d-%H-%M')}",
)

trainer = SFTTrainer(
   model=model,
   train_dataset=tokenized_train_dataset,
   eval_dataset=tokenized_val_dataset,
   peft_config=peft_params,
   dataset_text_field="text",
   max_seq_length=None,
   tokenizer=tokenizer,
   args=training_params,
   packing=False,
)

trainer.train()
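
For comparison with the 120 GB observed, here is a rough back-of-the-envelope estimate of the static memory this configuration might need (weights, adapters, gradients and AdamW state). It is only a sketch: the parameter counts are assumptions for a 7B LLaMA-style model with rank-16 adapters on q/k/v/o, the adapters are assumed to be kept in fp32 because fp16 and bf16 are both False, and it ignores layers that are not quantized, quantization constants, activations and the CUDA context.

```python
GIB = 1024 ** 3

# Assumed parameter counts (not stated in the post).
BASE_PARAMS = 6_738_415_616   # frozen base model, stored in 4-bit
LORA_PARAMS = 16_777_216      # trainable adapter weights

BYTES_PER_4BIT_WEIGHT = 0.5
BYTES_PER_FP32 = 4


def static_memory_gib(base_params, lora_params):
    """Estimate memory that does not depend on batch size or sequence length."""
    parts = {
        "quantized_weights": base_params * BYTES_PER_4BIT_WEIGHT,
        "adapter_weights": lora_params * BYTES_PER_FP32,
        "adapter_gradients": lora_params * BYTES_PER_FP32,
        # AdamW keeps two moment estimates per trainable parameter.
        "adamw_states": 2 * lora_params * BYTES_PER_FP32,
    }
    return {name: size / GIB for name, size in parts.items()}


estimate = static_memory_gib(BASE_PARAMS, LORA_PARAMS)
for name, gib in estimate.items():
    print(f"{name:>20}: {gib:6.3f} GiB")
print(f"{'total':>20}: {sum(estimate.values()):6.3f} GiB")
```

If these assumptions roughly hold, the static part is only a few GiB, so the bulk of the usage would presumably come from elsewhere, for example activations, which grow with per_device_train_batch_size and with sequence length (max_seq_length is None here).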