I'm trying to fine-tune a LLaMA model (7B) using LoRA. From what I've read, I should be able to train this on consumer hardware (using between 10 and 15 GB of VRAM, if I'm not mistaken), but when I try to train the model it uses at least 120 GB of VRAM.
This makes me believe I am not using LoRA correctly, but I can't seem to find my mistake.
I am following these steps:
- I load the LLaMA 2 model using quant_config (4-bit BitsAndBytes quantization).
- I create the PEFT parameters using LoraConfig, targeting some specific layers.
- I create a PEFT model from the previously loaded model, overwriting the original variable (I also tried keeping both models loaded at the same time). I can see the trainable parameters are 0.24% of the original model, which makes sense.
- I create a TrainingArguments object and train the PEFT model with it.
Any help is welcome.
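For context, here is a minimal sketch of how the peak VRAM usage can be checked from inside the script (the report_vram helper below is illustrative, not part of my actual code, and I'm assuming a single CUDA device at index 0):

import torch

def report_vram(tag: str) -> None:
    # Peak memory allocated by tensors on GPU 0 since the start of the process
    peak_gb = torch.cuda.max_memory_allocated(device=0) / 1024**3
    print(f"[{tag}] peak VRAM allocated: {peak_gb:.1f} GB")

# e.g. call report_vram("after model load") and report_vram("after trainer.train()")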
Code:
import datetime

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer

# base_model, compute_dtype, tokenizer and the tokenized datasets are defined earlier (omitted here)

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    torch_dtype=torch.float16,  # this literally halved the model's size
    device_map={"": 0},
)
model.config.use_cache = False
model.config.pretraining_tp = 1
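
# Sanity check (illustrative, not part of my original script): verify the 4-bit
# quantization is actually applied. A 4-bit 7B model should report a footprint of
# only a few GB here; get_memory_footprint() is a transformers helper.
print(f"Model memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GB")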

peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.05,
    r=16,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(model, peft_params)
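
# For completeness: this PEFT helper is where the ~0.24% trainable figure
# mentioned above comes from.
model.print_trainable_parameters()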

training_params = TrainingArguments(
    output_dir="./results_llama",
    num_train_epochs=2,
    per_device_train_batch_size=3,
    gradient_accumulation_steps=1,
    optim="adamw_torch",
    eval_steps=20,
    save_strategy="no",  # Not saving for now. Not sure why, but it saves everything, not just the PEFT files.
    # save_steps=25,
    logging_steps=20,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,  # One reference sets this to True, the other to False. Not sure which is right.
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="wandb",
    run_name=f"codellama-{datetime.datetime.now().strftime('%Y-%m-%d-%H-%M')}",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_val_dataset,
    peft_config=peft_params,
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params,
    dataset_text_field="text",
    packing=False,
)
trainer.train()
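
Related to the save_strategy comment above: my understanding (an assumption, I haven't verified it) is that calling save_pretrained on the PEFT-wrapped model should write only the adapter files rather than the full base model; the output path is just a placeholder:

# Should write only the LoRA adapter files (adapter_config.json + adapter weights);
# "./lora_adapter" is a placeholder path
model.save_pretrained("./lora_adapter")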