UPDATE: At least for now, the problem seems to be fixed. I downgraded the transformers library to version 4.49.0, switched from the SFTTrainer to transformers.Trainer, and changed the model loading to the following (a sketch of the Trainer setup follows the snippet).
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization (bitsandbytes) config
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype="float16", bnb_4bit_use_double_quant=False)

# LoRA configuration for the PEFT adapters
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"])

# Load the base model in 4-bit and prepare it for PEFT fine-tuning
model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
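For completeness, here is a minimal sketch of the transformers.Trainer setup mentioned above. The tokenizer handling, the toy dataset, and all hyperparameters (output_dir, batch size, learning rate, etc.) are placeholders I chose for illustration, not values from my actual run; adapt them to your data and hardware.

from datasets import Dataset
from transformers import (AutoTokenizer, Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)

# Tokenizer for the same base model; a pad token is needed for batching
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Tiny toy dataset as a placeholder -- replace with your own tokenized data
texts = ["Example instruction and response.", "Another training example."]
train_dataset = Dataset.from_dict(tokenizer(texts))

# Hypothetical training arguments -- tune these for your setup
training_args = TrainingArguments(
    output_dir="llama31-qlora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch")

# Plain transformers.Trainer with a causal-LM collator instead of SFTTrainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()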
Maybe this will help someone in the future!