SFTTrainer and MPS (validation loss NaN)

When I fine-tune a TinyLlama model on a sample of the Alpaca dataset, training works fine in Colab. However, when I run the same script locally on a MacBook (Ventura 13.6.4) using MPS, the validation loss is NaN from the very first evaluation step. What could be causing this?
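For reference, this is the minimal check I use to confirm that my local PyTorch build actually supports MPS (nothing here is specific to my setup):

import torch

# Confirm PyTorch was compiled with MPS support and the backend is usable.
print(torch.__version__)
print(torch.backends.mps.is_built())      # build includes MPS
print(torch.backends.mps.is_available())  # macOS/hardware can use it

Both return True on my machine. The training script is below.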

import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

# Load the base model in float16 directly onto the MPS device.
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    device_map="mps",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=None,
)

training_args = TrainingArguments(
    output_dir="./alpaca_output/",
    report_to="none",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    evaluation_strategy="steps",
    # logging strategies
    logging_strategy="steps",
    logging_steps=1,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    seed=1,
    save_strategy="epoch",
)

trainer = SFTTrainer(
    model,
    peft_config=peft_config,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    packing=True,
    max_seq_length=1024,
    args=training_args,
    formatting_func=create_alpaca_prompt,
)
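To narrow down where the NaN comes from, I can run a single fp16 forward pass on MPS outside the trainer (a minimal sketch reusing the model loaded above; the tokenizer load and the prompt text are my own additions, not part of the failing script) and look for NaN in the raw logits:

from transformers import AutoTokenizer

# Sanity check: one fp16 forward pass on MPS, inspecting the logits for NaN.
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
inputs = tokenizer("Below is an instruction.", return_tensors="pt").to("mps")
with torch.no_grad():
    logits = model(**inputs).logits
print(torch.isnan(logits).any())  # True would implicate fp16 on MPS rather than SFTTrainer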