TypeError with TrainingArguments evaluation_strategy in SFTTrainer with Unsloth

I'm getting TypeError: TrainingArguments.__init__() got an unexpected keyword argument 'evaluation_strategy' when initializing SFTTrainer with a TrainingArguments object that includes evaluation_strategy='steps' and load_best_model_at_end=True.

Using:

  • transformers 4.54.1
  • trl 0.20.0
  • unsloth 2025.7.11

I have already resolved the pip dependency conflicts that came up after upgrading transformers and installing tyro/msgspec.
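Is there a quick way to confirm which keyword the installed transformers build actually accepts? I assume something like this would show it (untested):

import inspect
from transformers import TrainingArguments

params = inspect.signature(TrainingArguments.__init__).parameters
print("evaluation_strategy" in params)  # whether the old keyword still exists
print("eval_strategy" in params)        # whether the renamed keyword exists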

The error persists specifically when evaluation_strategy is included in TrainingArguments. Code snippet for the trainer initialization:

from trl import SFTTrainer
from transformers import DataCollatorForSeq2Seq, TrainingArguments # Import TrainingArguments
from google.colab import userdata # Import userdata

# Define the formatting function to work with the 'messages' column
def formatting_prompts_func(examples):
    texts = []
    for conversation in examples["messages"]: # 'examples["messages"]' is a list of conversations
        # Ensure the conversation is a list and contains dictionaries
        if not isinstance(conversation, list):
            print(f"Skipping invalid conversation entry: {conversation}")
            continue # Skip this entry if it's not a list

        processed_conversation = []
        for message in conversation:
            # Ensure each message is a dictionary with 'role' and 'content'
            if isinstance(message, dict) and 'role' in message and 'content' in message:
                processed_conversation.append(message)
            else:
                print(f"Skipping invalid message format in conversation: {message}")
                # Decide how to handle malformed messages - skipping for now
                continue # Skip this message if it's not a valid dictionary format

        if not processed_conversation:
            print(f"Skipping conversation with no valid messages after filtering.")
            continue # Skip if no valid messages were found

        # Apply the chat template to the list of valid message dictionaries
        try:
            text = tokenizer.apply_chat_template(
                processed_conversation,
                tokenize=False,
                add_generation_prompt=False
            )
            texts.append(text)
        except Exception as e:
            print(f"Error applying chat template to conversation: {processed_conversation}. Error: {e}")
            continue # Skip conversation if chat template application fails


        # print(text) # Uncomment to debug the formatted text
    return texts # Return the list of strings directly

# Define TrainingArguments instead of SFTConfig
training_args = TrainingArguments(
    per_device_train_batch_size = 8,
    gradient_accumulation_steps = 8,
    warmup_steps = 5,
    num_train_epochs = 1, # Set this for 1 full training run.
   # max_steps = 30,
    learning_rate = 2e-4,
    logging_steps = 1,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "linear",
    seed = 3407,
    output_dir = "outputs",
    report_to = "wandb", # Use this for WandB etc

    # Add checkpointing and saving to Hugging Face
    save_strategy="steps",
    save_steps=250, # Save a checkpoint every 250 steps
    push_to_hub=True,
    hub_model_id="your_name/your_model", # Replace with your HF username and model name
    hub_token=userdata.get('HF_TOKEN'), # Use the HF token from secrets

    # Add save_total_limit and load_best_model_at_end
    save_total_limit=3, # Keep a maximum of 3 checkpoints
    load_best_model_at_end=True, # Load the best model based on evaluation at the end

    # Add evaluation strategy for load_best_model_at_end
    evaluation_strategy="steps",
    eval_steps=250, # Evaluate every 250 steps
)


trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    # dataset_text_field is not needed when formatting_func returns list[str]
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    packing = False, # Packing can speed up training on short sequences; disabled here.
    args = training_args, # Pass the TrainingArguments object
    formatting_func = formatting_prompts_func, # Add the formatting function
)

Any guidance on resolving this compatibility issue so that load_best_model_at_end can be enabled would be appreciated.

I think evaluation_strategy has been deprecated and renamed to eval_strategy in recent transformers versions. Please try that.
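If that's the case, switching to the new spelling should be enough (a minimal sketch, with your other arguments left as they are):

training_args = TrainingArguments(
    # ... all other arguments as before ...
    eval_strategy = "steps",          # renamed from evaluation_strategy in recent transformers
    eval_steps = 250,
    load_best_model_at_end = True,
)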

Yeah, I tried it too, but nothing changed. Thanks either way!

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    # eval_dataset = eval_dataset is missing?
    ...
)
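For what it's worth, here's roughly how I'd expect the evaluation pieces to fit together on recent trl/transformers releases. This is an untested sketch: eval_dataset is a held-out split you'd have to create yourself, SFTConfig is trl's TrainingArguments subclass for SFT-specific options, and processing_class is the newer name trl uses for the old tokenizer argument.

from trl import SFTTrainer, SFTConfig

training_args = SFTConfig(
    output_dir = "outputs",
    per_device_train_batch_size = 8,
    gradient_accumulation_steps = 8,
    num_train_epochs = 1,
    learning_rate = 2e-4,
    logging_steps = 1,
    save_strategy = "steps",
    save_steps = 250,
    save_total_limit = 3,
    eval_strategy = "steps",          # renamed from evaluation_strategy
    eval_steps = 250,
    load_best_model_at_end = True,    # needs an eval_dataset and a matching eval_strategy
    seed = 3407,
)

trainer = SFTTrainer(
    model = model,
    processing_class = tokenizer,     # newer trl name for the old tokenizer argument
    train_dataset = dataset,
    eval_dataset = eval_dataset,      # assumed held-out split; without it, evaluation can't run
    formatting_func = formatting_prompts_func,
    args = training_args,
)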

Hmm, that’s about all I can think of. I think you’ll be able to figure it out by following the error messages.