RewardTrainer Problem

AttributeError                            Traceback (most recent call last)
in <cell line: 0>()
      1 # Initialize RewardTrainer
----> 2 trainer = RewardTrainer(
      3     model=model,
      4     args=training_args,
      5     tokenizer=tokenizer,

/usr/local/lib/python3.11/dist-packages/trl/trainer/reward_trainer.py in __init__(self, model, args, data_collator, train_dataset, eval_dataset, processing_class, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics, peft_config)
    167
    168         # Disable dropout in the model
--> 169         if args.disable_dropout:
    170             disable_dropout_in_model(model)
    171

AttributeError: 'TrainingArguments' object has no attribute 'disable_dropout'


The error happens because RewardTrainer reads disable_dropout from its config, and a plain TrainingArguments object has no such field. Instead of TrainingArguments, use RewardConfig from trl:

from trl import RewardConfig

training_args = RewardConfig(
    output_dir="./results",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    eval_strategy="steps",  # named evaluation_strategy in older transformers
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    logging_steps=100,
    learning_rate=5e-5,
    weight_decay=0.01,
    num_train_epochs=3,
    disable_dropout=True  # the field the trainer was looking for (defaults to True)
)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,  # recent TRL takes processing_class instead of tokenizer (see the signature in your traceback)
    train_dataset=train_dataset,
    eval_dataset=eval_dataset
)
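
Then kick off training as usual:

trainer.train()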

If you still want to use TrainingArguments, you can manually disable dropout in your model before passing it to RewardTrainer:

from trl.trainer.utils import disable_dropout_in_model

disable_dropout_in_model(model)  # Manually disable dropout
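
For reference, that helper is essentially just a loop that zeroes the dropout probability of every nn.Dropout module. A minimal sketch of the same idea (the function name here is ours, not TRL's):

import torch.nn as nn

def zero_out_dropout(model: nn.Module) -> None:
    # Same effect as trl.trainer.utils.disable_dropout_in_model:
    # set every Dropout module's probability to 0
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0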

Note, however, that this alone won't avoid the AttributeError above, since RewardTrainer still reads args.disable_dropout from the config it is given. Using RewardConfig is the recommended approach.


Thank you, but what happened here:


RewardConfig?

Actually, when I run the given code from Hugging Face, it throws an error. I just copy-pasted it.


Is it possible that you have an old version of trl?

pip install -U trl transformers peft accelerate huggingface_hub
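
After upgrading, you can confirm what's actually installed (both packages expose __version__):

import transformers
import trl

# RewardConfig only exists in recent TRL releases, so an old pin
# is a likely culprit for import or attribute errors
print("trl:", trl.__version__)
print("transformers:", transformers.__version__)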

Hello, use this snippet:

from trl import RewardConfig, RewardTrainer
training_args = RewardConfig(output_dir="Qwen2.5-0.5B-Reward", per_device_train_batch_size=2)
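
For a fuller picture, the quickstart in the TRL README pairs that config with a model, tokenizer, and preference dataset roughly like this (the model and dataset names are the README's examples; substitute your own):

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", num_labels=1  # one scalar reward per sequence
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model.config.pad_token_id = tokenizer.pad_token_id

# A preference dataset with "chosen"/"rejected" pairs
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = RewardConfig(output_dir="Qwen2.5-0.5B-Reward", per_device_train_batch_size=2)
trainer = RewardTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,
    train_dataset=dataset,
)
trainer.train()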

Reference: https://github.com/huggingface/trl
