Tokenizer deprecation in ORPO

Dear all,
I was training an LLM using ORPO based on the guides by Maxime Labonne.

Unfortunately, I encountered the following error with the code below:
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.

Switching to processing_class in the way suggested by Gemini/ChatGPT (see the variant after my training code below) did not fix the problem. Does anyone have an idea how to resolve this?

Kind regards,
Ben

from trl import ORPOConfig, ORPOTrainer

orpo_args = ORPOConfig(
    learning_rate=8e-6,
    beta=0.1,
    lr_scheduler_type="linear",
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    #Ideally train 3-5 epochs
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=10,
    report_to="wandb",
    output_dir="./results/",
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,  # this argument triggers the deprecation message on newer transformers/trl versions
)
trainer.train()
trainer.save_model(new_model)
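
For reference, the processing_class variant suggested by Gemini/ChatGPT that I tried looked roughly like this (I'm assuming here that my installed trl version already accepts this keyword, which may be exactly where it goes wrong):

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
    processing_class=tokenizer,  # replaces the deprecated tokenizer argument
)
trainer.train()
trainer.save_model(new_model)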

It looks like an issue was opened on GitHub for this and has already been fixed. The fix dates from yesterday, so I'm not sure it has made it into a release yet.


Hi John,
thanks for highlighting this - apparently the fix hasn't been released yet. I'll keep trying over the next few days.

In a local environment, this change alone works as a workaround, but it would be best to wait until the fix lands in the latest version.

pip uninstall transformers
pip install transformers==4.45.2
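
After downgrading, you can quickly confirm which version is active before rerunning the ORPO script, e.g.:

python -c "import transformers; print(transformers.__version__)"  # should print 4.45.2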

Dear John,
thanks for helping a newbie – that literally fixed all my problems and stupid me didn’t think of just using the previous transformers version.


It’s not a solution, just a workaround, but, well, better to have it working than not!