The problem happens in `trl/trl/trainer/utils.py` at line **456**:
```
else:
    ...
    # adapted from https://stackoverflow.com/questions/73256206
    if "prompt" in k:
        to_pad = [torch.LongTensor(ex[k][::-1]) for ex in batch]
    else:
        to_pad = [torch.LongTensor(ex[k]) for ex in batch]  # line 456
    if k.endswith("_input_ids"):
        padding_value = self.tokenizer.pad_token_id
```
I am using the **[Qwen/Qwen-1_8B-Chat](https://huggingface.co/Qwen/Qwen-1_8B-Chat)** model and the **official finetune.py** to run DPO training with `DPOTrainer`.
My training dataset looks like this:
> {"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
If I run the DPO code directly, I hit this error:
> File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 678, in _next_data
> data = self._dataset_fetcher.fetch(index) # may raise StopIteration
> File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
> return self.collate_fn(data)
> File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/trl/trainer/utils.py", line 490, in __call__
> return self.collate(tokenized_batch)
> File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/trl/trainer/utils.py", line 449, in collate
> to_pad = [torch.LongTensor(ex[k]) for ex in batch]
> File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/trl/trainer/utils.py", line 449, in <listcomp>
> to_pad = [torch.LongTensor(ex[k]) for ex in batch]
> TypeError: an integer is required (got type NoneType)
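The failure itself is easy to reproduce outside the collator: any `None` in the id list breaks the legacy `torch.LongTensor` constructor (the exact message depends on the torch version):

```python
import torch

# A None among the ids is enough to trigger the TypeError from the traceback above.
torch.LongTensor([16, 10, 17, 28, 19, None])
```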
If I debug the code at line 483:
```
for feature in features:
    prompt = feature["prompt"]
    chosen = feature["chosen"]
    rejected = feature["rejected"]

    batch_element = self.tokenize_batch_element(prompt, chosen, rejected)  # line 483
    print(batch_element)
    tokenized_batch.append(batch_element)
```
If I print `batch_element`, there is an extra `None` at the end of the token-id and label lists:
> batch_element:{'chosen_input_ids': [16, 10, 17, 28, 19, None], 'chosen_attention_mask': [1, 1, 1, 1, 1, 1], 'chosen_labels': [-100, -100, -100, -100, 19, None], 'rejected_input_ids': [16, 10, 17, 28, 18, None], 'rejected_attention_mask': [1, 1, 1, 1, 1, 1], 'rejected_labels': [-100, -100, -100, -100, 18, None], 'prompt_input_ids': [16, 10, 17, 28], 'prompt_attention_mask': [1, 1, 1, 1], 'prompt': '1+2=', 'chosen': '1+2=4', 'rejected': '1+2=3', 'chosen_response_only': '4', 'rejected_response_only': '3'}
My `chosen_input_ids` for `1+2=4` should have length **5**, but after `self.tokenize_batch_element` the result `'chosen_input_ids': [16, 10, 17, 28, 19, None]` has length **6**, and that extra **None** is what causes the _TypeError: an integer is required (got type NoneType)_.
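A quick check (just a sketch; I am assuming the trailing `None` is an unset special-token id appended by the collator, not something the tokenizer itself produces):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)

# The chosen string tokenizes to the expected 5 ids on its own...
print(tokenizer("1+2=4")["input_ids"])

# ...so the 6th element must be appended afterwards; if this prints None, that would
# be the value showing up at the end of chosen_input_ids.
print(tokenizer.eos_token_id)
```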
So I changed line 456 from `to_pad = [torch.LongTensor(ex[k]) for ex in batch]` to `to_pad = [torch.LongTensor(ex[k][:-1]) for ex in batch]`, and it worked:
> {'loss': 0.2599, 'learning_rate': 0.0003, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -0.21053116023540497, 'logps/chosen': -4.585531234741211, 'logits/rejected': -2.686852216720581, 'logits/chosen': -2.6731910705566406, 'epoch': 1.0}
> {'loss': 0.2599, 'learning_rate': 0.00015, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -0.21053116023540497, 'logps/chosen': -4.585531234741211, 'logits/rejected': -2.686852216720581, 'logits/chosen': -2.6731910705566406, 'epoch': 2.0}
> {'loss': 0.1227, 'learning_rate': 0.0, 'rewards/chosen': 0.27905863523483276, 'rewards/rejected': -0.17719139158725739, 'rewards/accuracies': 1.0, 'rewards/margins': 0.45625001192092896, 'logps/rejected': -1.9824450016021729, 'logps/chosen': -1.7949450016021729, 'logits/rejected': -2.546565055847168, 'logits/chosen': -2.5510566234588623, 'epoch': 2.67}
> {'train_runtime': 2.826, 'train_samples_per_second': 3.185, 'train_steps_per_second': 1.062, 'train_loss': 0.2141884664694468, 'epoch': 2.67}
> 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.06it/s]
> ***** train metrics *****
> epoch = 2.67
> train_loss = 0.2142
> train_runtime = 0:00:02.82
> train_samples = 3
> train_samples_per_second = 3.185
> train_steps_per_second = 1.062
> Training metrics: {'train_runtime': 2.826, 'train_samples_per_second': 3.185, 'train_steps_per_second': 1.062, 'train_loss': 0.2141884664694468, 'epoch': 2.67, 'train_samples': 3}
I do not know whether this fix is right, or whether I am just not using it the right way.
I think the problem may happen because Qwen has its own tokenizer.
My prompt dict :
```
return {
    "prompt": ["Question: " + question + "\n\nAnswer: " for question in examples["question"]],
    "chosen": examples["response_chosen"],
    "rejected": examples["response_rejected"],
}
```
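For context, this dict is returned from the mapping function I pass to `datasets.map`; a minimal self-contained sketch of the wiring (the function and variable names here are illustrative, not my exact code):

```python
from datasets import Dataset

def build_prompts(examples):
    # Batched map: turn the question/response columns into the
    # prompt/chosen/rejected columns that DPOTrainer expects.
    return {
        "prompt": ["Question: " + q + "\n\nAnswer: " for q in examples["question"]],
        "chosen": examples["response_chosen"],
        "rejected": examples["response_rejected"],
    }

# Tiny in-memory example with the same schema as the JSONL above.
raw = Dataset.from_list([{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}])
train_dataset = raw.map(build_prompts, batched=True)
print(train_dataset[0]["prompt"])  # "Question: 1+2=\n\nAnswer: "
```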
DPOTrainer :
```
trainer = DPOTrainer(
    model,
    ref_model=deepcopy(model),
    args=training_args,
    beta=0.1,
    tokenizer=tokenizer,
    peft_config=lora_config,
    max_prompt_length=training_args.model_max_length,
    max_length=training_args.model_max_length,
    train_dataset=data_module['train_dataset'],
    eval_dataset=data_module['eval_dataset'],
)
```
tokenizer :
```
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    model_max_length=training_args.model_max_length,
    padding_side="right",
    use_fast=False,
    trust_remote_code=True,
)
tokenizer.pad_token_id = tokenizer.eod_id
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = 0  # set as the <unk> token
```
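One thing I might try instead of the `[:-1]` slice (this is only an assumption on my side, not a verified fix): give the tokenizer an explicit EOS id, since the collator appears to append `tokenizer.eos_token_id` to the responses and for the Qwen tokenizer that attribute is `None`:

```python
# Assumption: the trailing None is eos_token_id being appended by the collator.
# Reuse Qwen's end-of-document token as EOS so a real id is appended instead.
if tokenizer.eos_token_id is None:
    tokenizer.eos_token_id = tokenizer.eod_id
```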