When `batched=True`, both `examples['final']` and `examples['raw']` will be lists of `batch_size` elements, so the mapped function has to handle the whole list at once. You may need something like:

```python
def tokenizer_func(examples):
    # examples['raw'] is a list of strings, one per example in the batch
    return tokenizer(
        [generate_prompt(raw_text) for raw_text in examples['raw']],
        truncation=True,
        padding=True,
        max_length=128,
        return_tensors="pt",
    )
```
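To make the batched behavior concrete, here is a minimal, self-contained sketch that simulates what `Dataset.map(..., batched=True)` passes to the function. The `generate_prompt` and `fake_tokenizer` definitions below are hypothetical stand-ins (the real `tokenizer` would come from `transformers`); only the shape of the batch dict mirrors the real API:

```python
def generate_prompt(raw_text):
    # Hypothetical stand-in for the real prompt template
    return f"Instruction: {raw_text}"

def fake_tokenizer(texts):
    # Stand-in for a real tokenizer: maps each word to its length
    return {"input_ids": [[len(w) for w in t.split()] for t in texts]}

def tokenizer_func(examples):
    # With batched=True, examples['raw'] is a list of batch_size strings
    return fake_tokenizer([generate_prompt(r) for r in examples["raw"]])

# A batch as map() would deliver it: one dict, column name -> list of values
batch = {"final": ["out a", "out b"], "raw": ["hello world", "hi"]}
out = tokenizer_func(batch)
print(out["input_ids"])  # one token-id list per example in the batch
```

The key point is that the function receives one dict per batch (columns mapped to lists), not one dict per example, so any per-example logic has to go inside a loop or comprehension.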