---------------------------------------------------------------------------
UndefinedError Traceback (most recent call last)
<ipython-input-14-09e2a2282d7c> in <cell line: 3>()
1 tokenizer = AutoTokenizer.from_pretrained(base_model)
2
----> 3 dataset = dataset.map(
4 format_chat_template,
5 )
8 frames
/usr/local/lib/python3.10/dist-packages/jinja2/environment.py in handle_exception(self, source)
934 from .debug import rewrite_traceback_stack
935
--> 936 raise rewrite_traceback_stack(source=source)
937
938 def join_path(self, template: str, parent: str) -> str:
<template> in top-level template code()
UndefinedError: 'str object' has no attribute 'role'
when i did futher checking, and got dataset.features from ‘mlabonne/orpo-dpo-mix-40k’ dataset it showed as
{'source': Value(dtype='string', id=None),
'chosen': [{'content': Value(dtype='string', id=None),
'role': Value(dtype='string', id=None)}],
'rejected': [{'content': Value(dtype='string', id=None),
'role': Value(dtype='string', id=None)}],
'prompt': Value(dtype='string', id=None)}
but my dataset has features :
{'chosen': Value(dtype='string', id=None),
'rejected': Value(dtype='string', id=None),
'prompt': Value(dtype='string', id=None)}
there is no role, how to add it. i am using a csv for creating a dataset using dataset.
from datasets import load_dataset
dataset = load_dataset(‘csv’, data_files=‘my_file.csv’)
this is the code that gives error
def format_chat_template(row):
row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
return row
dataset = dataset.map(
format_chat_template,
num_proc= os.cpu_count(),
)
dataset = dataset.train_test_split(test_size=0.01)
this is for Fine-tune Llama 3 with ORPO (huggingface.co)