I am working on the Cosmos QA dataset and need to add a new column with the following feature type: Value(dtype='string', id=None)
The current dataset has the following features:
Dataset({
    features: ['id', 'context', 'question', 'answer0', 'answer1', 'answer2', 'answer3', 'label'],
    num_rows: 25262
})
When I run my tokenize function on a sample of my dataset with print(tokenize(clean_dataset["train"][:2])), I get the following error: ValueError: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
However, when I run complete_tok = tokenizer(list(x_complete), truncation=True, padding=True), where x_complete is a NumPy array, the tokenizer seems to run fine and creates input_ids and attention_mask.
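One way to narrow this down, assuming the error comes from a value that is not a string (for example None or a number) in one of the columns handed to the tokenizer, which I have not confirmed: slicing a dataset such as clean_dataset["train"][:2] returns a dict mapping column names to lists, so each list can be checked directly. The helper name `find_non_string_entries` and the toy `sample_batch` below are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def find_non_string_entries(
    batch: Dict[str, List[Any]], columns: List[str]
) -> List[Tuple[str, int, str]]:
    """Return (column, row index, type name) for every value that is not a str."""
    problems = []
    for column in columns:
        for row_index, value in enumerate(batch[column]):
            if not isinstance(value, str):
                problems.append((column, row_index, type(value).__name__))
    return problems


# Toy batch shaped like a dataset slice: column names mapped to lists of values.
sample_batch = {
    "context": ["It was raining all day.", None],
    "question": ["What happened?", "Why?"],
    "label": [1, 3],
}

# Only check the text columns that would be passed to the tokenizer.
problems = find_non_string_entries(sample_batch, ["context", "question"])
print(problems)  # [('context', 1, 'NoneType')]
```

An empty result would suggest the columns themselves are fine and that the problem is in what the tokenize function passes on, for instance the whole batch dict instead of a list of strings.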