I am working on the Cosmos QA dataset and need to add a new column with the following format: Value(dtype='string', id=None)
The current dataset has the following features:
Dataset({
    features: ['id', 'context', 'question', 'answer0', 'answer1', 'answer2', 'answer3', 'label'],
    num_rows: 25262
})
When I run my tokenize function on a sample of my dataset, print(tokenize(clean_dataset["train"][:2])), I get the following error: ValueError: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
However, when I run complete_tok = tokenizer(list(x_complete), truncation=True, padding=True), where x_complete is a NumPy array, the tokenizer runs fine and creates input_ids and attention_mask.
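The error message names three accepted input shapes, so one way to debug this is to check which shape is actually being passed. Below is a minimal sketch of such a check. The helper name describe_tokenizer_input is hypothetical, and it only approximates the rule stated in the error message; it is not the library's own validation code. It also assumes the tokenize function passes the sliced batch straight through, which the post does not show.

```python
from typing import Any


def describe_tokenizer_input(text: Any) -> str:
    """Classify `text` against the three shapes named in the error message.

    Hypothetical debugging helper, not part of transformers.
    """
    if isinstance(text, str):
        return "str (single example)"
    if isinstance(text, (list, tuple)):
        if all(isinstance(item, str) for item in text):
            return "List[str] (batch or single pretokenized example)"
        if all(
            isinstance(item, (list, tuple))
            and all(isinstance(tok, str) for tok in item)
            for item in text
        ):
            return "List[List[str]] (batch of pretokenized examples)"
    return f"unsupported: {type(text).__name__}"


# Slicing a split, e.g. clean_dataset["train"][:2], returns a dict that maps
# column names to lists, so the slice itself matches none of the shapes.
batch = {"context": ["c0", "c1"], "question": ["q0", "q1"]}
print(describe_tokenizer_input(batch))             # unsupported: dict
print(describe_tokenizer_input(batch["context"]))  # List[str] (...)
```

If the check reports an unsupported type, passing a single column such as batch["context"], or a plain list as in list(x_complete), gives the tokenizer one of the shapes it accepts.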
Hi guys, I am new here, just started using. For those like me who are facing the same issue: I think the error occurs because the dataset is of type 'dict', and that's why it gives AttributeError: 'Dataset' object has no attribute 'add_column'
My dataset structure is:
Dataset({
    train: ['index', 'text', 'file'],
    num_rows: 5000
})
I had trouble with the add_column method; maybe it has been deprecated since this post?
However, it is possible to create a Dataset directly from a Python dictionary using the Dataset.from_dict method. With it, you can add a column to a Dataset by extracting the existing Dataset columns into a Python dictionary, updating the dictionary with the desired column, and then re-creating the Dataset object with the additional column(s).
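The dictionary round trip described above can be sketched as follows. The helper names add_column_to_dict and rebuild_with_column are hypothetical, and the length check is an added safeguard rather than something the original post mentions. The sketch assumes you are working with a single split (for example clean_dataset["train"]), not the whole DatasetDict.

```python
from typing import Any, Dict, Iterable, List


def add_column_to_dict(
    columns: Dict[str, List[Any]], name: str, values: Iterable[Any]
) -> Dict[str, List[Any]]:
    """Return a copy of a column dictionary with one extra column."""
    values = list(values)
    lengths = {len(col) for col in columns.values()}
    # Every column must have one entry per row, including the new one.
    if lengths and lengths != {len(values)}:
        raise ValueError(
            f"new column has {len(values)} rows, existing columns have {sorted(lengths)}"
        )
    updated = dict(columns)  # shallow copy, the input dict is left untouched
    updated[name] = values
    return updated


def rebuild_with_column(dataset, name: str, values: Iterable[Any]):
    """Re-create a Dataset from its columns plus one new column."""
    # Imported lazily so the helper above also works without the library.
    from datasets import Dataset

    columns = dataset.to_dict()  # column name -> list of values
    return Dataset.from_dict(add_column_to_dict(columns, name, values))
```

For example, rebuild_with_column(clean_dataset["train"], "combined", new_values) would return a new Dataset with the extra string column, where "combined" and new_values are placeholder names. If your installed version of datasets does provide Dataset.add_column, calling it on the split is the shorter route.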