Hey there, I’m trying to create a DatasetDict with two datasets (train and dev) for fine-tuning a BART model.
I’ve created lists of source sentences, target sentences and IDs; all three are lists of strings.
```python
from datasets import Dataset, DatasetDict, Features, Sequence, Value

data = DatasetDict({
    "train": Dataset.from_dict({
        "id": train_idxs,
        "translation": {
            "source": train_inputs,
            "target": train_labels
        }
    }, features=Features({"id": Value(dtype='string'), "translation": {"source": Sequence, "target": Sequence}})),
    "dev": {
        "id": dev_idxs,
        "translation": {
            "source": dev_inputs,
            "target": dev_labels
        }
    }
})
```
That is the code I’m using to create the DatasetDict, but I get this error:
TypeError: string indices must be integers
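A likely cause, based on my reading of how `datasets` encodes columns when `features=` is passed (not checked against every version): the encoder walks each column as if it were a list with one entry per row. Here the `"translation"` column is a dict of two lists, and iterating a dict yields its keys, so each "row" becomes the string `"source"` or `"target"`, which then gets indexed by a field name. The plain-Python sketch below reproduces that failure mode and shows the row-wise shape (a list of dicts) a nested column expects. `encode_row` is a hypothetical stand-in, not a `datasets` function, and the sentences are made-up sample data.

```python
def encode_row(schema_keys, obj):
    """Hypothetical stand-in for nested encoding: pick each schema field out of obj."""
    return {key: obj[key] for key in schema_keys}


schema_keys = ["source", "target"]

# Column-wise layout from the question: one dict holding two parallel lists.
column_wise = {"source": ["Hello", "Bye"], "target": ["Bonjour", "Au revoir"]}

# Iterating a dict yields its keys, so each "row" is just a string.
rows_seen = list(column_wise)
print(rows_seen)  # ['source', 'target']

try:
    # Effectively evaluates "source"["source"], which raises TypeError.
    [encode_row(schema_keys, obj) for obj in column_wise]
    error_message = None
except TypeError as err:
    error_message = str(err)  # begins with "string indices must be integers"

# Row-wise layout: one dict per example, which is what a nested column expects.
row_wise = [
    {"source": s, "target": t}
    for s, t in zip(column_wise["source"], column_wise["target"])
]
encoded = [encode_row(schema_keys, obj) for obj in row_wise]
print(encoded[0])  # {'source': 'Hello', 'target': 'Bonjour'}
```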
I want the object to have the same structure as the “Books” DatasetDict used in this guide: Translation.
If anyone has suggestions, please let me know, and tell me if I should provide more information!
Thank you!