I am trying to fine-tune the facebook/mbart-large-50 model for the en-ro translation task.
raw_datasets = load_dataset("wmt16", "ro-en")
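For context, this is roughly how I load the tokenizer and model (using the en_XX/ro_RO language codes, which I believe are what mBART-50 expects for en→ro; treat the exact setup as an assumption):

from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

# mBART-50 needs the source/target language codes set on the tokenizer
tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50", src_lang="en_XX", tgt_lang="ro_RO"
)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")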
Referring to the notebook, I have modified the code as follows.
Please let me know the following.
tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)
Is the above step necessary? I am unable to run it, as I get the following error.
TypeError: Provided function which is applied to all elements of table returns a dict of types [<class 'torch.Tensor'>, <class 'torch.Tensor'>]. When using batched=True, make sure provided function returns a dict of types like (<class 'list'>, <class 'numpy.ndarray'>).
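If it helps, my reading of the error is that the function passed to map(batched=True) must return plain Python lists (or NumPy arrays), not torch tensors, so return_tensors="pt" should not be used inside it. A sketch of a preprocess_function along those lines (the max_length values are placeholders; the column names follow the wmt16 ro-en format, but treat the whole thing as an assumption):

max_input_length = 128   # assumption: pick lengths that fit your memory budget
max_target_length = 128

def preprocess_function(examples):
    inputs = [ex["en"] for ex in examples["translation"]]
    targets = [ex["ro"] for ex in examples["translation"]]
    # Note: no return_tensors="pt" here; map(batched=True) expects lists/NumPy arrays
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=max_target_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)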
If the above step is bypassed, the following error is raised during training, i.e. in trainer.train().
# We have to pad the labels before calling tokenizer.pad as this method won't pad them and needs them of the
# same length to return tensors.
AttributeError: 'tokenizers.Encoding' object has no attribute 'keys'
Please let me know the correct way of passing the values.
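For completeness, this is roughly how I understand the tokenized dataset and collator should be wired into Seq2SeqTrainer (output_dir, batch size and epoch count below are placeholder values):

from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# The collator pads input_ids and labels; it expects dict-like features,
# which is why the dataset has to go through the map/preprocess step first.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

training_args = Seq2SeqTrainingArguments(
    output_dir="mbart-large-50-en-ro",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()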
Hi,
I am getting the same error: AttributeError: 'tokenizers.Encoding' object has no attribute 'keys'. Did you solve it? Please help me with this…