Hi @Jeremias
In recent versions of transformers, each model lives under its own directory, so BART is now in `models.bart`.
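A quick sketch of what that means for imports (the exact symbols shown are illustrative; the public top-level imports are unchanged):

```python
# Old internal path (pre-reorganization):
#   from transformers.modeling_bart import shift_tokens_right
# New per-model path:
from transformers.models.bart.modeling_bart import shift_tokens_right

# Top-level imports still work as before:
from transformers import BartForConditionalGeneration, BartTokenizer
```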
> huggingface’s `datasets` object only consists of lists
`datasets` can return any type (list, NumPy array, torch tensor, TF tensor). By default it returns lists; you need to explicitly set the format for it to return tensors. This is explained in the datasets intro colab.
Also, you won’t need to manually call `shift_tokens_right` to prepare `decoder_input_ids`: if you just pass `labels`, the model will prepare the `decoder_input_ids` itself by shifting the labels correctly.
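To make the shifting concrete, here is a pure-Python sketch of what the model does internally (the real `shift_tokens_right` operates on tensors; this just mirrors the logic):

```python
def shift_tokens_right(labels, pad_token_id, decoder_start_token_id):
    """Shift each row one position to the right, prepend the decoder
    start token, and replace ignored positions (-100) with padding."""
    shifted = []
    for row in labels:
        new_row = [decoder_start_token_id] + row[:-1]
        shifted.append([pad_token_id if t == -100 else t for t in new_row])
    return shifted

# Toy label ids (made up for illustration):
labels = [[31414, 232, 2, -100]]
print(shift_tokens_right(labels, pad_token_id=1, decoder_start_token_id=2))
# [[2, 31414, 232, 2]]
```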
We have BART training examples in examples/seq2seq here, which should help you fine-tune BART.
Hope this helps.