I tried to use ProphetNet with Seq2SeqTrainer, but it failed.
The error message told me the cause: the collator I implemented calls prepare_seq2seq_batch() in _encode(), but prepare_seq2seq_batch() is not implemented for ProphetNetTokenizer.
Is there any reason ProphetNet cannot have prepare_seq2seq_batch() in its tokenizer?
My understanding may be insufficient, but it seems the tokenizer implements its own function for adding special tokens in a model-specific way. Is that the cause?
If it were implemented the same way as other Seq2SeqLM tokenizers, would ProphetNet fail to reach its original performance?
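For reference, here is a rough sketch of what I imagine such a method could look like, modeled on the prepare_seq2seq_batch() pattern that other seq2seq tokenizers follow. The subclass is hypothetical and only my guess at the intended behavior, not code from the library:

```python
from transformers import BatchEncoding, ProphetNetTokenizer


class ProphetNetTokenizerWithBatch(ProphetNetTokenizer):
    """Hypothetical subclass sketching prepare_seq2seq_batch() for ProphetNet."""

    def prepare_seq2seq_batch(
        self,
        src_texts,
        tgt_texts=None,
        max_length=None,
        max_target_length=None,
        padding="longest",
        return_tensors="pt",
        truncation=True,
        **kwargs,
    ) -> BatchEncoding:
        if max_length is None:
            max_length = self.model_max_length
        # Encode the source side; the tokenizer adds its own special tokens.
        model_inputs = self(
            src_texts,
            add_special_tokens=True,
            return_tensors=return_tensors,
            max_length=max_length,
            padding=padding,
            truncation=truncation,
            **kwargs,
        )
        if tgt_texts is None:
            return model_inputs
        # Encode the target side the same way and expose it as "labels".
        if max_target_length is None:
            max_target_length = max_length
        labels = self(
            tgt_texts,
            add_special_tokens=True,
            return_tensors=return_tensors,
            max_length=max_target_length,
            padding=padding,
            truncation=truncation,
            **kwargs,
        )["input_ids"]
        model_inputs["labels"] = labels
        return model_inputs
```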
I’m now trying to implement the method, and it seems my implementation works on CPU.
(I had to set --max_source_length to a small value to avoid an index-out-of-range error.)
On GPU, the code raised errors (a CUDA error, not reproducible…), though it sometimes ran without any error. I should look into the cause.
I’m sorry but I haven’t tried it on TPU.
I think I now understand what caused the problem: ProphetNetConfig has max_position_embeddings=512 as a parameter,
but the default --max_source_length of finetune_trainer.py is 1024.
Naturally, an error occurs unless the value is kept below 512.
(More precisely, I also have to account for the special tokens that get added to the input, right?)
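To double-check, something like this quick sketch should show the limit (the checkpoint name is just the one I assume here, and the exact bound depends on how many special tokens the tokenizer appends):

```python
from transformers import ProphetNetConfig, ProphetNetTokenizer

name = "microsoft/prophetnet-large-uncased"  # assumed checkpoint for illustration
config = ProphetNetConfig.from_pretrained(name)
tokenizer = ProphetNetTokenizer.from_pretrained(name)

print(config.max_position_embeddings)             # 512
specials = tokenizer.num_special_tokens_to_add()  # special tokens appended to a single sequence
# --max_source_length has to stay within the position-embedding budget:
print(config.max_position_embeddings - specials)
```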
On GPU, I could run the script without CUDA errors using settings along the lines shown below.
(max_source_length=500 with the default per_device_train_batch_size=8 seemed to need more memory than the GPU I could use on Colab.)
Thank you for your kind comment! I’d love to open a PR to contribute my changes to the library.
I now understand that the special tokens are counted within max_source_length.
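A quick check like this made it concrete for me (a sketch; the repeated string is just filler to force truncation):

```python
from transformers import ProphetNetTokenizer

tok = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased")
enc = tok("a long input " * 300, max_length=512, truncation=True)
# The returned length already includes the special token(s) the tokenizer appends,
# so the budget for raw text is max_length minus those tokens.
print(len(enc["input_ids"]))  # 512
```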
After adding some fixes (such as the max_source_length setting and removing my personal comments), I’d like to open the PR this week.
Hi @patrickvonplaten,
Thank you for your comment! I’m glad to be of some help.
I’ve checked the review comment, and as I responded there, I’m going to work on the fine-tuning experiment.