Why does `prepare_seq2seq_batch` not exist for ProphetNet?


I tried to use ProphetNet with Seq2SeqTrainer, but it failed.

The error message tells me this is because the collator I implemented calls prepare_seq2seq_batch() in _encode(), but prepare_seq2seq_batch() is not implemented for the ProphetNet tokenizer.
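For readers who have not used this method, here is a minimal sketch of roughly what such a batch-preparation step has to do: tokenize the source texts, tokenize the target texts, and attach the target ids as labels. It assumes only that the tokenizer is callable with `max_length`, `truncation`, and `padding` keyword arguments, as Hugging Face tokenizers are. `ToyTokenizer` and `encode_pair` are hypothetical names made up for illustration; they are not library code and not the collator discussed in this thread.

```python
from typing import Dict, List


class ToyTokenizer:
    """Hypothetical whitespace tokenizer standing in for a real one."""

    pad_token_id = 0
    sep_token_id = 1  # stand-in special token appended to every sequence

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def _ids(self, text: str) -> List[int]:
        # Ids 0 and 1 are reserved for padding and the separator.
        return [self.vocab.setdefault(w, len(self.vocab) + 2) for w in text.split()]

    def __call__(self, texts, max_length, truncation=True, padding=True):
        rows = []
        for text in texts:
            ids = self._ids(text)
            if truncation:
                ids = ids[: max_length - 1]  # leave room for the separator
            rows.append(ids + [self.sep_token_id])
        width = max(len(r) for r in rows)
        input_ids, attention_mask = [], []
        for r in rows:
            pad = (width - len(r)) if padding else 0
            input_ids.append(r + [self.pad_token_id] * pad)
            attention_mask.append([1] * len(r) + [0] * pad)
        return {"input_ids": input_ids, "attention_mask": attention_mask}


def encode_pair(tokenizer, src_texts, tgt_texts, max_source_length, max_target_length):
    """Tokenize sources and targets separately and return one batch dict."""
    model_inputs = tokenizer(
        src_texts, max_length=max_source_length, truncation=True, padding=True
    )
    targets = tokenizer(
        tgt_texts, max_length=max_target_length, truncation=True, padding=True
    )
    batch = dict(model_inputs)
    batch["labels"] = targets["input_ids"]
    return batch


batch = encode_pair(ToyTokenizer(), ["a b c", "d"], ["x", "y z"], 8, 8)
```

With a real tokenizer, the same two calls would apply whatever special-token scheme that tokenizer defines, which is why a generic fallback of this shape is at least plausible for ProphetNet.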

Is there any reason ProphetNet cannot have prepare_seq2seq_batch() in its tokenizer?

My understanding may be insufficient, but it seems the tokenizer implements a function that assigns special tokens in its own way. Is that the cause?

If the method were implemented as it is for other Seq2SeqLM models, would ProphetNet fail to reach its original performance?

Thank you in advance.



I don’t know about prophetnet special tokens, but it would be useful to implement that method!

cc @patrickvonplaten @valhalla



Thank you for responding to my question!

I agree that it would be useful to have the method implemented.
I’m not a good engineer, but I hope I can be of some help with the implementation.


Yes, I think we should definitely implement this method 🙂 Will try to find time in the next couple of weeks.


@sshleifer @patrickvonplaten @valhalla

I’m now trying to implement the method, and my implementation seems to work on CPU.
(I had to set --max_source_length to a small value to avoid an index-out-of-range error.)
On GPU, the code sometimes raised a CUDA error that I could not reproduce reliably, and sometimes ran without error. I still need to find the cause.
I’m sorry, but I haven’t tried it on TPU.

I’m now working on branch forest1988-prophetnet-prepare-seq2seq-batch of .

I’ll continue to check if it works as I intended on GPU and TPU.

I also made an ipynb file for debugging; it may be useful for checking how my modified implementation works on Colaboratory.

Thank you.



I can now reproduce the GPU error I saw before.

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

It occurs when I remove --max_source_length 20.

Sorry to bother you again.

I think I now understand what caused the problem.
ProphetNetConfig has max_position_embeddings=512 as a parameter,
but the default --max_source_length of the script is 1024.
So it is natural that an error occurs unless the value is at least “less than 512”.
(More precisely, I probably also have to account for the special tokens added to the input, right?)
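To make the mismatch concrete, here is a small sketch of why a source length above the position limit fails. It uses a plain Python list as a toy stand-in for a learned position-embedding table with 512 rows (the max_position_embeddings value quoted above); `clamp_source_length` and `lookup_positions` are hypothetical helpers for illustration, not part of any library.

```python
MAX_POSITIONS = 512  # max_position_embeddings value quoted in the post above

# Toy stand-in for a position-embedding table: one row per position.
position_table = [[0.0] * 4 for _ in range(MAX_POSITIONS)]


def clamp_source_length(requested: int, max_position_embeddings: int) -> int:
    """Hypothetical helper: cap the requested length at the model's limit."""
    if requested <= 0:
        raise ValueError("requested length must be positive")
    return min(requested, max_position_embeddings)


def lookup_positions(seq_len: int):
    """Mimic an embedding lookup: fetch one row for every token position."""
    return [position_table[i] for i in range(seq_len)]


safe_len = clamp_source_length(1024, MAX_POSITIONS)
rows = lookup_positions(safe_len)  # works: positions 0..511 all have a row

try:
    lookup_positions(1024)  # position 512 has no row in the table
    overflowed = False
except IndexError:
    overflowed = True
```

On CPU this kind of overflow tends to surface as a clear index error, whereas on GPU an out-of-range lookup can surface as a less specific CUDA failure, which would be consistent with the behaviour reported earlier in the thread.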

On GPU, I could run the script as below without CUDA error.

# For GPU

!python \
    --learning_rate=3e-5 \
    --do_train --do_eval --evaluate_during_training \
    --max_source_length 500 \
    --per_device_train_batch_size 2 \
    --predict_with_generate \
    --n_train 100 \
    --n_val 10 \
    --model_name_or_path microsoft/prophetnet-large-uncased \
    --data_dir $XSUM_DIR \
    --output_dir tmp

(With max_source_length=500 and per_device_train_batch_size=8, the default batch size, the run seemed to need more memory than the GPU I could use on Colab provides.)

Hey @yusukemori. Awesome that the function seems to work for you - do you want to make a PR to push your changes to the library?


I think you should set max_source_length to 512. The special tokens will be included in this max length.
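As a rough illustration of what "included in this max length" means, the sketch below truncates the content tokens so that content plus special tokens together fit the limit. It assumes a single trailing separator token; the id 102 and the function name `truncate_with_special_tokens` are placeholders for illustration, not values or functions taken from the library.

```python
from typing import List


def truncate_with_special_tokens(
    token_ids: List[int], max_length: int, special_suffix: List[int]
) -> List[int]:
    """Hypothetical helper: keep content + special tokens within max_length."""
    budget = max_length - len(special_suffix)
    if budget < 0:
        raise ValueError("max_length is smaller than the special tokens alone")
    return token_ids[:budget] + special_suffix


SEP_ID = 102  # placeholder id for a trailing separator token

long_input = list(range(1000, 1600))  # 600 content tokens
encoded = truncate_with_special_tokens(long_input, 512, [SEP_ID])

short_input = [5, 6]
encoded_short = truncate_with_special_tokens(short_input, 512, [SEP_ID])
```

Under this assumption a limit of 512 leaves 511 slots for content tokens, so the full sequence never exceeds the 512 available positions.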


Hi @patrickvonplaten,

Thank you for your kind comment! I’d love to make a PR to push my changes to the library.

I now understand that the special tokens are included in max_source_length.
After making some fixes (such as the max_source_length setting and removing comments I wrote for myself), I’d like to open the PR this week.


Thanks a bunch for adding this functionality @yusukemori! for reference


Hi @patrickvonplaten,
Thank you for your comment! I’m glad to be of some help.
I’ve checked the review comment, and as I responded there, I’m going to try to work on the fine-tuning experiment.

Thank you again!