Why does `prepare_seq2seq_batch` not exist for ProphetNet?

Hi,

I tried to use ProphetNet with Seq2SeqTrainer, but it failed.

The error message tells me why: the collator I implemented calls prepare_seq2seq_batch() in _encode(), but prepare_seq2seq_batch() is not implemented for ProphetNetTokenizer.

Is there any reason ProphetNet cannot have prepare_seq2seq_batch() in its tokenizer?

My understanding may be limited, but it seems that the tokenizer implements a function that assigns special tokens in its own unique way. Is that the cause?

If it were implemented like the tokenizers of the other Seq2SeqLM models, would ProphetNet's original performance no longer be achievable?
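
For context, here is a rough sketch of what I expected such a method to do, based on the prepare_seq2seq_batch() signature of the other seq2seq tokenizers (this is only my illustration, not the actual library code, and the argument defaults are assumptions):

from transformers import ProphetNetTokenizer

tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased")

def prepare_seq2seq_batch_sketch(src_texts, tgt_texts=None, max_length=512,
                                 max_target_length=None, padding="longest",
                                 truncation=True, return_tensors="pt"):
    # Encode the source texts as usual.
    model_inputs = tokenizer(src_texts, max_length=max_length, padding=padding,
                             truncation=truncation, return_tensors=return_tensors)
    if tgt_texts is None:
        return model_inputs
    # Encode the target texts and expose them under "labels",
    # which is the key the Seq2SeqTrainer collator expects.
    labels = tokenizer(tgt_texts, max_length=max_target_length or max_length,
                       padding=padding, truncation=truncation,
                       return_tensors=return_tensors)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs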

Thank you in advance.

yusukemori

I don’t know about ProphetNet’s special tokens, but it would be useful to implement that method!

cc @patrickvonplaten @valhalla

@sshleifer

Thank you for responding to my question!

I agree that it would be useful to have the method implemented.
I’m not a great engineer, but I hope I can be of some help with the implementation.

yusukemori

Yes, I think we should definitely implement this method :slight_smile: I will try to find time in the next couple of weeks.

@sshleifer @patrickvonplaten @valhalla

I’m now trying to implement the method, and my implementation seems to work on CPU.
(I had to set --max_source_length to a small value to avoid an index-out-of-range error.)
On GPU the code sometimes raised errors (a CUDA error, not reliably reproducible…), but it sometimes ran without any error, so I need to investigate the cause.
I’m sorry, but I haven’t tried it on TPU yet.

I’m working on the forest1988-prophetnet-prepare-seq2seq-batch branch of https://github.com/forest1988/transformers.git.

I’ll continue to check whether it works as intended on GPU and TPU.

I’ve also made an ipynb file for debugging; it may be useful for checking how my modified implementation works on Colaboratory:
https://github.com/forest1988/colaboratory/blob/main/prophetnet_seq2seqtrainer.ipynb

Thank you.

yusukemori

Now I can reproduce on GPU the same error I had seen before.

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

It occurs when I remove --max_source_length 20.

Excuse me for bothering you again.

I think I now understand what caused the problem.
ProphetNetConfig has max_position_embeddings=512,
but the default --max_source_length in finetune_trainer.py is 1024.
It was only natural that an error occurred unless the value was kept below 512.
(More precisely, I probably also have to account for the special tokens that get added to the input, right?)
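
As a quick check (illustrative snippet; it just reads the value from the pretrained config):

from transformers import ProphetNetConfig

config = ProphetNetConfig.from_pretrained("microsoft/prophetnet-large-uncased")
# 512 for this checkpoint, so --max_source_length must not exceed it.
print(config.max_position_embeddings)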

On GPU, I could run the script below without a CUDA error.

# For GPU

!python finetune_trainer.py \
    --learning_rate=3e-5 \
    --do_train --do_eval --evaluate_during_training \
    --max_source_length 500 \
    --per_device_train_batch_size 2 \
    --predict_with_generate \
    --n_train 100 \
    --n_val 10 \
    --model_name_or_path microsoft/prophetnet-large-uncased \
    --data_dir $XSUM_DIR \
    --output_dir tmp \
    --overwrite_output_dir

("max_source_length=500 & per_device_train_batch_size=8 (default batch size)” seemed to need too large memory size to run on the GPU I could use on Colab.)

Hey @yusukemori. Awesome that the function seems to work for you - do you want to make a PR to push your changes to the library?

I think you should set max_source_length to 512. The special tokens will be included in this max length.
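
For example (illustrative snippet; the tiny max_length is only to make the effect visible):

from transformers import ProphetNetTokenizer

tok = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased")
# Truncation to max_length counts the special tokens the tokenizer appends,
# so the usable text budget is slightly smaller than max_length itself.
enc = tok("a fairly long source sentence that will get truncated",
          max_length=8, truncation=True)
print(len(enc["input_ids"]))  # 8, including the appended special token(s)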

Hi @patrickvonplaten,

Thank you for your kind comment! I’d love to make a PR to push my changes to the library.

I now understand that the special tokens are included in max_source_length.
After adding some fixes (such as the max_source_length setting and removing comments I left for myself), I’d like to make the PR this week.

Thanks a bunch for adding this functionality @yusukemori! See https://github.com/huggingface/transformers/pull/8515#pullrequestreview-531309943 for reference.

Hi @patrickvonplaten,
Thank you for your comment! I’m glad to be of some help.
I’ve checked the review comment, and as I responded there, I’m going to work on the fine-tuning experiment.

Thank you again!