I ran some experiments and got some results. The issue is indeed caused by the model-side code.
First, I replaced the MLC task with a simple auto-encoding task, i.e., feeding the model the same sequence as both input and target. I also tied the word embeddings between the encoder and the decoder. Neither change solved the problem.
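For context, the auto-encoding setup with tied word embeddings looks roughly like the sketch below; the `roberta-base` checkpoint and the `EncoderDecoderModel` wrapper are stand-ins for illustration, not necessarily my exact seq2seq model:

```python
from transformers import EncoderDecoderModel, RobertaTokenizer

# Placeholder checkpoint and seq2seq wrapper (my actual model may differ).
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "roberta-base")

# Tie the word embeddings between encoder and decoder.
model.decoder.set_input_embeddings(model.encoder.get_input_embeddings())

# Config fields the encoder-decoder wrapper expects.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Auto-encoding: the target sequence is just the input sequence.
batch = tokenizer(["a simple auto-encoding example"], return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=labels)
print(outputs.loss)
```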
Next, I replaced the seq2seq model with a simple prefix-LM (RobertaForCausalLM), fed with the same auto-encoding data. As I suspected, the problem vanished and everything works well now.
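The replacement check is roughly the following minimal sketch; again, `roberta-base` is just a placeholder checkpoint:

```python
from transformers import RobertaConfig, RobertaForCausalLM, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
config.is_decoder = True  # required to use RobertaForCausalLM standalone
model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)

# Same auto-encoding data: labels are the inputs themselves.
batch = tokenizer(["a simple auto-encoding example"], return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=labels)
print(outputs.loss)
```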
I believe there is a bug somewhere in my code or in the transformers library.