I am working on Arabic question generation using arabert_base and the mMARCO dataset. I am following the BERT2BERT for CNN/DailyMail notebook and the training notebook from the Arabic Empathetic Chatbot repo.
The problem is that all the metrics (ROUGE, BLEU, METEOR) come out as zero, and the generated output is [CLS] [CLS] [CLS] [CLS] [CLS] [CLS] [CLS] [CLS] [CLS], i.e. the CLS token repeated until the sequence reaches the maximum length. I am training on a small subset first to check that the model works before running the full training.
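For context, this is roughly the model setup I copied from the BERT2BERT notebook, adapted to AraBERT. The checkpoint name and the config values below are placeholders (my notebook may use slightly different ones):

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# assumption: stand-in for the arabert_base checkpoint I actually use
checkpoint = "aubmindlab/bert-base-arabert"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# tie two AraBERT checkpoints into an encoder-decoder model
model = EncoderDecoderModel.from_encoder_decoder_pretrained(checkpoint, checkpoint)

# special-token / generation settings, as in the BERT2BERT notebook
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.encoder.vocab_size
model.config.max_length = 64
model.config.no_repeat_ngram_size = 3
model.config.early_stopping = True
model.config.num_beams = 4
```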
I want to know whether the small training set (10,000 samples) is responsible for the problem, or the preprocessing method (rough sketch below), or something else.
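This is the general shape of the preprocessing I am mirroring from those notebooks; the column names and max lengths are placeholders for my mMARCO fields, not copied verbatim from my notebook:

```python
from transformers import AutoTokenizer

# assumption: same stand-in checkpoint as above
tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabert")

encoder_max_length = 128
decoder_max_length = 64

def process_batch(batch):
    # tokenize the source passages for the encoder
    inputs = tokenizer(batch["passage"], padding="max_length",
                       truncation=True, max_length=encoder_max_length)
    # tokenize the target questions for the decoder
    outputs = tokenizer(batch["question"], padding="max_length",
                        truncation=True, max_length=decoder_max_length)

    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask
    # pad tokens in the labels are replaced with -100 so the loss ignores them
    batch["labels"] = [
        [-100 if token == tokenizer.pad_token_id else token for token in labels]
        for labels in outputs.input_ids
    ]
    return batch

# applied with datasets.Dataset.map(process_batch, batched=True, remove_columns=...)
```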
Please check my notebook.