I am a beginner and am currently fine-tuning a BART model (ForConditionalGeneration) for text summarization. I wanted to experiment with dropout, but I noticed that the model has different types of dropout in its configuration, such as attention_dropout.
I could not find any documentation on the different types or on which is best to use, and I hope this is the right forum for the question. I only found that the default value for all of them except the basic dropout is 0.0, and I was wondering whether there is a particular reason for this. Can anybody give me some guidance here?
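For context, here is a minimal sketch of how these settings can be inspected and overridden. It assumes the Hugging Face transformers library and its BartConfig class; the variable names and the value 0.1 are only illustrative, not a recommendation.

```python
from transformers import BartConfig

# Build a default BART configuration; this does not download any weights.
config = BartConfig()

# Collect every configuration entry whose name contains "dropout",
# so the available dropout types and their defaults can be compared.
dropout_settings = {
    name: value
    for name, value in config.to_dict().items()
    if "dropout" in name
}
for name, value in sorted(dropout_settings.items()):
    print(f"{name} = {value}")

# Illustrative override (0.1 is an arbitrary value chosen for this sketch).
custom_config = BartConfig(attention_dropout=0.1)
print("custom attention_dropout =", custom_config.attention_dropout)
```

When loading a pretrained checkpoint, the same keyword can, as far as I understand, be passed to from_pretrained, which forwards it to the configuration, so a value can be changed without editing the checkpoint's config file.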
Thank you in advance.