I’m kind of confused about why t5-v1_1 disables parameter sharing. What is this designed for?
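
For context, "parameter sharing" here most likely means tying the input embedding matrix to the output (classifier / LM head) projection. In Hugging Face `transformers` this is typically exposed as the `tie_word_embeddings` config field. Below is a minimal sketch of what untying costs in parameters. The helper `embedding_param_count` is hypothetical, and the vocabulary size and model width are assumed illustrative values, not read from any checkpoint.

```python
def embedding_param_count(vocab_size: int, d_model: int, tied: bool) -> int:
    """Parameters spent on the input embedding plus the output projection."""
    input_embedding = vocab_size * d_model
    # When tied, the output projection reuses the input embedding matrix,
    # so it contributes no parameters of its own.
    output_projection = 0 if tied else vocab_size * d_model
    return input_embedding + output_projection


# Assumed, illustrative sizes (roughly base-model scale).
VOCAB_SIZE = 32128
D_MODEL = 768

tied = embedding_param_count(VOCAB_SIZE, D_MODEL, tied=True)
untied = embedding_param_count(VOCAB_SIZE, D_MODEL, tied=False)

print(f"tied:   {tied:,} parameters")
print(f"untied: {untied:,} parameters")
print(f"extra parameters when untied: {untied - tied:,}")
```

So untying adds one extra `vocab_size x d_model` matrix that the output layer can learn independently of the input embeddings. Whether that is the full motivation behind the v1.1 change is the open question here.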