T5 Finetuning Tips

valhalla · August 12, 2020, 2:10pm

This trick of loading the model outside of _map_fn is awesome! It should save some memory. In pytorch-xla the model and the dataset are loaded in every process (8 in the case of 8 TPU cores), so it ends up taking a lot of memory. Lazily loading the dataset should also reduce RAM usage.
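
To make the lazy-loading idea concrete, here is a minimal sketch, assuming the data lives in a tab-separated text file with one source/target pair per line. The class name `LazyLineDataset`, the file layout, and the `demo` helper are all hypothetical and not from this thread. The class only implements `__len__` and `__getitem__`, which is the map-style protocol a PyTorch `DataLoader` works with, so each process keeps a list of byte offsets in RAM instead of the full text.

```python
import os
import tempfile


class LazyLineDataset:
    """Map-style dataset that keeps only line offsets in memory.

    Hypothetical helper for illustration; assumes one "source<TAB>target"
    pair per line.
    """

    def __init__(self, path):
        self.path = path
        self.offsets = []
        # One pass to record where each line starts; the text is not kept.
        with open(path, "rb") as f:
            pos = 0
            for line in f:
                self.offsets.append(pos)
                pos += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Read a single example from disk on demand.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[idx])
            line = f.readline().decode("utf-8").rstrip("\r\n")
        source, target = line.split("\t")
        return {"source": source, "target": target}


def demo():
    # Write a tiny example file, then read one row back lazily.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".tsv", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("summarize: first document\tfirst summary\n")
        tmp.write("summarize: second document\tsecond summary\n")
        path = tmp.name
    try:
        ds = LazyLineDataset(path)
        return len(ds), ds[1]
    finally:
        os.remove(path)
```

Tokenization could then happen per example or in a collate function, so the tokenized tensors are never all held in memory at once either.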

On a v3-8 TPU, I was able to use a batch size of 8 per device with max_source_length 512 and max_target_length 64.
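
For reference, the numbers above work out as follows. This is just the arithmetic, assuming 8 cores and no gradient accumulation (the thread does not say whether accumulation was used); the variable names are illustrative.

```python
# Settings reported in the post.
per_device_batch_size = 8
num_cores = 8  # a v3-8 has 8 TPU cores
max_source_length = 512
max_target_length = 64

# Assumption: no gradient accumulation, so one optimizer step sees
# one batch from every core.
global_batch_size = per_device_batch_size * num_cores

# Upper bounds on tokens per step, since sequences are padded or
# truncated to the maximum lengths.
max_source_tokens_per_step = global_batch_size * max_source_length
max_target_tokens_per_step = global_batch_size * max_target_length

print(global_batch_size)           # 64
print(max_source_tokens_per_step)  # 32768
print(max_target_tokens_per_step)  # 4096
```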
