Cache T5 encoder results within batch when training

My batches are a little unorthodox: they consist of the same input paired with different targets. Basically, I moved the prefixes T5 uses from the input to the target and trained the model that way. Like this:

input: “I am Alex, 33 years old”, target: “name: Alex”
input: “I am Alex, 33 years old”, target: “age: 33”

It would make training much faster if I could compute the encoder output just once per batch and reuse it within the batch: one forward pass for the encoder but multiple backward passes. Is there a library-supported way to do this, or do I need to go completely custom and write the training loop from scratch?
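
To illustrate what I have in mind, here is a minimal sketch (assuming the Hugging Face `transformers` API, where `T5ForConditionalGeneration.forward` accepts a precomputed `encoder_outputs`; the model name and the data are just placeholders):

```python
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model.train()

# One shared input, several targets (placeholder data)
input_text = "I am Alex, 33 years old"
targets = ["name: Alex", "age: 33"]

enc = tokenizer(input_text, return_tensors="pt")

# Single encoder forward pass for the whole batch...
encoder_outputs = model.get_encoder()(
    input_ids=enc.input_ids, attention_mask=enc.attention_mask
)

for target in targets:
    labels = tokenizer(target, return_tensors="pt").input_ids
    # ...reused for each target; only the decoder runs again
    outputs = model(
        encoder_outputs=encoder_outputs,
        attention_mask=enc.attention_mask,
        labels=labels,
    )
    # retain_graph=True keeps the shared encoder graph alive,
    # since each backward pass flows through it; gradients accumulate
    outputs.loss.backward(retain_graph=True)
```

The `retain_graph=True` matters here: the encoder's computation graph is shared across targets, so the first `backward()` would otherwise free it. An alternative might be to expand the encoder hidden states along the batch dimension and run all targets through the decoder in a single pass, trading memory for one backward call.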
