Hi!
So I’ve developed an incremental fine-tuning pipeline based on T5-large. It has been somewhat vexing in terms of OOM errors and the like, even on a V100-class GPU with 16GB of memory. The dataset is constantly changing, so I’m trying to establish ideal hyperparameters on each training run, for example by calculating the maximum sequence lengths dynamically:
"max_seq_length": len(tokenizer(df.loc[df.input_text.astype(str).map(len).idxmax(), 'input_text'])['input_ids']),
"max_source_length": len(tokenizer(df.loc[df.input_text.astype(str).map(len).idxmax(), 'input_text'])['input_ids']),
"max_target_length": len(tokenizer(df.loc[df.target_text.astype(str).map(len).idxmax(), 'target_text'])['input_ids'])
Is this a reasonable approach for keeping memory consumption down? And do I need any padding for tokens that are added programmatically during fine-tuning?
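Two things may be worth checking, sketched below in plain Python. First, the longest string by character count is not necessarily the longest in tokens. The hypothetical helper `max_token_length` therefore tokenizes every row and takes the maximum. With the Hugging Face T5 tokenizer, the returned `input_ids` already include the appended `</s>` EOS token, so that length accounts for it. Second, padding every example to one global maximum spends memory on pad tokens. The hypothetical helper `pad_seq2seq_batch` pads each batch only to its own longest sequence. It assumes T5's pad id of 0 for inputs and -100 for labels, the index that PyTorch's cross-entropy loss ignores by default. This is the same idea behind `DataCollatorForSeq2Seq` in `transformers`, but the code here is only an illustrative sketch, not that class's implementation.

```python
from typing import Callable, Dict, Iterable, List, Mapping, Sequence

# Assumed tokenizer shape (Hugging Face style): calling it on a list of
# strings returns a mapping whose "input_ids" holds one id list per string.
Tokenizer = Callable[[List[str]], Mapping[str, Sequence[Sequence[int]]]]


def max_token_length(texts: Iterable[object], tokenizer: Tokenizer) -> int:
    """Longest tokenized length over ALL texts, not just the longest string."""
    batch = [str(t) for t in texts]  # mirrors .astype(str) in the original
    if not batch:
        return 0
    return max(len(ids) for ids in tokenizer(batch)["input_ids"])


def pad_seq2seq_batch(
    sources: Sequence[List[int]],
    targets: Sequence[List[int]],
    pad_id: int = 0,            # assumed: T5's pad token id
    label_pad_id: int = -100,   # ignored by PyTorch cross-entropy by default
) -> Dict[str, List[List[int]]]:
    """Pad one batch to its own longest sequence instead of a global max."""
    src_len = max(len(s) for s in sources)
    tgt_len = max(len(t) for t in targets)
    return {
        "input_ids": [s + [pad_id] * (src_len - len(s)) for s in sources],
        # 1 marks real tokens, 0 marks padding the encoder should ignore
        "attention_mask": [[1] * len(s) + [0] * (src_len - len(s)) for s in sources],
        "labels": [t + [label_pad_id] * (tgt_len - len(t)) for t in targets],
    }


# Hypothetical usage with the DataFrame and tokenizer from the post:
# config = {
#     "max_source_length": max_token_length(df.input_text, tokenizer),
#     "max_target_length": max_token_length(df.target_text, tokenizer),
# }
```

Tokenizing the whole column at once is simple but may be slow on a very large dataset; processing it in chunks and keeping a running maximum would give the same result.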
TIA!