I’m currently fine-tuning a multilabel text classification model from pretrained weights.
In my problem, I feed chronologically ordered sequences of medical reports to my model. My data is split into groups:
- each individual in the dataset has a group.
- one or more text sequences can be extracted from any individual for training.
Since sequences within a group overlap and their labels (about future events) are highly correlated, I'm trying to implement a custom gradient accumulation scheme. Individuals don't have the same number of sequences, which makes a fixed `gradient_accumulation_steps` parameter unusable.
Motivation: if I don't accumulate and batches are sampled sequentially, the model backpropagates the loss of one sequence before seeing the next sequence from the same individual, which could introduce a bias. Therefore, I'm looking to do individual-wise (i.e. group-wise) gradient accumulation.
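To make the idea concrete, here is a minimal plain-PyTorch sketch of what I mean by group-wise accumulation (all names here are placeholders, not an actual API): gradients are accumulated over every sequence of one individual, the loss is scaled by that individual's sequence count, and a single optimizer step is taken per individual.

```python
import torch
from torch import nn


def train_epoch(model, optimizer, sequences_by_individual, loss_fn):
    """Group-wise gradient accumulation sketch (hypothetical helper).

    `sequences_by_individual` is assumed to be an iterable with one entry
    per individual, each entry being a list of (inputs, labels) pairs.
    """
    model.train()
    for sequences in sequences_by_individual:
        optimizer.zero_grad()
        n = len(sequences)
        for inputs, labels in sequences:
            logits = model(inputs)
            # Scale by the individual's sequence count so each individual
            # contributes equally to the update, regardless of how many
            # sequences they have.
            loss = loss_fn(logits, labels) / n
            loss.backward()  # gradients accumulate across the group
        optimizer.step()  # one update per individual
```

This sidesteps `gradient_accumulation_steps` entirely, at the cost of writing the loop by hand.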
Is there a common way to do this? The only thing I can think of is modifying `_inner_training_loop` directly, which is probably not recommended since it is a private method.
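One alternative I'm considering, to avoid touching the Trainer internals: a custom batch sampler that yields all of an individual's sequences as a single batch, so that a standard one-step-per-batch loop is equivalent to group-wise accumulation. A sketch under that assumption (`group_ids` is a hypothetical list mapping each dataset index to its individual):

```python
from torch.utils.data import Sampler


class GroupBatchSampler(Sampler):
    """Yield one batch per individual (sketch, not a tested implementation).

    `group_ids[i]` is assumed to be the group/individual of dataset item i.
    """

    def __init__(self, group_ids):
        indices_by_group = {}
        for idx, group in enumerate(group_ids):
            indices_by_group.setdefault(group, []).append(idx)
        # One batch per individual, containing all of their sequence indices.
        self.batches = list(indices_by_group.values())

    def __iter__(self):
        return iter(self.batches)

    def __len__(self):
        return len(self.batches)
```

The drawback would be variable batch sizes (and memory usage) across individuals, which is exactly the irregularity that rules out a fixed `gradient_accumulation_steps`.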
Thanks in advance for your help.