Custom gradient accumulation scheme in Trainer

Hello everyone,
I’m currently fine-tuning a multilabel text classification model from pretrained weights.
In my problem, I feed chronologically ordered sequences of medical reports to my model. My data is split into groups:

  • each individual in the dataset belongs to a group;
  • one or more text sequences can be extracted from any individual for training (a toy example of the layout is shown after this list).
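For concreteness, here is a toy example of what I mean (all ids and field names below are hypothetical, not my actual schema):

```python
# Each individual contributes a variable number of chronologically
# ordered, overlapping text sequences that share one group id.
dataset = [
    {"group_id": "patient_A", "text": "report 1 [SEP] report 2", "labels": [1, 0, 1]},
    {"group_id": "patient_A", "text": "report 2 [SEP] report 3", "labels": [1, 0, 1]},
    {"group_id": "patient_B", "text": "report 1",                "labels": [0, 1, 0]},
]
```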

Since sequences overlap and their labels (about future events) are highly correlated within a group, I’m trying to implement a custom gradient accumulation scheme. Individuals don’t all have the same number of sequences, so a fixed gradient_accumulation_steps parameter is impossible to use.

Motivation: since batches are sampled sequentially, letting the model update on the loss of one sequence before it sees the next sequence from the same individual introduces a potential bias. I’m therefore looking to do individual-wise (or group-wise) gradient accumulation: accumulate the gradients over all of an individual’s sequences, then take a single optimizer step.
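To make the scheme explicit, here is a minimal sketch in plain PyTorch of what I have in mind (the function name, `group_sizes`, and the `"group_id"` key are all hypothetical; this is not Trainer code):

```python
def train_epoch_groupwise(model, optimizer, loader, group_sizes):
    """One epoch with group-wise gradient accumulation (sketch only).

    Assumes `loader` yields one sequence at a time as a dict of model
    inputs, ordered by individual, with a "group_id" entry, and that
    `group_sizes[group_id]` is that individual's number of sequences.
    """
    model.train()
    optimizer.zero_grad()
    current_group = None

    for batch in loader:
        group_id = batch.pop("group_id")
        # A new individual starts: apply the gradients accumulated so far.
        if current_group is not None and group_id != current_group:
            optimizer.step()
            optimizer.zero_grad()
        current_group = group_id

        loss = model(**batch).loss
        # Divide by the group's size so accumulation averages the
        # per-sequence losses instead of summing them.
        (loss / group_sizes[group_id]).backward()

    # Flush the last individual's gradients.
    if current_group is not None:
        optimizer.step()
        optimizer.zero_grad()
```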

Is there a common way to do this? The only idea I have is to modify _inner_training_loop directly, which is probably not recommended since it is a private method. One direction that might avoid touching the loop entirely is sketched below, but I’m not sure it’s idiomatic.
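The idea would be to override get_train_dataloader in a Trainer subclass so that each batch contains exactly one individual’s sequences; the standard Trainer stepping (with gradient_accumulation_steps left at 1) would then update once per group. A hedged sketch, assuming the dataset exposes a "group_id" column (the class names are mine):

```python
from collections import defaultdict
from torch.utils.data import DataLoader, Sampler
from transformers import Trainer

class GroupBatchSampler(Sampler):
    """Yields all dataset indices of one individual as a single batch."""

    def __init__(self, groups):
        # `groups[i]` is the group id of dataset example i (an assumption
        # about how the dataset stores the grouping).
        buckets = defaultdict(list)
        for idx, group_id in enumerate(groups):
            buckets[group_id].append(idx)
        self.batches = list(buckets.values())

    def __iter__(self):
        return iter(self.batches)

    def __len__(self):
        return len(self.batches)

class GroupwiseTrainer(Trainer):
    def get_train_dataloader(self):
        # Variable-size batches, one per individual: each optimizer step
        # then sees exactly one group's sequences, and the usual mean
        # reduction of the loss averages over them.
        return DataLoader(
            self.train_dataset,
            batch_sampler=GroupBatchSampler(self.train_dataset["group_id"]),
            collate_fn=self.data_collator,
        )
```

The obvious trade-off is memory: this runs all of an individual’s sequences through one forward pass instead of accumulating them one at a time, so it could blow up for individuals with many sequences. Is there a cleaner way?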

Thanks in advance for your help. :wink: