Dear Experts,
I am trying to train a model using BertForSequenceClassification on a dataset that contains short text snippets and their evaluations made by humans. Each snippet/sentence was rated with respect to a psychological trait by at least 10 humans on an integer scale from -5 to 5: [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5].
For each sentence in my final dataset I have the mean and standard deviation of the responses. The latter (SD) indicates how much the participants agreed about the evaluation of the sentence.
In the video "Simple Training with the 🤗 Transformers Trainer" (YouTube) there is a demonstration of how to implement a weighted loss function in which the weights are derived from the unequal number of labels in the training dataset. I guess that in my case it would make sense to train my model so that it would "learn more" from examples whose evaluations were in agreement (low SD) and "learn less" from examples whose evaluations have a higher standard deviation(?)
BTW, as far as I know I had to rescale the responses to the 0-1 range, and the model is initialized with num_labels=1 (regression problem).
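To make the rescaling concrete, here is a minimal sketch of what I mean, assuming a plain linear min-max mapping from [-5, 5] to [0, 1]. The helper name rescale_rating is hypothetical (not from any library). Note that under a linear mapping the SD is only divided by the range, not shifted.

```python
def rescale_rating(mean, sd, low=-5.0, high=5.0):
    """Linearly map a mean rating from [low, high] to [0, 1].

    The SD is scaled by the same factor but not shifted, because
    adding a constant does not change the spread of the ratings.
    """
    span = high - low
    return (mean - low) / span, sd / span


# Example: a sentence rated 2.0 on average with an SD of 1.5
scaled_mean, scaled_sd = rescale_rating(2.0, 1.5)
print(scaled_mean, scaled_sd)
```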
How can I build a custom trainer that uses a loss function based on nn.MSELoss() and is also weighted by the SD of each training item?
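For reference, this is roughly the shape of what I have in mind, as an untested sketch rather than a definitive implementation. Assumptions on my side: the dataset has an extra float column that I call "sd" (a hypothetical name), the weight is 1 / (sd + eps) normalized to a mean of 1 within each batch (just one possible weighting scheme), and the Trainer is created with remove_unused_columns=False in TrainingArguments so that the "sd" column reaches compute_loss.

```python
import torch
import torch.nn.functional as F
from transformers import Trainer


def sd_weighted_mse(preds, targets, sd, eps=0.1):
    """Per-item MSE weighted by rater agreement (assumed scheme: 1 / (sd + eps))."""
    # Low SD (high agreement) gives a large weight, high SD a small one.
    weights = 1.0 / (sd + eps)
    # Normalize so the average weight in the batch is 1; this keeps the
    # loss on a scale comparable to the unweighted MSE.
    weights = weights / weights.mean()
    per_item = F.mse_loss(preds, targets, reduction="none")
    return (weights * per_item).mean()


class SDWeightedTrainer(Trainer):
    # **kwargs absorbs extra arguments (e.g. num_items_in_batch) that
    # some versions of transformers pass to compute_loss.
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        inputs = dict(inputs)  # shallow copy so the caller's batch is untouched
        labels = inputs.pop("labels").float()
        sd = inputs.pop("sd").float()  # "sd" is my assumed column name
        outputs = model(**inputs)
        # With num_labels=1 the logits have shape (batch, 1).
        preds = outputs.logits.squeeze(-1)
        loss = sd_weighted_mse(preds, labels, sd)
        return (loss, outputs) if return_outputs else loss
```

I pop "labels" before calling the model so that the model does not compute its own unweighted loss. When all items in a batch have the same SD, the weights are all 1 and this reduces to the ordinary MSE.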