BertForSequenceClassification can be used for regression when the number of classes (num_labels) is set to 1. The documentation says that BertForSequenceClassification computes a cross-entropy loss for classification. What kind of loss does it return for regression?
(I’ve been assuming it is root mean square error, but I recently read that there are several other possibilities, such as Huber loss or negative log-likelihood.)
Which is it?
How can I find out, and where is the relevant code?
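
One way I could imagine checking this is to print the source of the model's forward method and look for the loss it constructs. Below is a rough sketch of that idea, not a confirmed recipe: `loss_lines` and `show_bert_loss_lines` are hypothetical helper names I made up, and it assumes the `transformers` package (with PyTorch) is installed and that the installed version builds the loss object inline inside `forward`.

```python
import inspect
import re


def loss_lines(source_text):
    """Return the stripped lines of source_text that construct a *Loss object."""
    # Matches calls such as `SomethingLoss(`; purely a text search, no model needed.
    pattern = re.compile(r"\b\w*Loss\(")
    return [line.strip() for line in source_text.splitlines() if pattern.search(line)]


def show_bert_loss_lines():
    """Print the loss constructors found in BertForSequenceClassification.forward.

    Assumption: `transformers` and PyTorch are installed, and the installed
    version creates its loss objects directly inside `forward`.
    """
    from transformers import BertForSequenceClassification

    source = inspect.getsource(BertForSequenceClassification.forward)
    for line in loss_lines(source):
        print(line)
```

Calling `show_bert_loss_lines()` should list every loss class used in that method, and `inspect.getsourcefile(BertForSequenceClassification)` should point to the file on disk where the surrounding branching logic can be read in full.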