TFOpenAIGPTDoubleHeadsModel Loss Function


I recently read the article:

As an exercise, I am trying to convert all of this to TensorFlow instead of PyTorch. I seem to be missing something, and I'm sure it is a gap in my knowledge. Everything seems fairly straightforward except calculating the loss in the loss function.

The article states: “The total loss will be the weighted sum of the language modeling loss and the next-sentence prediction loss.”
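As I understand it, that combined objective would look roughly like the following. This is my own sketch: the weights λ are hyperparameters you choose, and the "next-sentence prediction" loss is the multiple-choice head's cross-entropy over the candidate replies. Here T is the set of positions whose next token is labeled (the reply tokens of the correct candidate), s_k is the multiple-choice logit for candidate k, and y is the index of the correct candidate; both terms are averaged over the batch.

```latex
\begin{aligned}
\mathcal{L} &= \lambda_{\mathrm{LM}}\,\mathcal{L}_{\mathrm{LM}} + \lambda_{\mathrm{MC}}\,\mathcal{L}_{\mathrm{MC}} \\
\mathcal{L}_{\mathrm{LM}} &= -\frac{1}{|T|}\sum_{t \in T} \log p_\theta\left(x_{t+1} \mid x_{\le t}\right) \\
\mathcal{L}_{\mathrm{MC}} &= -\log \frac{\exp(s_y)}{\sum_{k} \exp(s_k)}
\end{aligned}
```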

Now, the PyTorch version, OpenAIGPTDoubleHeadsModel, returns both loss values from its forward pass when labels are passed in. The TensorFlow version, TFOpenAIGPTDoubleHeadsModel, does not; it doesn't even accept the labels that the PyTorch version uses to calculate the loss. Does anyone have the knowledge or experience to calculate the loss from the TFOpenAIGPTDoubleHeadsModel model during training?
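Since the TF model only returns logits, I assume the loss has to be computed by hand from its two outputs. Below is a minimal sketch of what I think that looks like. It assumes `lm_logits` has shape `[batch, num_choices, seq_len, vocab_size]`, `mc_logits` has shape `[batch, num_choices]`, the labels are prepared the same way as for the PyTorch version, and `-100` marks positions to ignore (older scripts used `-1`). The function names and the default coefficients of 1.0 are my own placeholders, not part of the transformers API. The one-token shift is, as far as I can tell, the same one the PyTorch head applies internally.

```python
import tensorflow as tf

# Assumption: positions excluded from the LM loss carry this label
# (PyTorch's CrossEntropyLoss default; older scripts used -1 instead).
IGNORE_INDEX = -100


def lm_loss_from_logits(lm_logits, lm_labels, ignore_index=IGNORE_INDEX):
    """Language-modeling loss: logits at position t are scored against token t+1.

    lm_logits: float tensor [batch, num_choices, seq_len, vocab_size]
    lm_labels: int tensor   [batch, num_choices, seq_len]
    """
    # Shift so each position predicts the next token.
    shift_logits = lm_logits[..., :-1, :]
    shift_labels = lm_labels[..., 1:]
    # Flatten every (batch, choice, position) into one row per token.
    vocab_size = tf.shape(shift_logits)[-1]
    flat_logits = tf.reshape(shift_logits, [-1, vocab_size])
    flat_labels = tf.reshape(shift_labels, [-1])
    # Drop ignored positions (e.g. padding, context tokens, distractor replies).
    keep = tf.not_equal(flat_labels, ignore_index)
    logits = tf.boolean_mask(flat_logits, keep)
    labels = tf.boolean_mask(flat_labels, keep)
    per_token = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    # Note: the mean is NaN if no position in the batch is labeled.
    return tf.reduce_mean(per_token)


def mc_loss_from_logits(mc_logits, mc_labels):
    """Multiple-choice loss: cross-entropy over the candidate replies.

    mc_logits: float tensor [batch, num_choices]
    mc_labels: int tensor   [batch], index of the correct candidate
    """
    per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=mc_labels, logits=mc_logits)
    return tf.reduce_mean(per_example)


def double_heads_loss(lm_logits, mc_logits, lm_labels, mc_labels,
                      lm_coef=1.0, mc_coef=1.0):
    """Weighted sum of both losses; the coefficients are placeholder values."""
    return (lm_coef * lm_loss_from_logits(lm_logits, lm_labels)
            + mc_coef * mc_loss_from_logits(mc_logits, mc_labels))
```

To train with it, I would call the model inside a `tf.GradientTape()` block with `training=True`, take the LM and multiple-choice logits from its outputs (positional elements in older transformers releases, `logits` and `mc_logits` on the output object in newer ones, so check your installed version), compute `double_heads_loss`, and apply the gradients with the optimizer as usual.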

Thank you for any input.