I was going through the data2vec code for text. In the paper, the authors say the targets are computed by averaging the outputs of the top K encoder blocks. However, I can't find where the code does this. Can you point me to the lines that perform that computation? Also, the loss functions mentioned in the paper are L1 loss and MSE, but I don't see either of them in the code.
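
To make clear what I am looking for, here is a minimal NumPy sketch of my reading of the paper. It is not taken from the data2vec repository: the function names (`average_top_k`, `smooth_l1`, `mse`), the `beta` parameter, and the toy shapes are all my own assumptions, and I have left out any normalization of the block outputs.

```python
import numpy as np


def average_top_k(layer_outputs, k):
    """Average the outputs of the last k encoder blocks.

    layer_outputs: list of arrays, one per block, each of shape (seq_len, dim).
    Hypothetical helper, not a function from the data2vec code base.
    """
    top_k = np.stack(layer_outputs[-k:], axis=0)  # shape (k, seq_len, dim)
    return top_k.mean(axis=0)                     # shape (seq_len, dim)


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic where |diff| < beta, linear elsewhere."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return loss.mean()


def mse(pred, target):
    """Mean squared error."""
    return ((pred - target) ** 2).mean()


# Toy example: 4 "blocks", sequence length 2, hidden size 3.
# Block i outputs the constant value i everywhere.
layers = [np.full((2, 3), float(i)) for i in range(4)]

targets = average_top_k(layers, k=2)  # mean of blocks 2 and 3
preds = np.zeros((2, 3))              # stand-in for the student predictions

loss_smooth_l1 = smooth_l1(preds, targets)
loss_mse = mse(preds, targets)
```

I would expect the real code to contain something equivalent to the `average_top_k` step followed by one of the two loss calls, possibly under different names.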