LongformerForQuestionAnswering - reaching TriviaQA leaderboard results

Hi everyone,

I’m trying to reproduce Longformer’s reported TriviaQA leaderboard results (from the paper), and I’m struggling.
Steps that I took:

  1. I downloaded TriviaQA’s original dev set.
  2. I’m using LongformerForQuestionAnswering for evaluation.
  3. I normalize the predicted answers and compare them to the gold-label answers to compute ExactMatch.

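For step 3, this is roughly the normalization and ExactMatch I’m using — a SQuAD-style sketch (lowercase, strip punctuation/articles/whitespace); function names are mine:

```python
import re
import string

def normalize_answer(s):
    """Lowercase, drop punctuation, drop articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold_answers):
    """1 if the normalized prediction matches any normalized gold answer."""
    pred = normalize_answer(prediction)
    return int(any(pred == normalize_answer(g) for g in gold_answers))
```

So e.g. `exact_match("The Eiffel Tower!", ["Eiffel Tower"])` counts as a match after normalization.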
Am I missing something? Should any further processing be done before evaluating with LongformerForQuestionAnswering?
I already looked at the GitHub repo of Longformer; it doesn’t seem like they do any additional preprocessing of the dev data/context.
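In case the issue is in my decoding rather than the preprocessing: this is a sketch of how I turn the model’s start/end logits into an answer span (the function and its constraints, like `max_answer_len`, are my own choices, not from the Longformer repo):

```python
def decode_span(start_logits, end_logits, tokens, max_answer_len=30):
    """Pick the (start, end) pair with the highest summed logit score,
    subject to start <= end and a maximum answer length."""
    best_score, best = float("-inf"), (0, 0)
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best_score, best = score, (i, j)
    i, j = best
    return " ".join(tokens[i:j + 1])
```

If there’s a smarter constraint I should apply here (e.g. restricting spans to the context, not the question), that could explain part of the gap.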

Maybe @beltagy can help :slight_smile:

I’ve been struggling for weeks to fine-tune this model on Kaggle data with TensorFlow…