RAG performance on WebQuestions dataset lower than expected

Hi, I recently fine-tuned the RAG model (starting from rag-token-base) on WebQuestions, but the EM score is only 28, while the paper reports 45. Do you have any saved checkpoints that match the reported performance, or are there any tricks needed during fine-tuning?
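
For reference, EM here means exact match between the generated answer and any gold answer after normalization. Below is a minimal sketch of that metric, assuming SQuAD-style normalization (lowercasing, then stripping punctuation, articles, and extra whitespace). The function names are illustrative only, and this is not necessarily the exact evaluation script behind the paper's numbers, so differences in normalization could account for part of a gap.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization (an assumption, not a confirmed match to the paper)."""
    text = text.lower()
    # Drop ASCII punctuation characters.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Replace English articles with a space.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the prediction equals any gold answer after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match at least one gold answer."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

WebQuestions questions often have several acceptable answers, so the sketch scores a prediction against the full list of gold answers rather than only the first one.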