I’m researching how Retrieval-Augmented Generation (RAG) models perform across languages, specifically Arabic and English. I’ve observed that the models consistently generate responses faster for Arabic queries, even though they are trained primarily on English data and Arabic queries are often split into more tokens.
Question:
Why are Arabic responses generated faster than English responses, even though the models are less proficient in Arabic? Can you provide potential explanations and relevant references?
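
In case it helps frame the comparison: total response time depends on how many tokens are generated, so a shorter response finishes sooner even at the same per-token speed. Below is a minimal sketch of how the timings could be normalized by output length, assuming each run is logged with its language, wall-clock time, and output token count. The `Run` record and `summarize` helper are hypothetical names for illustration, not part of any RAG library.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    language: str       # e.g. "ar" or "en"
    seconds: float      # wall-clock time for the full response
    output_tokens: int  # number of tokens the model generated


def summarize(runs):
    """Group runs by language and report total and per-token latency."""
    by_lang = {}
    for run in runs:
        by_lang.setdefault(run.language, []).append(run)

    summary = {}
    for lang, items in by_lang.items():
        total_seconds = sum(r.seconds for r in items)
        total_tokens = sum(r.output_tokens for r in items)
        summary[lang] = {
            "mean_seconds": mean(r.seconds for r in items),
            "mean_output_tokens": mean(r.output_tokens for r in items),
            # None when no tokens were generated, to avoid dividing by zero.
            "seconds_per_token": (
                total_seconds / total_tokens if total_tokens else None
            ),
        }
    return summary
```

If `seconds_per_token` turns out similar for both languages while `mean_output_tokens` is lower for Arabic, the gap would come from response length rather than generation speed.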