Teaching Transformers to Sum Numbers

I am using the T5-base model to add numbers. The model works well on regular addition (100% accuracy). However, when I train it on modular addition, where it has to output only the last digit of the sum (e.g. for 5+5+7 the sum is 17, so the answer should be 7), it fails completely: in each epoch it predicts the same number for every question. I am really surprised that the same system fails at this seemingly easy problem. Does anybody have an idea why this happens or how to solve it?
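To make the two tasks concrete, here is a minimal sketch of how the targets differ. It assumes "last number of the sum" means the sum modulo 10; the function names, operand range, and number of operands are illustrative assumptions, not the actual training setup.

```python
import random
from collections import Counter


def make_example(operands, modulus=10):
    """Return (question, full-sum target, modular target) as strings."""
    question = "+".join(str(n) for n in operands)
    total = sum(operands)
    return question, str(total), str(total % modulus)


def make_dataset(num_examples, num_operands=3, max_operand=9, seed=0):
    """Generate random addition questions (hypothetical ranges)."""
    rng = random.Random(seed)
    return [
        make_example([rng.randint(0, max_operand) for _ in range(num_operands)])
        for _ in range(num_examples)
    ]


def majority_baseline(targets):
    """Accuracy obtained by always predicting the most common target."""
    most_common_count = Counter(targets).most_common(1)[0][1]
    return most_common_count / len(targets)


if __name__ == "__main__":
    print(make_example([5, 5, 7]))  # ('5+5+7', '17', '7')
    data = make_dataset(1000)
    modular_targets = [modular for _, _, modular in data]
    print("majority-class accuracy:", majority_baseline(modular_targets))
```

If the model's accuracy on the modular task matches the majority-class figure, that would be consistent with it having collapsed to a single constant output, which is the behaviour described above.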