I was hoping to fine-tune T5 on the MATH dataset until I realized that its tokenizer returns unknown (`<unk>`) tokens for a lot of LaTeX characters (e.g. `{`, `}`).
Is there a go-to Seq2Seq model whose tokenizer doesn't have this problem?