I’m performing sentiment classification on some sentences.
What I do is I fine-tune RoBERTa on a subset of my data.
My question is what text wrangling I should perform?
Should I lowercase, stem, remove punctuation?
With a simpler model (logistic regression, tree, etc.), I would definitely do all of these things, but I’m not sure what to do when I use a transformer.