Getting weird results from the new RoBERTa sentiment model

I am getting weird results from the RoBERTa base sentiment latest model (used out of the box). In the screenshot, you can see that its results are similar for sentences with very different sentiment. Can someone help?