Different sentiment probabilities when texts are processed in batches vs. individually

I am seeing strange behavior in sentiment analysis with a fine-tuned model and tokenizer. When I tokenize the texts and pass them to the model one at a time, I get different class probabilities than when I tokenize them and pass them as a batch.

Here’s my observation:

text = tweets["eth"]["text"].values.tolist()[0]
print(text)
encoded_input = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)
output = model(**encoded_input)
softmax(output.logits.detach().numpy())

Outputs:

ethereum will be back above xxx at some point im buying more now
array([[1.9731345e-04, 1.1291850e-03, 9.9867338e-01]], dtype=float32)

Whereas with a batch of the first two texts:

text = tweets["eth"]["text"].values.tolist()[:2]
print(text)
encoded_input = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)
output = model(**encoded_input)
softmax(output.logits.detach().numpy())

Outputs:

['ethereum will be back above xxx at some point im buying more now', 'tripledigit eth is also a chance of a lifetime']
array([[1.12888454e-04, 6.46036817e-04, 5.71367383e-01],
       [8.55998078e-05, 1.14106620e-03, 4.26646918e-01]], dtype=float32)

So the same sentence yields different class probabilities depending on whether it is processed alone or as part of a batch. What is causing this? I would of course like to process my data in batches.
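One thing worth checking: the six batched values above add up to roughly 1 across both rows together, not 1 per row. That would be consistent with the softmax normalizing over the whole array instead of over the class axis. The post does not show where `softmax` comes from; if it is `scipy.special.softmax`, its default is `axis=None`, which normalizes over all elements. Below is a minimal sketch of that effect. The `softmax` helper is a NumPy stand-in written for illustration with the same default, and the logits are made-up numbers, not the model's real outputs.

```python
import numpy as np

def softmax(x, axis=None):
    # Illustrative stand-in (assumption): mirrors the axis=None default of
    # scipy.special.softmax, which normalizes over the entire array.
    x = np.asarray(x, dtype=np.float64)
    shifted = x - np.max(x, axis=axis, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Made-up logits for a batch of two texts and three classes.
logits = np.array([[-4.0, -2.0, 5.0],
                   [-4.5, -1.5, 4.7]])

single = softmax(logits[:1])         # one text: shape (1, 3), sums to 1
whole = softmax(logits)              # batch, axis=None: all 6 values sum to 1
per_row = softmax(logits, axis=-1)   # batch, class axis: each row sums to 1

print(single)
print(whole, whole.sum())
print(per_row, per_row.sum(axis=-1))
```

If this is what is happening, passing `axis=-1` (or `axis=1`) to the softmax call should make the first row of the batched result match the single-text result, up to small floating-point differences.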

Cheers