PyTorch model predictions vary for the same input data

I start by loading a pretrained model (I tried two here) and its tokenizer:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import pandas as pd
import re
import torch

model_name = "Souvikcmsa/SentimentAnalysisDistillBERT"
#model_name = "Souvikcmsa/BERT_sentiment_analysis" --> Same issue found with a different model! 
model = AutoModelForSequenceClassification.from_pretrained(model_name, use_auth_token=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=True)

Then I load some text data:

# Read the data
url = 'https://raw.githubusercontent.com/Giskard-AI/examples/main/datasets/twitter_us_airline_sentiment_analysis.csv'
data = pd.read_csv(url)

I define a basic preprocessor:

# Preprocess text (username and link placeholders)
# Replace the Username with @user and the URL in the tweet with http for better comprehension of data for the model
def preprocess(text):
    text = " ".join(text.split())
    text = re.sub(r'http\S+', 'http', text) 
    text = re.sub(r'@\S+', '@user', text)
    text = text.lower()
    return text
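
As a quick sanity check of what the preprocessor does, here is a minimal sketch on a made-up tweet. The sample string is an assumption for illustration and is not taken from the dataset; the function body is repeated only so the snippet runs on its own.

```python
import re

def preprocess(text):
    # Same steps as the preprocessor above, repeated so this snippet runs alone
    text = " ".join(text.split())             # collapse runs of whitespace
    text = re.sub(r"http\S+", "http", text)   # replace URLs with a placeholder
    text = re.sub(r"@\S+", "@user", text)     # replace usernames with a placeholder
    return text.lower()

# Made-up tweet for illustration (not from the dataset)
sample = "@SomeAirline   Thanks for the HELP! https://example.com/abc"
cleaned = preprocess(sample)
print(cleaned)
```

The extra spaces are collapsed, the handle and the link are replaced by their placeholders, and the text is lowercased.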

I then take two subsets of the data (the first 5 and the first 100 entries):

torch.set_printoptions(precision=10)

for param in model.base_model.parameters():
    param.requires_grad = False

# ----- 1. Preprocess and tokenize data -----#
X = list(data["text"].apply(preprocess))

X_tokenized = tokenizer(X, padding=True, return_tensors="pt")

# First 5 entries of every tokenizer output (input_ids, attention_mask, ...)
num_subsample1 = 5
X_tokenized_subset1 = {key: X_tokenized[key][:num_subsample1] for key in X_tokenized.keys()}

# First 100 entries; its first 5 rows are identical to subset 1
num_subsample2 = 100
X_tokenized_subset2 = {key: X_tokenized[key][:num_subsample2] for key in X_tokenized.keys()}

The output of the model on the first entry of the first subset:

display_index=0
outputs1 = model(**X_tokenized_subset1)
outputs1[0][display_index]

gives:

tensor([-1.6196994781,  3.0899136066, -1.3701400757],
       grad_fn=<SelectBackward0>)

The output of the model on the first entry of the second subset (effectively the same entry):

outputs2 = model(**X_tokenized_subset2)
outputs2[0][display_index]

gives:

tensor([-1.6196994781,  3.0899133682, -1.3701400757],
       grad_fn=<SelectBackward0>)

Although the two outputs should be identical, the second logit differs slightly:

outputs2[0][display_index]-outputs1[0][display_index]

which gives:

tensor([ 0.0000000000e+00, -2.3841857910e-07,  0.0000000000e+00],
       grad_fn=<SubBackward0>)
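
To put the size of this difference in context, here is a minimal sketch that checks how many float32 steps separate the two printed values of the second logit. It uses only the standard library and the numbers copied from the outputs above; the helper names `f32_bits` and `to_f32` are my own and hypothetical, and the sketch assumes the model runs in float32.

```python
import struct

def f32_bits(x):
    # Round a Python float to float32 and return its raw 32-bit pattern
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_f32(x):
    # Round a Python float to the nearest float32 value
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Second logit as printed for the 5-entry and the 100-entry batch above
logit_batch5 = 3.0899136066
logit_batch100 = 3.0899133682

# For two positive floats, the gap between bit patterns counts float32 steps
ulp_gap = f32_bits(logit_batch5) - f32_bits(logit_batch100)
diff = to_f32(logit_batch100) - to_f32(logit_batch5)

print(ulp_gap)                 # number of float32 steps between the two values
print(diff == -(2.0 ** -22))   # whether the difference is exactly -2**-22
```

If the gap is a single step, the two results are neighbouring float32 values, which would be consistent with ordinary floating-point rounding rather than a real change in the prediction. The sketch does not show where in the model the rounding arises. For comparisons like this, a tolerance-based check such as `torch.allclose(outputs1[0][display_index], outputs2[0][display_index])` is probably more appropriate than exact equality.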

Any insights? Thanks 🙂