The model trained in PyTorch produces inconsistent predictions for the same image when processed individually versus in a batch

I am seeing a significant difference in model predictions when I run inference on a single image versus the whole dataset. The model, which was trained with PyTorch, gives drastically different predictions for the same image depending on whether it is processed individually or as part of a batch. Is there any way to ensure that the predictions for an image are the same in both cases?


This could have many causes, and you’ll need to work out which one applies to your setup. Some possibilities:

  1. The model is in train mode rather than eval mode. In train mode, dropout layers are active, and BatchNorm normalizes with the current batch’s statistics while it keeps updating its running mean and variance during the forward pass.
  2. The images in the batch have unequal sizes and you pad them to the largest size with a black or white border, so the results differ depending on where the actual image content ends up. Or maybe you resize to the largest image in the batch.
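The first point can be checked with a small experiment. The sketch below uses a hypothetical toy model (the layer sizes and the names `batch_a`, `batch_b`, and `first_row` are made up for illustration) and compares the output for the same sample when it sits in two different batches, first in train mode and then in eval mode.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy model containing the two layer types mentioned above.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(16, 3),
)

x = torch.randn(8, 8)
batch_a = x[:4]            # sample 0 together with samples 1-3
batch_b = x[[0, 5, 6, 7]]  # sample 0 together with different neighbours


def first_row(m, batch):
    """Return the model output for the first sample of the batch."""
    with torch.no_grad():
        return m(batch)[0]


# Train mode: BatchNorm uses batch statistics and dropout is random,
# so the output for sample 0 depends on the rest of the batch.
model.train()
train_mode_consistent = torch.allclose(
    first_row(model, batch_a), first_row(model, batch_b), atol=1e-5
)

# Eval mode: BatchNorm uses its running statistics and dropout is off,
# so sample 0 should give the same output in any batch, or on its own.
model.eval()
eval_a = first_row(model, batch_a)
eval_mode_consistent = (
    torch.allclose(eval_a, first_row(model, batch_b), atol=1e-5)
    and torch.allclose(eval_a, first_row(model, x[:1]), atol=1e-5)
)

print("consistent in train mode:", train_mode_consistent)
print("consistent in eval mode:", eval_mode_consistent)
```

A small tolerance is used instead of exact equality because batched and single-sample matrix multiplications can differ in the last few floating-point bits.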

Below is the code. Please take a look and let me know if anything needs to be changed so that the results are the same for a given image.

from transformers import Trainer, TrainingArguments, PreTrainedModel, PretrainedConfig
from torch.utils.data import Dataset
import torch
import torch.nn.functional as F
import numpy as np

# Number of Features

num_of_features = 128

# Dataset Class

class SequenceDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return {"input_ids": self.X[idx], "labels": self.y[idx]}

# Configuration Class

class SequenceConfig(PretrainedConfig):
    model_type = "sequence_transformer"

    def __init__(self, num_features=num_of_features, num_classes=3, d_model=1024, nhead=4, num_layers=4, dim_feedforward=512, **kwargs):
        self.num_features = num_features
        self.num_classes = num_classes
        self.d_model = d_model
        self.nhead = nhead
        self.num_layers = num_layers
        self.dim_feedforward = dim_feedforward
        super().__init__(**kwargs)

# Transformer Model

class SequenceTransformer(PreTrainedModel):
    config_class = SequenceConfig

    def __init__(self, config):
        super().__init__(config)
        self.embedding = torch.nn.Linear(config.num_features, config.d_model)
        self.positional_encoding = torch.nn.Parameter(torch.zeros(1, config.d_model))
        encoder_layer = torch.nn.TransformerEncoderLayer(
            d_model=config.d_model,
            nhead=config.nhead,
            dim_feedforward=config.dim_feedforward,
            batch_first=True
        )
        self.transformer_encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=config.num_layers)
        self.fc = torch.nn.Linear(config.d_model, config.num_classes)

    def forward(self, input_ids, labels=None):
        src = self.embedding(input_ids) + self.positional_encoding
        output = self.transformer_encoder(src)
        logits = self.fc(output)
        probs = F.softmax(logits, dim=-1)

        loss = None
        if labels is not None:
            loss_fct = torch.nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels)

        return {"loss": loss, "logits": logits, "probs": probs} if labels is not None else logits

# Training Code

config = SequenceConfig()
model = SequenceTransformer(config)

# Training Arguments

batchSize=32
numWarmUpSteps=int(np.shape(train_image)[0]/batchSize/numOfBreakpointsPerEpoch/10)
training_args = TrainingArguments(
    output_dir=path,
    num_train_epochs=1, 
    per_device_train_batch_size=batchSize,
    per_device_eval_batch_size=320,
    warmup_steps=numWarmUpSteps,
    weight_decay=0.1,
    logging_strategy='no',
    eval_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="accuracy",
    save_only_model=True,
)

# Trainer Initialization

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)

# Train the Model

train_output = trainer.train()

# Save Model and Training Arguments

trainer.save_model("./SavedModels")
torch.save(training_args, "./SavedModels/training_args.bin")

# Prediction Code

training_args_loaded = torch.load("./SavedModels/training_args.bin")
model_save_path = "./SavedModels/"
model = SequenceTransformer(config).from_pretrained(model_save_path)

trainer = Trainer(model=model, compute_metrics=compute_metrics, args=training_args_loaded)
test_data = np.random.rand(10, num_of_features) # Example test data
test_predictions = trainer.predict(torch.tensor(test_data, dtype=torch.float32))

# Output Test Predictions

print(test_predictions)


For the first point, the model is indeed in evaluation mode, so dropout layers are inactive, and BatchNorm is not updating its mean and variance during the forward pass.

For the second point, we’ve also tested this by keeping the same 10 images in a batch and comparing the output against processing a single image at a time. Despite this, the outputs differ, which rules out issues related to padding or resizing inconsistencies.


Maybe I’m misreading it, but it seems like the model is expecting input IDs but is receiving a float32 tensor as input. That seems like a problem.

Some other things I noticed:

  1. positional_encoding is a 1 x d_model tensor, which means the same value will be added to every image patch. Change it to num_image_patches x d_model instead.
  2. You don’t need an embedding layer for image models. The image patch itself is the embedding for that patch.
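Related to the first point, the shape of the tensor that reaches the encoder may be worth checking. Assuming the inputs are 2D, of shape `(batch, num_features)`, as the test data in the question suggests, `torch.nn.TransformerEncoder` treats a 2D tensor as a single unbatched sequence, so the samples of a batch would attend to each other. The sketch below is not the original model; it uses a small hypothetical encoder (`d_model = 16`) to compare 2D input against input with an explicit sequence axis.

```python
import torch
from torch import nn

torch.manual_seed(0)

d_model = 16  # small hypothetical size, chosen only to keep the demo fast
layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, dim_feedforward=32, batch_first=True
)
encoder = nn.TransformerEncoder(layer, num_layers=2).eval()

x = torch.randn(4, d_model)  # four independent samples, no sequence axis

with torch.no_grad():
    # A 2D tensor is interpreted as ONE unbatched sequence of 4 tokens,
    # so self-attention mixes information across the four samples.
    flat_batch = encoder(x)[0]
    flat_single = encoder(x[:1])[0]

    # With an explicit sequence axis of length 1 the shape is
    # (batch, seq, d_model) and every sample is processed on its own.
    seq_batch = encoder(x.unsqueeze(1))[0, 0]
    seq_single = encoder(x[:1].unsqueeze(1))[0, 0]

flat_consistent = torch.allclose(flat_batch, flat_single, atol=1e-4)
seq_consistent = torch.allclose(seq_batch, seq_single, atol=1e-4)

print("2D input, batch vs single consistent:", flat_consistent)
print("3D input, batch vs single consistent:", seq_consistent)
```

If the data really is a sequence of patches per image, the sequence axis would have length `num_image_patches`, which is also the shape the positional encoding in point 1 needs.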