I am noticing a significant difference in model predictions when running inference on a single image versus the whole dataset. The model, trained with PyTorch, gives drastically different predictions for the same image when it is processed individually versus as part of a batch. Is there a way to ensure that predictions for the same image are consistent in both cases?
This could happen for several reasons, and you'll need to narrow down which one applies to your setup. Some possibilities (a quick sanity-check sketch follows the list):
- The model is in train mode rather than eval mode. This keeps dropout layers active, and it also causes BatchNorm to keep updating its running mean and variance during the forward pass.
- The images in a batch have unequal sizes and you pad each one to the largest size with a black/white border, so the result depends on where the actual image content ends up. Or perhaps you resize everything to the largest image in the batch.
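To rule out the first cause, here is a minimal sketch of a consistency check (`model` and `image_batch` are placeholders for your own model and preprocessed batch):

```python
import torch

model.eval()  # deactivate dropout; stop BatchNorm from updating running stats
with torch.no_grad():
    # Forward pass over the whole batch at once
    batch_out = model(image_batch)
    # Same images, one at a time, stacked back into a batch
    single_out = torch.cat([model(img.unsqueeze(0)) for img in image_batch])

# With identical preprocessing and eval mode, these should match
# up to floating-point noise
print(torch.allclose(batch_out, single_out, atol=1e-5))
```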
Below is my code. Please take a look and see if anything needs to be modified so that the results for a given image stay the same.
```python
from transformers import Trainer, TrainingArguments, PreTrainedModel, PretrainedConfig
from torch.utils.data import Dataset
import torch
import torch.nn.functional as F
import numpy as np

# Number of features
num_of_features = 128

# Dataset class
class SequenceDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return {"input_ids": self.X[idx], "labels": self.y[idx]}

# Configuration class
class SequenceConfig(PretrainedConfig):
    model_type = "sequence_transformer"

    def __init__(self, num_features=num_of_features, num_classes=3, d_model=1024,
                 nhead=4, num_layers=4, dim_feedforward=512, **kwargs):
        self.num_features = num_features
        self.num_classes = num_classes
        self.d_model = d_model
        self.nhead = nhead
        self.num_layers = num_layers
        self.dim_feedforward = dim_feedforward
        super().__init__(**kwargs)

# Transformer model
class SequenceTransformer(PreTrainedModel):
    config_class = SequenceConfig

    def __init__(self, config):
        super().__init__(config)
        self.embedding = torch.nn.Linear(config.num_features, config.d_model)
        self.positional_encoding = torch.nn.Parameter(torch.zeros(1, config.d_model))
        encoder_layer = torch.nn.TransformerEncoderLayer(
            d_model=config.d_model,
            nhead=config.nhead,
            dim_feedforward=config.dim_feedforward,
            batch_first=True
        )
        self.transformer_encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=config.num_layers)
        self.fc = torch.nn.Linear(config.d_model, config.num_classes)

    def forward(self, input_ids, labels=None):
        src = self.embedding(input_ids) + self.positional_encoding
        output = self.transformer_encoder(src)
        logits = self.fc(output)
        probs = F.softmax(logits, dim=-1)
        loss = None
        if labels is not None:
            loss_fct = torch.nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels)
        return {"loss": loss, "logits": logits, "probs": probs} if labels is not None else logits

# Training code
config = SequenceConfig()
model = SequenceTransformer(config)

# Training arguments
batchSize = 32
numWarmUpSteps = int(np.shape(train_image)[0] / batchSize / numOfBreakpointsPerEpoch / 10)
training_args = TrainingArguments(
    output_dir=path,
    num_train_epochs=1,
    per_device_train_batch_size=batchSize,
    per_device_eval_batch_size=320,
    warmup_steps=numWarmUpSteps,
    weight_decay=0.1,
    logging_strategy='no',
    eval_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="accuracy",
    save_only_model=True,
)

# Trainer initialization
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)

# Train the model
train_output = trainer.train()

# Save model and training arguments
trainer.save_model("./SavedModels")
torch.save(training_args, "./SavedModels/training_args.bin")

# Prediction code
training_args_loaded = torch.load("./SavedModels/training_args.bin")
model_save_path = "./SavedModels/"
model = SequenceTransformer(config).from_pretrained(model_save_path)
trainer = Trainer(model=model, compute_metrics=compute_metrics, args=training_args_loaded)
test_data = np.random.rand(10, num_of_features)  # Example test data
test_predictions = trainer.predict(torch.tensor(test_data, dtype=torch.float32))

# Output test predictions
print(test_predictions)
```
For the first point, the model is indeed in evaluation mode, so dropout layers are inactive, and BatchNorm is not updating its mean and variance during the forward pass.
For the second point, we’ve also tested this by keeping the same 10 images in a batch and comparing the output against processing a single image at a time. Despite this, the outputs differ, which rules out issues related to padding or resizing inconsistencies.
Maybe I'm misreading it, but it looks like the model expects input ids while it is receiving a raw float32 tensor as input. That seems like a problem.
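Relatedly, `Trainer.predict` takes a `Dataset`, not a raw tensor, so one option is to wrap the test data the same way the training data is wrapped. A minimal sketch (the `InferenceDataset` name is mine; it mirrors `SequenceDataset` but without labels, and it only changes how the data reaches the model):

```python
import torch
from torch.utils.data import Dataset

class InferenceDataset(Dataset):
    """Hypothetical label-free counterpart of SequenceDataset."""
    def __init__(self, X):
        self.X = torch.tensor(X, dtype=torch.float32)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return {"input_ids": self.X[idx]}

test_dataset = InferenceDataset(test_data)
test_predictions = trainer.predict(test_dataset)
```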
Some other things I noticed:
- `positional_encoding` is a `1 x d_model` tensor, which means the same value will be added to every image patch. Change it to a `num_image_patches x d_model` tensor instead (see the sketch after this list).
- You don't need an `embedding` layer for image models. The image patch is the embedding for that patch.
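For the positional encoding point, a minimal sketch of the suggested shape change (`num_patches` is a hypothetical value, the number of patches/positions per input):

```python
import torch

num_patches = 16  # hypothetical: patches (positions) per input
d_model = 1024

# One learned offset per position; broadcasting over the batch dimension
# adds a distinct vector to each patch instead of the same one everywhere.
positional_encoding = torch.nn.Parameter(torch.zeros(1, num_patches, d_model))

# Inside forward(), with src of shape (batch, num_patches, d_model):
# src = src + positional_encoding
```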