Hi folks,
I am building a classification model with GPT-2 using GPT2ForSequenceClassification, but I am having trouble with reproducibility.
def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32
    numpy.random.seed(worker_seed)
    random.seed(worker_seed)
To get reproducibility, I used torch.manual_seed() to seed the RNG for all devices (both CPU and CUDA):
torch.manual_seed(110)
To get reproducibility, I also removed the DataLoader's randomness by giving it a seeded generator:
g = torch.Generator()
g.manual_seed(110)
Move the PyTorch dataset into a DataLoader.
train_dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, collate_fn=gpt2_classificaiton_collator, num_workers=0, worker_init_fn=seed_worker, generator=g)
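To rule out the DataLoader as the source of the differences, a minimal sketch like the one below could be used to check that a freshly seeded generator gives the same shuffle order every time. The toy dataset and the helper name epoch_order are hypothetical and only for illustration; my real dataset and collator are not involved here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def epoch_order(seed, n_items=20, batch_size=4):
    """Return the item order produced by one shuffled epoch for a given seed."""
    # Toy dataset whose items are just their own indices (illustrative assumption).
    dataset = TensorDataset(torch.arange(n_items))

    # A fresh generator, seeded the same way as in the post.
    g = torch.Generator()
    g.manual_seed(seed)

    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=0, generator=g)

    order = []
    for (batch,) in loader:
        order.extend(batch.tolist())
    return order


first = epoch_order(110)
second = epoch_order(110)
print(first == second)
```

If the two orders match, the shuffling itself is reproducible as long as the generator is created and seeded again at the start of every run. Note that a generator's state advances each time a loader that uses it is iterated, so the order also depends on how much the generator has already been used.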
Create the PyTorch validation dataset.
valid_dataset = MovieReviewsDataset(train_dataset,train=False, use_tokenizer=tokenizer)
Move the PyTorch dataset into a DataLoader.
valid_dataloader = DataLoader(valid_dataset, batch_size=batch_size, shuffle=True, collate_fn=gpt2_classificaiton_collator, num_workers=0, worker_init_fn=seed_worker, generator=g)
I also called set_seed(110), but my code still gives me a different accuracy score on every training run. Please help.
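In case it helps narrow things down, here is a minimal sketch of seeding everything in one place before any model is created. The helper name seed_everything is hypothetical, and the small nn.Linear layer is only a stand-in for a randomly initialized layer. As far as I understand, the classification head of GPT2ForSequenceClassification is newly initialized when loading a base GPT-2 checkpoint, so the seed would have to be set before the model is built. The cuDNN flags are settings commonly suggested for determinism on GPU; I have not confirmed they are the cause in my case.

```python
import random

import numpy as np
import torch


def seed_everything(seed):
    """Seed Python, NumPy and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU and CUDA generators
    torch.cuda.manual_seed_all(seed)  # explicit; silently ignored without CUDA
    # Ask cuDNN for deterministic kernels and disable its autotuner.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def init_layer(seed):
    """Seed first, then create a randomly initialized layer."""
    seed_everything(seed)
    return torch.nn.Linear(4, 2)


layer_a = init_layer(110)
layer_b = init_layer(110)
print(torch.equal(layer_a.weight, layer_b.weight))
```

The point of the sketch is the ordering: the seed is set before the layer is created, so both layers start from identical weights. Seeding only after the model has been built would leave the initial weights different between runs.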