How can I enforce reproducibility for Longformer?

DavidPfl · July 30, 2021, 8:49pm

Hi all,

I’m struggling with ensuring reproducible results with the Longformer.

Here is the result of transformer-cli env:

transformers version: 4.9.1
Platform: Linux-5.8.0-63-generic-x86_64-with-glibc2.29
Python version: 3.8.10
PyTorch version (GPU?): 1.8.1+cu102 (True)
Tensorflow version (GPU?): 2.5.0 (False)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

I am running a script which finetunes Longformer for sequence classification two times , each time for 4 epochs.

When using the model "allenai/longformer-base-4096", I do not get the same training loss in the two iterations.
However, if I use "roberta-base" as a model, the training loss is identical in both iterations.
I did not find anything else I could add to the script to ensure reproducible results. Could you tell me if I am missing something?

I plotted the training loss over epochs for two consecutive runs with "roberta-base" and "allenai/longformer-base-4096". You can see that the "allenai/longformer-base-4096" runs show different training loss in the two runs where as the "roberta-base" runs have identical training loss.
See the plot in a wandb-Report here:
Wandb Report

Below is code to reproduce the results. You can comment/uncomment the respective model_name to chose either "allenai/longformer-base-4096" or "roberta-base".

import torch
import random
import wandb
import datetime
import numpy as np
from datasets import load_dataset, load_metric
from transformers import AutoTokenizer, AutoConfig, TrainingArguments, Trainer, AutoModelForSequenceClassification
import transformers

transformers.logging.set_verbosity_error()

seed = 42
# python RNG
random.seed(seed)

# pytorch RNGs
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
if torch.cuda.is_available(): torch.cuda.manual_seed_all(seed)

# numpy RNG
np.random.seed(seed)

#model_name = "roberta-base"
model_name = "allenai/longformer-base-4096"

raw_datasets = load_dataset("imdb")

tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))

def get_model():
    # get_model is used for the model_init argument for trainer. This should ensures reproducibility. Otherwise, weights from classification head are randomly initialized.
    # see https://discuss.huggingface.co/t/fixing-the-random-seed-in-the-trainer-does-not-produce-the-same-results-across-runs/3442
    model =  AutoModelForSequenceClassification.from_pretrained(
        model_name,
        config = AutoConfig.from_pretrained(model_name, num_labels = 2),
    )
    return model

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

lr = 1e-5
num_epochs = 4
batch_size = 2
model_path = "models/" + model_name.replace("/", "_")

for i in range(2):
	run = wandb.init(
	reinit=True,
	name = "transformers_" + model_name + "_" + datetime.datetime.now().strftime("%Y%m%d_%H%M%S"),
	notes = "reproducibility training with imdb dataset",
	save_code = True,
	config = {
		"model":model_name,
		"learning_rate":lr,
		"num_epochs": num_epochs,
		"warmup_ratio":0.1,
		"batch_size":batch_size,
		"random_seed":seed
		}
	)

	training_args = TrainingArguments(
		seed = seed,
		do_train=True,
		do_eval=True,
		evaluation_strategy="epoch",
		logging_strategy="epoch",
		num_train_epochs = num_epochs,
		learning_rate=lr, 
		per_device_train_batch_size = batch_size, 
		per_device_eval_batch_size = batch_size,
		output_dir = "./test_output"
	)

	trainer = Trainer(
		model_init=get_model,
		args=training_args,
		train_dataset=small_train_dataset,
		eval_dataset=small_eval_dataset,
		compute_metrics=compute_metrics,
	)
	trainer.train()
	run.finish()

sgrvinod · January 5, 2022, 12:26pm

Hi @DavidPfl, were you able to figure this out?

menevsem · January 30, 2023, 9:59pm

Hello, do you have any progress about the issue? I am facing the same problem with long former.

johngiorgi · March 30, 2023, 4:48pm

Facing the same issue with allenai/led-large-16384 via the run_summarization.py script. Was anyone able to get reproducible results with these models?

Topic		Replies	Views
Reproducibility of LongFormer Model 🤗Transformers	0	15	August 30, 2024
Longformer for sequenceclassification 🤗Transformers	5	482	October 13, 2020
Convert models to Longformer Intermediate	3	2201	February 1, 2021
Strange error when using the Longformer (HuggingFace developers, please reply) 🤗Transformers	8	1804	October 12, 2020
Self-made Longformer doesn't take more than 512 token 🤗Transformers	0	464	January 5, 2022

How can I enforce reproducibility for Longformer?

Related topics