Hi all,
I’m struggling to get reproducible results with Longformer.
Here is the output of transformers-cli env:
- transformers version: 4.9.1
- Platform: Linux-5.8.0-63-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 1.8.1+cu102 (True)
- Tensorflow version (GPU?): 2.5.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
I am running a script that fine-tunes Longformer for sequence classification twice, each time for 4 epochs. When I use the model "allenai/longformer-base-4096", I do not get the same training loss in the two runs. However, if I use "roberta-base" instead, the training loss is identical in both runs. I could not find anything else to add to the script to ensure reproducible results. Could you tell me if I am missing something?
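For reference, the only further determinism switches I could find in the PyTorch and transformers docs are the ones below. They are not in my script yet and I have not verified that any of them matter here, so this is just a sketch:

import os
import torch
import transformers

# Must be set before CUDA initializes: forces cuBLAS to use deterministic
# workspace algorithms on CUDA >= 10.2.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Raise an error on ops that have no deterministic implementation
# instead of silently running a non-deterministic kernel.
torch.use_deterministic_algorithms(True)

# Keep cuDNN from benchmarking and picking different kernels between runs.
torch.backends.cudnn.benchmark = False

# transformers' own helper seeds python, numpy and torch in one call.
transformers.set_seed(42)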
I plotted the training loss over epochs for two consecutive runs with "roberta-base" and with "allenai/longformer-base-4096". You can see that the "allenai/longformer-base-4096" runs show different training losses, whereas the "roberta-base" runs have identical training loss. See the plot in a wandb report here:
Wandb Report
Below is code to reproduce the results. You can comment/uncomment the respective model_name to choose either "allenai/longformer-base-4096" or "roberta-base".
import torch
import random
import wandb
import datetime
import numpy as np
from datasets import load_dataset, load_metric
from transformers import AutoTokenizer, AutoConfig, TrainingArguments, Trainer, AutoModelForSequenceClassification
import transformers

transformers.logging.set_verbosity_error()

seed = 42
# python RNG
random.seed(seed)
# pytorch RNGs
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)
# numpy RNG
np.random.seed(seed)

#model_name = "roberta-base"
model_name = "allenai/longformer-base-4096"

raw_datasets = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
def get_model():
    # get_model is passed as the model_init argument to Trainer. This should ensure
    # reproducibility; otherwise the weights of the classification head are randomly
    # initialized on each run.
    # See https://discuss.huggingface.co/t/fixing-the-random-seed-in-the-trainer-does-not-produce-the-same-results-across-runs/3442
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        config=AutoConfig.from_pretrained(model_name, num_labels=2),
    )
    return model
metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

lr = 1e-5
num_epochs = 4
batch_size = 2
model_path = "models/" + model_name.replace("/", "_")

for i in range(2):
    run = wandb.init(
        reinit=True,
        name="transformers_" + model_name + "_" + datetime.datetime.now().strftime("%Y%m%d_%H%M%S"),
        notes="reproducibility training with imdb dataset",
        save_code=True,
        config={
            "model": model_name,
            "learning_rate": lr,
            "num_epochs": num_epochs,
            "warmup_ratio": 0.1,
            "batch_size": batch_size,
            "random_seed": seed,
        },
    )
    training_args = TrainingArguments(
        seed=seed,
        do_train=True,
        do_eval=True,
        evaluation_strategy="epoch",
        logging_strategy="epoch",
        num_train_epochs=num_epochs,
        learning_rate=lr,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        output_dir="./test_output",
    )
    trainer = Trainer(
        model_init=get_model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        compute_metrics=compute_metrics,
    )
    trainer.train()
    run.finish()
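In case it helps with narrowing this down, here is a sketch of what I would try next (not part of the runs above): run the same batch through two freshly seeded copies of the model and compare the gradients, to check whether a single forward/backward pass is already non-deterministic. The model_name choice and the .cuda() calls are just assumptions for my setup:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "allenai/longformer-base-4096"  # or "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer(["a simple test sentence"], return_tensors="pt")
labels = torch.tensor([1])

grads = []
for _ in range(2):
    torch.manual_seed(42)  # same init for the classification head in both passes
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2).cuda()
    loss = model(**{k: v.cuda() for k, v in inputs.items()}, labels=labels.cuda()).loss
    loss.backward()
    grads.append(model.classifier.dense.weight.grad.clone())

# False here would point at non-deterministic GPU kernels rather than data order
print(torch.equal(grads[0], grads[1]))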