TFLongformer Error : Trying to create optimizer slot variable under the scope for tf.distribute.Strategy

Hello everyone,
I have been stuck on an issue for a week now. I am trying to train a TensorFlow Longformer, but I get the following error:

Traceback (most recent call last):
  File "/home/pfrod/architectures/prosenet.py", line 106, in <module>
    trainer.train()
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/transformers/trainer_tf.py", line 549, in train
    self.distributed_training_steps(batch)
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().wrapped(*args, **kwds)
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3887, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/transformers/trainer_tf.py:671 distributed_training_steps *
self.args.strategy.run(self.apply_gradients, inputs)
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/transformers/trainer_tf.py:662 apply_gradients *
self.optimizer.apply_gradients(list(zip(gradients, self.model.trainable_variables)))
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/transformers/optimization_tf.py:232 apply_gradients *
return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:604 apply_gradients **
self._create_all_weights(var_list)
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:783 _create_all_weights
self._create_slots(var_list)
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/adam.py:127 _create_slots
self.add_slot(var, 'm')
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:844 add_slot
.format(strategy, var))
ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.one_device_strategy.OneDeviceStrategy object at 0x7f0f5c22fd50>), which is different from the scope used for the original variable (<tf.Variable 'tf_longformer_for_sequence_classification/longformer/embeddings/word_embeddings/weight:0' shape=(50265, 768) dtype=float32, numpy=
array([[ 0.15307617, -0.03359985, 0.08703613, …, -0.02035522,
0.02037048, -0.00749207],
[ 0.01556396, 0.00740433, -0.01169586, …, -0.00212097,
0.00801086, -0.01560974],
[-0.04318237, -0.08050537, -0.02220154, …, 0.12414551,
-0.01826477, -0.03604126],
…,
[ 0.03164673, 0.04992676, -0.03146362, …, 0.03674316,
0.00679016, 0.01078033],
[ 0.06192017, -0.05645752, 0.02749634, …, -0.0916748 ,
0.10888672, -0.0161438 ],
[ 0.12585449, -0.01345062, 0.03518677, …, 0.01661682,
0.03457642, 0.01670837]], dtype=float32)>). Make sure the slot variables are created under the same strategy scope. This may happen if you’re restoring from a checkpoint outside the scope

When running the following code :

from transformers import TFLongformerForSequenceClassification, LongformerTokenizer, TFTrainer, TFTrainingArguments, LongformerForSequenceClassification, LongformerConfig, TFLongformerModel
import numpy as np
import tensorflow as tf
from tensorflow.data import Dataset
from pathlib import Path
from tqdm import tqdm
from sklearn.model_selection import train_test_split

gpu_act = True
if gpu_act : 
    GPU = tf.config.list_physical_devices('GPU')[0]
    tf.config.experimental.set_virtual_device_configuration(GPU, [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=8192//2)])

tokenizer = LongformerTokenizer.from_pretrained('../storage/tokenizer', max_length = 2048)

model = TFLongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096',
                                                               gradient_checkpointing=True,
                                                               attention_window = 512, return_dict = True)



PATH = Path("../storage/treated_articles")
iterd = PATH.iterdir()
dat = []
labels = []

for label in iterd:
    for article in tqdm(label.iterdir()):
        dat.append(str(article))
        labels.append(str(label)[-17 :] == '/RELEVANT_TREATED')

files_train, files_test, y_train, y_test = train_test_split(dat, labels, test_size = 0.33, shuffle = True)



x_train= {'input_ids' : [None]*len(files_train), 'attention_mask' : [None]*len(files_train)}

for i, file in enumerate(files_train) : 
    tok = tokenizer(open(file, 'r').read().replace('\n\n','. ').replace('..', '.').replace('\n', ''), padding = 'max_length', truncation = True, max_length = 2048, return_tensors = 'tf')
    x_train['input_ids'][i] = tok['input_ids'][0]
    x_train['attention_mask'][i] = tok['attention_mask'][0]

x_test = {'input_ids' : [None]*len(files_test), 'attention_mask' : [None]*len(files_test)}

for i, file in enumerate(files_test) : 
    tok = tokenizer(open(file, 'r').read().replace('\n\n','. ').replace('..', '.').replace('\n', ''), padding = 'max_length', truncation = True, max_length = 2048, return_tensors = 'tf')
    x_test['input_ids'][i] = tok['input_ids'][0]
    x_test['attention_mask'][i] = tok['attention_mask'][0]

x_train['input_ids'] = tf.convert_to_tensor(x_train['input_ids'])
x_train['attention_mask'] = tf.convert_to_tensor(x_train['attention_mask'])
x_test['input_ids'] = tf.convert_to_tensor(x_test['input_ids'])
x_test['attention_mask'] = tf.convert_to_tensor(x_test['attention_mask'])



data_x_train = Dataset.from_tensor_slices(x_train)
data_y_train = Dataset.from_tensor_slices(list(map(int, y_train)))
data_train = Dataset.zip((data_x_train, data_y_train))

data_x_test = Dataset.from_tensor_slices(x_test)
data_y_test = Dataset.from_tensor_slices(list(map(int, y_test)))
data_test = Dataset.zip((data_x_test, data_y_test))

training_args = TFTrainingArguments(
    output_dir = '../results/interpretable_longformer',
    num_train_epochs = 8,
    gradient_accumulation_steps = 8,    
    evaluation_strategy = "epoch",
    disable_tqdm = False, 
    warmup_steps=150,
    weight_decay=0.01,
    logging_steps = 4,
    fp16 = True,
    logging_dir='../results/logging_interpretable_longformer',
    run_name = 'longformer-classification-updated-rtx3090_paper_replication_2_warm', 

)

trainer = TFTrainer(model=model, args=training_args,
                               train_dataset=data_train, eval_dataset=data_test)

trainer.train()

I am not really used to posting my issues, so if I didn’t give enough information about my code, please let me know ! :sweat_smile:

Thanks in advance !

pinging @patrickvonplaten , @jplu

Hello !!

The error means that you are not instantiating your model under the proper strategy scope, which is indeed the case in the code you shared. I suggest you take a look at the scripts available in the examples folder of the repo.

You can find more details on why this error occurs in the TensorFlow documentation.

Hope this helps.

Hello! Thank you both for the answers! Nonetheless, I implemented TFBert and TFRoberta, instantiating them in exactly the same way, each with its own tokenizer but with the same input types, and it went really well. I have already gone through most of the HuggingFace examples. Could you tell me more about the strategy I have to adopt? Is it specific to Longformer?

By the way @jplu I see your profile picture, I am currently studying in Sophia Antipolis. It’s a small world !

If you take a look at the examples and the documentation I shared, you will see that you have to instantiate your model in the same strategy scope that the trainer will use. For example, in the run_tf_glue.py file:

with training_args.strategy.scope():
    model = TFAutoModelForSequenceClassification.from_pretrained(
        model_args.model_name_or_path,
        from_pt=bool(".bin" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
    )

If you don't do that, you cannot use the TFTrainer with any model, not only Longformer.
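For reference, here is a minimal sketch of how that pattern could be applied to the script from the question. It assumes a transformers release that still ships TFTrainer and TFTrainingArguments. load_in_strategy_scope is a hypothetical helper written for this illustration, not part of transformers or TensorFlow, and the argument values are simply copied from the question. It is a sketch under those assumptions, not the original poster's final script.

```python
def load_in_strategy_scope(strategy, load_fn):
    """Run load_fn() inside strategy.scope() and return its result.

    Hypothetical helper: it only wraps the `with strategy.scope():` pattern
    so that the variables created by load_fn belong to the given strategy.
    """
    with strategy.scope():
        return load_fn()


def main():
    # Local imports, so the helper above stays usable without these packages.
    from transformers import (
        TFLongformerForSequenceClassification,
        TFTrainingArguments,
    )

    # 1. Create the training arguments first: they hold the strategy
    #    that the trainer will use.
    training_args = TFTrainingArguments(
        output_dir="../results/interpretable_longformer",
        num_train_epochs=8,
        gradient_accumulation_steps=8,
    )

    # 2. Load the model under that same strategy scope, instead of at the
    #    top of the script as in the question.
    model = load_in_strategy_scope(
        training_args.strategy,
        lambda: TFLongformerForSequenceClassification.from_pretrained(
            "allenai/longformer-base-4096",
            attention_window=512,
        ),
    )

    # 3. Build data_train / data_test as in the question, then:
    #    trainer = TFTrainer(model=model, args=training_args,
    #                        train_dataset=data_train, eval_dataset=data_test)
    #    trainer.train()
    return model, training_args
```

The main change compared with the script in the question is the ordering: the training arguments have to exist before the model is loaded, because the scope comes from training_args.strategy.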

It’s a small world indeed :slight_smile:

Thanks a lot !
Pierre


Can you share your working version? Thanks