TFLongformer Error : Trying to create optimizer slot variable under the scope for tf.distribute.Strategy

pfrodedelaforet · February 2, 2021, 9:49pm

Hello everyone,
I have been trying to solve an issue for a week now. I am trying to train a TensorFlow Longformer, but I get the following error:

Traceback (most recent call last):
File “/home/pfrod/architectures/prosenet.py”, line 106, in <module>
trainer.train()
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/transformers/trainer_tf.py”, line 549, in train
self.distributed_training_steps(batch)
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py”, line 828, in __call__
result = self._call(*args, **kwds)
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py”, line 871, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py”, line 726, in _initialize
*args, **kwds))
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/function.py”, line 2969, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/function.py”, line 3361, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/function.py”, line 3206, in _create_graph_function
capture_by_value=self._capture_by_value),
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py”, line 990, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py”, line 634, in wrapped_fn
out = weak_wrapped_fn().wrapped(*args, **kwds)
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/eager/function.py”, line 3887, in bound_method_wrapper
return wrapped_fn(*args, **kwargs)
File “/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py”, line 977, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/transformers/trainer_tf.py:671 distributed_training_steps *
self.args.strategy.run(self.apply_gradients, inputs)
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/transformers/trainer_tf.py:662 apply_gradients *
self.optimizer.apply_gradients(list(zip(gradients, self.model.trainable_variables)))
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/transformers/optimization_tf.py:232 apply_gradients *
return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:604 apply_gradients **
self._create_all_weights(var_list)
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:783 _create_all_weights
self._create_slots(var_list)
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/adam.py:127 _create_slots
self.add_slot(var, ‘m’)
/home/pfrod/anaconda3/envs/env_minus/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:844 add_slot
.format(strategy, var))
ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.one_device_strategy.OneDeviceStrategy object at 0x7f0f5c22fd50>), which is different from the scope used for the original variable (<tf.Variable ‘tf_longformer_for_sequence_classification/longformer/embeddings/word_embeddings/weight:0’ shape=(50265, 768) dtype=float32, numpy=
array([[ 0.15307617, -0.03359985, 0.08703613, …, -0.02035522,
0.02037048, -0.00749207],
[ 0.01556396, 0.00740433, -0.01169586, …, -0.00212097,
0.00801086, -0.01560974],
[-0.04318237, -0.08050537, -0.02220154, …, 0.12414551,
-0.01826477, -0.03604126],
…,
[ 0.03164673, 0.04992676, -0.03146362, …, 0.03674316,
0.00679016, 0.01078033],
[ 0.06192017, -0.05645752, 0.02749634, …, -0.0916748 ,
0.10888672, -0.0161438 ],
[ 0.12585449, -0.01345062, 0.03518677, …, 0.01661682,
0.03457642, 0.01670837]], dtype=float32)>). Make sure the slot variables are created under the same strategy scope. This may happen if you’re restoring from a checkpoint outside the scope

This happens when I run the following code:

from transformers import TFLongformerForSequenceClassification, LongformerTokenizer, TFTrainer, TFTrainingArguments, LongformerForSequenceClassification, LongformerConfig, TFLongformerModel
import numpy as np
import tensorflow as tf
from tensorflow.data import Dataset
from pathlib import Path
from tqdm import tqdm
from sklearn.model_selection import train_test_split

gpu_act = True
if gpu_act : 
    GPU = tf.config.list_physical_devices('GPU')[0]
    tf.config.experimental.set_virtual_device_configuration(GPU, [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=8192//2)])

tokenizer = LongformerTokenizer.from_pretrained('../storage/tokenizer', max_length = 2048)

model = TFLongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096',
                                                               gradient_checkpointing=True,
                                                               attention_window = 512, return_dict = True)

PATH = Path("../storage/treated_articles")
iterd = PATH.iterdir()
dat = []
labels = []

for label in iterd:
    for article in tqdm(label.iterdir()):
        dat.append(str(article))
        labels.append(str(label)[-17 :] == '/RELEVANT_TREATED')

files_train, files_test, y_train, y_test = train_test_split(dat, labels, test_size = 0.33, shuffle = True)



x_train= {'input_ids' : [None]*len(files_train), 'attention_mask' : [None]*len(files_train)}

for i, file in enumerate(files_train) : 
    tok = tokenizer(open(file, 'r').read().replace('\n\n','. ').replace('..', '.').replace('\n', ''), padding = 'max_length', truncation = True, max_length = 2048, return_tensors = 'tf')
    x_train['input_ids'][i] = tok['input_ids'][0]
    x_train['attention_mask'][i] = tok['attention_mask'][0]

x_test = {'input_ids' : [None]*len(files_test), 'attention_mask' : [None]*len(files_test)}

for i, file in enumerate(files_test) : 
    tok = tokenizer(open(file, 'r').read().replace('\n\n','. ').replace('..', '.').replace('\n', ''), padding = 'max_length', truncation = True, max_length = 2048, return_tensors = 'tf')
    x_test['input_ids'][i] = tok['input_ids'][0]
    x_test['attention_mask'][i] = tok['attention_mask'][0]

x_train['input_ids'] = tf.convert_to_tensor(x_train['input_ids'])
x_train['attention_mask'] = tf.convert_to_tensor(x_train['attention_mask'])
x_test['input_ids'] = tf.convert_to_tensor(x_test['input_ids'])
x_test['attention_mask'] = tf.convert_to_tensor(x_test['attention_mask'])



data_x_train = Dataset.from_tensor_slices(x_train)
data_y_train = Dataset.from_tensor_slices(list(map(int, y_train)))
data_train = Dataset.zip((data_x_train, data_y_train))

data_x_test = Dataset.from_tensor_slices(x_test)
data_y_test = Dataset.from_tensor_slices(list(map(int, y_test)))
data_test = Dataset.zip((data_x_test, data_y_test))

training_args = TFTrainingArguments(
    output_dir = '../results/interpretable_longformer',
    num_train_epochs = 8,
    gradient_accumulation_steps = 8,    
    evaluation_strategy = "epoch",
    disable_tqdm = False, 
    warmup_steps=150,
    weight_decay=0.01,
    logging_steps = 4,
    fp16 = True,
    logging_dir='../results/logging_interpretable_longformer',
    run_name = 'longformer-classification-updated-rtx3090_paper_replication_2_warm', 

)

trainer = TFTrainer(model=model, args=training_args,
                               train_dataset=data_train, eval_dataset=data_test)

trainer.train()

I am not really used to posting my issues, so if I didn’t give enough information about my code, please let me know !

Thanks in advance !

valhalla · February 3, 2021, 2:10pm

pinging @patrickvonplaten , @jplu

jplu · February 3, 2021, 2:27pm

Hello !!

The error means that you are not instantiating your model under the proper strategy scope, which is indeed the case in the code you shared. I suggest you take a look at the scripts available in the examples folder of the repo.

You can get more detail on why you get this error in the TensorFlow documentation.

Hope this helps.

pfrodedelaforet · February 3, 2021, 2:44pm

Hello! Thank you both for the answers! Nonetheless, I implemented TFBert and TFRoberta, instantiating them in exactly the same way, with their own tokenizers but with the same input types, and it went really well. I have already gone through most of the HuggingFace examples. Could you tell me more about the strategy I have to adopt? Is it specific to Longformer?

By the way @jplu I see your profile picture, I am currently studying in Sophia Antipolis. It’s a small world !

jplu · February 3, 2021, 2:54pm

If you take a look at the examples and the documentation I shared, you will see that you have to instantiate your model in the same strategy scope that the trainer will use. For example, in the run_tf_glue.py file:

with training_args.strategy.scope():
    model = TFAutoModelForSequenceClassification.from_pretrained(
        model_args.model_name_or_path,
        from_pt=bool(".bin" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
    )

If you don’t do that, you cannot use the TFTrainer with any model, not only Longformer.
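Applied to the script from the first post, this would mean creating `training_args` before the model and then loading the model inside the scope. Here is a minimal sketch of that ordering, not tested against a real TFTrainer setup; `load_model_in_scope` is a hypothetical helper name and is not part of transformers:

```python
def load_model_in_scope(training_args, model_cls, name_or_path, **kwargs):
    """Instantiate model_cls under the strategy held by training_args.

    Hypothetical helper: it only wraps from_pretrained in the scope of
    training_args.strategy, the same strategy the trainer uses later.
    """
    with training_args.strategy.scope():
        return model_cls.from_pretrained(name_or_path, **kwargs)


# Intended use with the original script (requires a transformers version
# that still ships TFTrainer and TFTrainingArguments):
#
#   training_args = TFTrainingArguments(
#       output_dir='../results/interpretable_longformer', ...)  # args FIRST
#   model = load_model_in_scope(
#       training_args,
#       TFLongformerForSequenceClassification,
#       'allenai/longformer-base-4096',
#       gradient_checkpointing=True, attention_window=512, return_dict=True)
#   trainer = TFTrainer(model=model, args=training_args,
#                       train_dataset=data_train, eval_dataset=data_test)
```

The only real change compared with the first post is the order: the training arguments exist before the model does, so the model can be created under their strategy.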

It’s a small world indeed

pfrodedelaforet · February 3, 2021, 3:00pm

Thanks a lot !
Pierre

EmreOzan9103 · February 4, 2021, 9:06am

Can you share your working version? Thanks
