Unable to load saved fine tuned tensorflow model

Hello,
I have been working with Hugging Face for two weeks. I followed some of the instructions here and some other tutorials to fine-tune a model for a text classification task. Since I am more familiar with TensorFlow, I preferred to work with TFAutoModelForSequenceClassification.

First, I trained it on my dataset, changing nothing but the output layer. The dataset was divided into train, validation and test splits. When training finished, I checked performance on the test split and got an accuracy of around 70%. I then saved the model and loaded it in another notebook to repeat the test on the same data. Accuracy dropped to below 0.1. My guess is that the fine-tuned weights are not being loaded.

I have been struggling for a couple of weeks trying to find what I am doing wrong when saving and loading the fine-tuned model. I am starting to think that Hugging Face has limited support for TensorFlow and that PyTorch is recommended. In fact, tomorrow I will try working with PyTorch.

When loading with AutoModelForSequenceClassification, the model and its weights seem to be loaded correctly, judging by the message that appears (“All TF 2.0 model weights were used when initializing DistilBertForSequenceClassification. All the weights of DistilBertForSequenceClassification were initialized from the TF 2.0 model. If your task is similar to the task the model of the checkpoint was trained on, you can already use DistilBertForSequenceClassification for predictions without further training.”), which is different from the one I get with TFAutoModelForSequenceClassification: “Some layers from the model checkpoint at ./models/robospretrained1000/ were not used when initializing TFDistilBertForSequenceClassification: [‘dropout_39’]…”

The problem with AutoModelForSequenceClassification is that it has no TensorFlow methods such as compile and predict, so I cannot make predictions on the test dataset or try the model on new data.

I think it should work and reproduce the performance obtained during training. Here are the basic steps I am following:

  1. Loading the dataset (by the way, the class names are not loaded)
import os
from datasets import load_dataset, Features, Value, ClassLabel

# dir_root and labels_set are defined earlier in the notebook
traincsv = os.path.join(dir_root, 'data/interim/trainsethugf.csv')
testcsv = os.path.join(dir_root, 'data/interim/testsethugf.csv')
validcsv = os.path.join(dir_root, 'data/interim/validsethugf.csv')
class_names = list(labels_set)
robo_features = Features({'relato': Value('string'), 'labels': ClassLabel(names=class_names)})
dataset = load_dataset("csv", data_files={'train': traincsv, 'test': testcsv, 'validation': validcsv}, features=robo_features)
  2. Tokenizing the dataset
from transformers import DistilBertTokenizer, AutoTokenizer, DistilBertTokenizerFast
model_name = 'distilbert-base-multilingual-cased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
def tokenizer_func(examples):
  return tokenizer(examples["relato"],
                   max_length=seqlen.max(),
                   padding = "max_length",
                   truncation=True)

tokenized_dataset = dataset.map(tokenizer_func, batched=True)
  3. Due to hardware limitations I reduce the dataset. I was able to train with more data using tf_train_set = tokenized_dataset["train"].shuffle(seed=42).select(range(20000)).to_tf_dataset(…), but I am having a hard time understanding how transformers handles multi-class data, since the labels are numbered from 0 to N, while I would expect one-hot vectors. For that reason I thought my saved model was not working. Besides, the approach recommended in the section about fine-tuning the model does not allow using categorical crossentropy from TensorFlow.
train_samples = tokenized_dataset["train"].shuffle(seed=42).select(range(1000)) 
valid_samples = tokenized_dataset["validation"].shuffle(seed=42).select(range(200)) 
test_samples = tokenized_dataset["test"].shuffle(seed=42).select(range(200)) 
train_features = train_samples.remove_columns(["relato", "labels"]).with_format("tensorflow") 
val_features = valid_samples.remove_columns(["relato", "labels"]).with_format("tensorflow") 
test_features = test_samples.remove_columns(["relato", "labels"]).with_format("tensorflow") 
train_features = {x: train_features[x] for x in tokenizer.model_input_names}
val_features = {x: val_features[x] for x in tokenizer.model_input_names}
test_features = {x: test_features[x] for x in tokenizer.model_input_names}

  4. Generating labels
from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_samples["labels"])  
val_labels = to_categorical(valid_samples["labels"]) 
test_labels = to_categorical(test_samples["labels"]) 
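
As a side note on the question above about integer labels versus one-hot vectors: Keras also provides tf.keras.losses.SparseCategoricalCrossentropy, which takes integer class ids directly, and mathematically it is the same quantity as categorical cross-entropy computed on the one-hot version of those ids. The sketch below checks that equivalence in plain Python for a single example. The logits and the helper names are made up for illustration and are not part of the training code in this post.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def sparse_cross_entropy(logits, label_id):
    """Loss from an integer label: minus the log-probability of that class."""
    return -log_softmax(logits)[label_id]

def categorical_cross_entropy(logits, one_hot):
    """Loss from a one-hot label: minus the sum of target * log-probability."""
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(logits)))

# Hypothetical logits for one example with 6 classes (as in num_labels=6).
logits = [1.2, -0.3, 0.5, 2.0, -1.1, 0.0]
label_id = 3
one_hot = [1.0 if i == label_id else 0.0 for i in range(len(logits))]

loss_sparse = sparse_cross_entropy(logits, label_id)
loss_onehot = categorical_cross_entropy(logits, one_hot)
print(loss_sparse, loss_onehot)  # the two values agree up to rounding
```

So the choice between integer labels and one-hot labels only changes which loss class is used, not what the model learns, provided the loss matches the label format.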

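  5. The model is created as follows:
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels=6)
# Freeze the first layer (the pretrained encoder) so only the layers on top are trained
model.layers[0].trainable = False
# Labels are one-hot (see step 4), so use the categorical loss and metric
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
metrics = tf.keras.metrics.CategoricalAccuracy()
model.compile(optimizer=optimizer, metrics=metrics, loss=loss)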
  6. Training is executed as follows:
history = model.fit(train_dataset_tf1, validation_data=val_dataset_tf1, epochs=num_epochs, callbacks = [callback], batch_size=batch_size)
  7. Model testing, with a micro-average F1 score of 0.68:
from sklearn.metrics import classification_report

y_true = test_labels
y_hat = model.predict(test_features)
y_hat = tf.nn.softmax(y_hat.logits)
# Predicted class id per example, then back to one-hot to match y_true
y_hat = tf.argmax(y_hat, axis=1).numpy()
y_hat = tf.one_hot(y_hat, depth=6, dtype=tf.float32)
print(classification_report(y_true=y_true, y_pred=y_hat))
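
A possible simplification of the evaluation step, sketched under assumptions: softmax preserves the ordering of the logits, so taking the argmax of the raw logits gives the same predicted class as taking it after softmax, and the predicted ids can be compared with integer labels without converting anything back to one-hot. The logits and labels below are invented values, and the helper names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical logits for three test examples and three classes.
batch_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [-0.5, 0.9, 0.4]]
y_true = [0, 2, 2]  # integer class ids, not one-hot

pred_from_logits = [argmax(row) for row in batch_logits]
pred_from_probs = [argmax(softmax(row)) for row in batch_logits]

# Fraction of examples whose predicted id equals the true id
accuracy = sum(p == t for p, t in zip(pred_from_logits, y_true)) / len(y_true)
print(pred_from_logits, pred_from_probs, accuracy)
```

With integer ids on both sides, the same comparison can be handed to a report function that accepts one-dimensional label arrays.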
  8. Saving the model. I tried many things (model.save_pretrained, model.save_weights, model.save) and nothing has worked when loading the model:
model.save_pretrained(os.path.join(dir_root, 'models/robospretrained1000'), saved_model=True)
  9. Loading the model:
from transformers import TFAutoModelForSequenceClassification, TFDistilBertForSequenceClassification, TFAutoModel, AutoModelForSequenceClassification, AutoConfig
config = AutoConfig.from_pretrained('./models/robospretrained1000/config.json')
model = TFAutoModelForSequenceClassification.from_pretrained('./models/robospretrained1000/', local_files_only=True, config=config)

It shows a warning that, as I understand it, means the weights were not loaded. In fact, I noticed that the troubleshooting page of Hugging Face has a dedicated section about TensorFlow loading. This makes me think that compatibility with TF is not good.
  10. Once the model is loaded, I compile it with the same code as in step 5, but without the freezing step.

  11. If I try AutoModelForSequenceClassification, I cannot use compile, summary or predict from TensorFlow. It seems the model is being loaded as something else. Can I convert it? Should I assume it is running in PyTorch by default?
model = AutoModelForSequenceClassification.from_pretrained('models/robospretrained1000/', from_tf=True, config=config)
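
One way to test the guess that the fine-tuned weights are not being loaded would be to run the same fixed batch through the model right before saving and again right after loading, and compare the logits. The helper below is only a sketch in plain Python. The three logit arrays are placeholder values standing in for model.predict(test_features).logits (converted to nested lists) from the two notebooks, and max_abs_diff and same_model are hypothetical names.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2-D lists."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("logit arrays have different shapes")
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def same_model(a, b, tol=1e-4):
    """True if two sets of logits agree within tol on every element."""
    return max_abs_diff(a, b) <= tol

# Placeholder logits: before saving, after a good reload, after a bad reload.
logits_before = [[2.10, -0.40, 0.30], [0.05, 1.70, -0.90]]
logits_reloaded_ok = [[2.10, -0.40, 0.30], [0.05, 1.70, -0.90]]
logits_reloaded_bad = [[0.02, 0.01, -0.03], [0.00, 0.04, -0.01]]

print(same_model(logits_before, logits_reloaded_ok))   # True
print(same_model(logits_before, logits_reloaded_bad))  # False
```

If the reloaded model gives clearly different logits on identical inputs, the problem is in saving or loading rather than in the evaluation code. If the logits match, the accuracy drop more likely comes from a difference in preprocessing or label order between the two notebooks.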

Should I conclude that native TensorFlow is not supported and that I should use PyTorch code or the Trainer provided by Hugging Face?

Thanks for your kind help