I am currently experimenting with sentence classification using different BERT models. In particular, I have a training corpus of around 4000 tweets labeled for binary classification. I have already fine-tuned BERT and BERTweet, and my goal now is to use them to predict labels for a new inflow of tweets.
My main issue is performance. When I load the stored models and use them to predict on a new corpus of tweets (about the same size as the one I fine-tuned the models on), prediction is extremely slow. I haven't timed it precisely, but fine-tuning took around 5-10 minutes using tensorflow-metal on a MacBook Pro M1, while the prediction stage can easily take 2 hours. I am assuming this has something to do with the way I am storing/loading the models.
Here is a snippet of the code I use to do so (note that I omit the data preprocessing, as it is done the same way as in the Hugging Face tutorials):
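```python
import click
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import PolynomialDecay
from transformers import TFAutoModelForSequenceClassification

# tf_train_dataset, tf_test_dataset and class_weights come from the omitted preprocessing
model = TFAutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=2)

num_epochs = 3
num_train_steps = len(tf_train_dataset) * num_epochs
# Linear decay of the learning rate from 5e-5 to 0 over all training steps
lr_scheduler = PolynomialDecay(
    initial_learning_rate=5e-5, end_learning_rate=0.0, decay_steps=num_train_steps
)
opt = Adam(learning_rate=lr_scheduler)
# The classification head outputs raw logits, hence from_logits=True
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

with tf.device('/gpu:0'):
    model.compile(optimizer=opt, loss=loss, metrics=["accuracy"])
    click.secho('Fine-tuning the BERTbase model', fg='yellow', bold=True)
    model.fit(
        tf_train_dataset,
        validation_data=tf_test_dataset,
        # Must match decay_steps: after num_train_steps the learning rate stays
        # at 0, so an extra epoch (num_epochs+1) would not update the weights
        epochs=num_epochs,
        class_weight=class_weights
    )
```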
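Keras's `PolynomialDecay` lowers the learning rate from `initial_learning_rate` to `end_learning_rate` over `decay_steps` steps and then holds it at `end_learning_rate`. With `decay_steps = len(tf_train_dataset) * num_epochs` and `end_learning_rate=0.0`, training for more than `num_epochs` epochs would run the extra epochs at a learning rate of 0. The pure-Python sketch below re-implements the documented formula (assuming the defaults `power=1.0` and `cycle=False`) so the per-epoch rates can be inspected without TensorFlow; the helper names and the 125 steps per epoch are hypothetical.

```python
def polynomial_decay(step, initial_lr, end_lr, decay_steps, power=1.0):
    """Learning rate after `step` optimizer steps, mirroring the documented
    Keras PolynomialDecay formula for cycle=False."""
    step = min(step, decay_steps)  # the rate is held constant after decay_steps
    return (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


def lr_at_epoch_start(epoch, steps_per_epoch, num_epochs, initial_lr=5e-5):
    """Learning rate at the first step of a 0-based epoch when the schedule
    decays to 0 over num_epochs epochs, as in the training code."""
    decay_steps = steps_per_epoch * num_epochs
    return polynomial_decay(epoch * steps_per_epoch, initial_lr, 0.0, decay_steps)


if __name__ == "__main__":
    # Hypothetical 125 optimizer steps per epoch, schedule spread over 3 epochs
    for epoch in range(4):
        print(epoch + 1, lr_at_epoch_start(epoch, steps_per_epoch=125, num_epochs=3))
```

Under these assumptions the rates at the start of epochs 1 to 4 are 5e-05, about 3.3e-05, about 1.7e-05 and 0.0, and the fourth epoch stays at 0 throughout.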
Then I save and reload it:
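```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model.save_pretrained(bert_name)
model = TFAutoModelForSequenceClassification.from_pretrained(f"{cd_models}/{bert_name}")
# Padding/truncation are set when the tokenizer is called, not when it is loaded
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```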
The prediction stage follows the same steps as the training stage (perhaps I am making a mistake here):
```python
import click
from datasets import Dataset
from transformers import DataCollatorWithPadding

click.secho('Tokenizing with bertokenizer', fg='blue', bg='white')
dataset = Dataset.from_pandas(X1[['tidy_tweet']])

def tokenize_function(example):
    return tokenizer(example["tidy_tweet"], truncation=True)

tokenized_dataset = dataset.map(tokenize_function, batched=True)
# Pads each batch to the length of its longest sequence (dynamic padding)
data_collator = DataCollatorWithPadding(
    tokenizer=tokenizer, return_tensors="tf")
tf_dataset = tokenized_dataset.to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    collate_fn=data_collator,
    batch_size=1,
    shuffle=False  # keep predictions in the same order as X1
)
```
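With `batch_size=1`, `model.predict` runs one forward pass per tweet, roughly 4000 separate steps for a corpus of this size, and every step carries some fixed overhead (collation, host-to-device transfer, kernel dispatch). Larger batches amortize that overhead; with `DataCollatorWithPadding` the price is some padding, since each batch is padded to its longest sequence. This is a plausible factor worth checking rather than a confirmed diagnosis. The sketch below mimics that dynamic-padding logic in plain Python to count forward passes and padding tokens for a given batch size; `pad_batch` and `batch_stats` are hypothetical helpers, and pad id 0 is assumed (it is the `[PAD]` id of bert-base-uncased).

```python
from typing import List, Tuple


def pad_batch(seqs: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad one batch of token-id lists to its longest member and return
    (input_ids, attention_mask), conceptually like DataCollatorWithPadding."""
    max_len = max(len(s) for s in seqs)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    # 1 marks a real token, 0 marks padding
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return input_ids, attention_mask


def batch_stats(seqs: List[List[int]], batch_size: int) -> Tuple[int, int]:
    """Return (number of forward passes, number of padding tokens added)."""
    n_batches, n_pad = 0, 0
    for start in range(0, len(seqs), batch_size):
        batch = seqs[start:start + batch_size]
        input_ids, _ = pad_batch(batch)
        n_batches += 1
        n_pad += sum(len(row) for row in input_ids) - sum(len(s) for s in batch)
    return n_batches, n_pad


if __name__ == "__main__":
    # Hypothetical corpus: 4000 tokenized "tweets" of 10 to 39 tokens each
    corpus = [[101] + [2000] * (8 + i % 30) + [102] for i in range(4000)]
    for bs in (1, 32):
        print(bs, batch_stats(corpus, bs))
```

In the actual pipeline the corresponding change would be a larger `batch_size` (for example 32) in the `to_tf_dataset` call; with `shuffle=False` the predictions still come back in the order of `X1`.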
```python
import tensorflow as tf
import numpy as np

# I also tried '/cpu:0' here to check whether it is faster (see EDIT below)
with tf.device('/gpu:0'):
    preds = model.predict(tf_dataset)
```
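The classification head outputs raw logits (which is why the loss uses `from_logits=True`), so `model.predict` returns logits rather than probabilities. A minimal NumPy sketch for turning them into probabilities and predicted labels, assuming the logits have already been extracted as an array of shape `(n_tweets, 2)`, e.g. with `preds["logits"]` as in the Hugging Face course (the access pattern may differ between transformers versions); `logits_to_labels` is a hypothetical helper.

```python
import numpy as np


def logits_to_labels(logits):
    """Convert logits of shape (n_examples, num_labels) into
    (probabilities, predicted class ids) via a row-wise softmax."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract each row's max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs, probs.argmax(axis=1)


if __name__ == "__main__":
    probs, labels = logits_to_labels([[2.0, -1.0], [0.5, 1.5]])
    print(probs.round(3))
    print(labels)  # [0 1]
```

Because softmax preserves the ordering within each row, the argmax over probabilities equals the argmax over the raw logits; the probabilities only matter if a decision threshold other than 0.5 is wanted.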
Thanks in advance
EDIT: I just tried running the prediction on the CPU rather than the GPU. It is still not very fast, but it improved a lot. Maybe the GPU does not work that well in this case?
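Since the prediction stage has not been timed yet, a small wall-clock timer can make the CPU vs GPU comparison concrete. A minimal sketch using only the standard library; `timed` is a hypothetical helper name.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, secs = timed(time.sleep, 0.05)
    print(f"slept for {secs:.3f} s")
```

Applied to the prediction code, this would be something like `preds, secs = timed(model.predict, tf_dataset)` inside a `with tf.device('/cpu:0'):` block and again inside a `with tf.device('/gpu:0'):` block. The first `predict` call may include one-off graph tracing, so timing a second call gives a fairer comparison.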