I am implementing the steps from the book “Natural Language Processing with Transformers” (by Lewis Tunstall, Leandro von Werra and Thomas Wolf) for text classification, specifically Chapter 2, “Text Classification”.
The fine-tuning approach I chose to implement is the one suggested on p. 50 (Fine-Tuning with Keras).
The first thing I had to do to get the model to work was to pass a collate_fn argument to the to_tf_dataset() function, because the required arguments have apparently changed since the original edition of the book was published.
The code I used was the following (I followed almost the same steps as the book):
from transformers import TFAutoModelForSequenceClassification
# p. 46 of the book
tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_cpkt, num_labels=num_labels)
# Added by me, given that the collate_fn argument is required
from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
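To make the role of collate_fn concrete, here is a rough, library-free sketch of the kind of work a padding collator does on each batch. It is not the DataCollatorWithPadding implementation: the function name pad_batch, the pad id of 0 and the toy token ids are all assumptions made for illustration.

```python
def pad_batch(features, pad_token_id=0):
    """Pad every example in a batch to the length of the longest one.

    Illustrative only: pad_batch and pad_token_id=0 are assumptions,
    not part of the transformers API.
    """
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    for f in features:
        n_tokens = len(f["input_ids"])
        n_pad = max_len - n_tokens
        # Real tokens keep their ids; padding positions get pad_token_id.
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # The mask is 1 for real tokens and 0 for padding.
        batch["attention_mask"].append([1] * n_tokens + [0] * n_pad)
    return batch


# Two toy examples of different lengths (made-up token ids).
toy_batch = [
    {"input_ids": [101, 1045, 2514, 102]},
    {"input_ids": [101, 102]},
]
padded = pad_batch(toy_batch)
```

After this step every row in the batch has the same length, so the batch can be stacked into a single rectangular tensor.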
The model trained and reached a good accuracy (above 0.90). Now, my question is the following: what is the syntax for predicting the classes of new texts, for example for the elements of tf_eval_dataset (a PrefetchDataset)? I tried many configurations for the input tensor, but all of them failed. I always used tf_model.predict() to try to predict new elements.
Second question: what would the code be to predict the class of a given sentence already stored in a variable? Would I still have to use the Dataset functions to properly encode a given string?
Thank you!