I fine-tuned my model. How can I evaluate its accuracy? How do I compute metrics on the test set and classify a new sentence?

This is my code. The data is in a pandas DataFrame called WF.

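import numpy as np
import pandas as pd
from datasets import Dataset, DatasetDict
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import Adam
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tf_pre_model = "bert-base-cased"  # pretrained checkpoint used in this post

# 80/20 train/test split of the text and label columns
X_train, X_test, y_train, y_test = train_test_split(list(WF["text"]), list(WF["relevance"]), test_size=0.2, random_state=42)
train_df = pd.DataFrame({"label":y_train,"text":X_train})
test_df = pd.DataFrame({"label": y_test,"text":X_test})
train_dataset = Dataset.from_dict(train_df)
test_dataset = Dataset.from_dict(test_df)
dataset = DatasetDict({"train":train_dataset,"test":test_dataset})
training_data = dataset['train']

tokenizer = AutoTokenizer.from_pretrained(tf_pre_model)
tokenized_data = tokenizer(training_data["text"], return_tensors="np", padding=True)
labels = np.array(training_data["label"])

model = TFAutoModelForSequenceClassification.from_pretrained(tf_pre_model)
model.compile(optimizer=Adam(3e-5))  # assumed compile step, as in the tutorial; no metrics are requested here
model.fit(dict(tokenized_data), labels)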

As you can see, I took the pretrained model ("bert-base-cased" in this case) and fine-tuned it on my dataframe as described in the Hugging Face tutorial (using TensorFlow/Keras). However, the tutorial never says how to calculate accuracy on the training set. The only output I get so far reports the loss:

161/161 [==============================] - 2374s 15s/step - loss: 0.1242

Also, how do I evaluate the model on my test set, and how do I get class predictions for new sentences?
Finally, can I save the model after training so that I can reuse it later without training again?
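
To be concrete about the metric I am after, here is a minimal sketch in plain Python. It assumes the model produces one row of logits per sentence (one value per class) and that the predicted class is the index of the largest logit. The function names `predict_labels` and `accuracy` and the example numbers are my own, made up for illustration, not something from the tutorial.

```python
from typing import Sequence


def predict_labels(logits: Sequence[Sequence[float]]) -> list[int]:
    """Return the index of the largest logit in each row (the predicted class)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows whose predicted class equals the true label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy of an empty set")
    preds = predict_labels(logits)
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Made-up logits for three sentences and two classes (illustrative values only)
example_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5]]
example_labels = [0, 1, 0]

print(predict_labels(example_logits))  # [0, 1, 1]
print(accuracy(example_logits, example_labels))
```

My assumption is that the logits would come from something like `model.predict(...)["logits"]` on the tokenized test sentences, but I have not confirmed that this is the recommended way to do it.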

Thank you so much.