Hey guys, I’m following the steps described here: https://huggingface.co/transformers/custom_datasets.html#token-classification-with-w-nut-emerging-entities and fine-tuning with native TensorFlow (see https://huggingface.co/transformers/custom_datasets.html#ft-native).
During training I’d like to add a metric to see how the model is performing. Currently I’ve added SparseCategoricalAccuracy, but it gets stuck at an accuracy of around 0.2. Setting the DistilBERT layers to trainable=True or False makes no difference.
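For context, one issue with plain SparseCategoricalAccuracy here is that the labels contain -100 markers for sub-tokens, which the loss ignores but a stock accuracy metric does not. A minimal sketch of a metric that masks those positions (the metric name and class are my own, not from the tutorial):

```python
import tensorflow as tf

class MaskedSparseCategoricalAccuracy(tf.keras.metrics.Metric):
    """Accuracy that ignores positions labeled -100 (sub-tokens),
    so the score only reflects tokens the loss is trained on."""

    def __init__(self, name="masked_accuracy", **kwargs):
        super().__init__(name=name, **kwargs)
        self.correct = self.add_weight(name="correct", initializer="zeros")
        self.total = self.add_weight(name="total", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, tf.int64)
        preds = tf.argmax(y_pred, axis=-1)
        mask = tf.not_equal(y_true, -100)           # drop -100 positions
        matches = tf.logical_and(mask, tf.equal(preds, y_true))
        self.correct.assign_add(tf.reduce_sum(tf.cast(matches, tf.float32)))
        self.total.assign_add(tf.reduce_sum(tf.cast(mask, tf.float32)))

    def result(self):
        return self.correct / tf.maximum(self.total, 1.0)
```

This can then be passed as `metrics=[MaskedSparseCategoricalAccuracy()]` in `model.compile(...)`.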
What can I do now? What metric would you suggest and what performance can I expect?
Also, do you recommend setting all layers to trainable, or only the final Dense layer?
I’d like to add a follow-up question.
After playing around a little, it turns out the model is predicting the majority class “O” all the time. Are there any best practices for handling class imbalance? I removed the samples with only a few entity occurrences from the dataset, but that isn’t enough to solve the problem.
I also tried to tackle the imbalance by setting some of the “O” tags to -100 to get a more balanced dataset, similar to what’s already done with the sub-tokens. This helps a bit, but performance is still low.
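In case it helps to see concretely what I mean, here is roughly the idea, sketched with assumed names (`O_ID` and `KEEP_O_FRACTION` are placeholders for my label map and downsampling rate):

```python
import numpy as np

O_ID = 0               # assumed id of the "O" tag in the label map
KEEP_O_FRACTION = 0.3  # hypothetical: keep ~30% of "O" tokens in the loss

def downsample_o_tags(labels, rng=np.random.default_rng(0)):
    """Randomly set most "O" labels to -100 so the loss ignores them,
    mirroring how sub-token positions are already masked."""
    labels = np.array(labels)  # copy so the original array stays intact
    is_o = labels == O_ID
    drop = is_o & (rng.random(labels.shape) > KEEP_O_FRACTION)
    labels[drop] = -100
    return labels
```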
Additionally, after mapping all entities to a single class I called “Entity”, I saw F1 scores around 70%, though the task is much simpler now.
It would be interesting to know whether such (poor) performance is expected.
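For reference, the remapping I did looks roughly like this (keeping the BIO prefixes so entity-level metrics such as seqeval’s F1 still work; the function name is my own):

```python
def collapse_tags(tags):
    """Merge all entity types into a single "Entity" class,
    preserving the B-/I- prefix for entity-level evaluation."""
    return [t if t == "O" else t.split("-")[0] + "-Entity" for t in tags]
```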