Hi all,
I am trying to fine-tune BERT on a simple text classification task.
However, I get different results with the Hugging Face library (torch 1.8.1+cu111) and Google's official code (TensorFlow v1.15).
I wonder whether the Hugging Face implementation includes any optimization for fine-tuning BERT?
As far as I can tell, I use the same hyper-parameters, yet I get higher performance with the Hugging Face library.
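One sketch of how I would rule out a silent hyper-parameter mismatch: write down the settings each script actually uses and diff them programmatically. The values below are just the common BERT fine-tuning defaults, not my real configuration; one known difference worth checking is that transformers' `AdamW` applies Adam bias correction by default, while the original TF BERT `AdamWeightDecayOptimizer` omits it.

```python
# Hypothetical side-by-side of the settings each run actually uses.
# Replace the values with the ones from your own scripts.
hf_run = {
    "learning_rate": 2e-5,
    "batch_size": 32,
    "epochs": 3,
    "max_seq_length": 128,
    "warmup_proportion": 0.1,
    "weight_decay": 0.01,
    "adam_bias_correction": True,   # transformers' AdamW default (correct_bias=True)
}
# The original TF BERT optimizer skips bias correction.
tf_run = dict(hf_run, adam_bias_correction=False)

# Report any setting that silently differs between the two runs.
diffs = {k: (hf_run[k], tf_run[k]) for k in hf_run if hf_run[k] != tf_run[k]}
print(diffs)
```

In my case a diff like this would at least tell me whether the gap is explained by the optimizer rather than the model itself.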
I have also checked the details in a similar issue:
However, it did not help me.