Once I switched from a BertModel + Linear setup to BertForSequenceClassification, I was able to push the model to the Hub.
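
For anyone hitting the same thing, here is a minimal sketch of that switch, assuming `transformers` and `torch` are installed. It is not my exact code: the tiny config values are placeholders chosen so it runs without downloading weights, and the repo id in the commented-out line is hypothetical.

```python
import tempfile

import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny, randomly initialised config (placeholder values, not a real checkpoint)
# so the sketch runs offline.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=3,
)

# The classification head is built in, so no separate Linear layer is needed.
model = BertForSequenceClassification(config)
model.eval()

# Forward pass on dummy token ids: batch of 2 sequences, 8 tokens each.
input_ids = torch.randint(0, config.vocab_size, (2, 8))
with torch.no_grad():
    logits = model(input_ids=input_ids).logits

# Local save/load round trip; push_to_hub uploads the same kind of files.
with tempfile.TemporaryDirectory() as tmp_dir:
    model.save_pretrained(tmp_dir)
    reloaded = BertForSequenceClassification.from_pretrained(tmp_dir)

# Uploading needs a Hub login, so it is left commented out.
# "your-username/your-model-name" is a hypothetical repo id.
# model.push_to_hub("your-username/your-model-name")
```

BertForSequenceClassification inherits from PreTrainedModel, which is where save_pretrained and push_to_hub come from. A plain torch.nn.Module wrapping BertModel and a Linear layer does not get those methods on its own, which is presumably why the switch helped in my case.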