Does a high number of output labels affect the performance of BERT, and how do I handle class imbalance in multiclass text classification?

I am using BERT for multiclass text classification. There are 116 output classes to predict from, and I see a high degree of class imbalance.
As a sample, the record counts for some of the classes look like this:
{'Class A': 975,
 'Class B': 776,
 'Class C': 533,
 'Class D': 412,
 'Class E': 302,
 'Class F': 250,
 'Class G': 207,
 'Class H': 137,
 'Class I': 96,
 'Class J': 51,
 'Class K': 28,
 'Class L': 17,
 'Class M': 7,
 'Class N': 2}
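One mitigation I have been considering for this skew is weighting the loss by inverse class frequency. A minimal sketch using scikit-learn's `compute_class_weight` (the counts below mirror the sample above, not my full 116-class data):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Sample per-class counts from the distribution above (full data has 116 classes)
counts = {'Class A': 975, 'Class B': 776, 'Class C': 533, 'Class D': 412,
          'Class E': 302, 'Class F': 250, 'Class G': 207, 'Class H': 137,
          'Class I': 96, 'Class J': 51, 'Class K': 28, 'Class L': 17,
          'Class M': 7, 'Class N': 2}

# Expand the counts into a flat label array, as compute_class_weight expects
y = np.array([label for label, n in counts.items() for _ in range(n)])
classes = np.array(sorted(counts))  # same order as np.unique(y)

# 'balanced' computes weight = n_samples / (n_classes * count_per_class),
# so rare classes get proportionally larger weights in the loss
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y)
for c, w in zip(classes, weights):
    print(f'{c}: {w:.2f}')
```

These weights could then be passed into the classifier's loss function so that misclassifying a rare class costs more than misclassifying a frequent one.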

So I have two questions:
Question 1: With 116 output classes to predict from, does the sheer number of classes hurt BERT's performance?

Question 2: My full dataset has a class distribution similar to the one illustrated above. How does this imbalance affect BERT's performance, and if it does, how should I handle it to get reliable predictions?
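For context, one thing I want to make sure of is that the rarest classes (e.g. the one with only 2 records) appear in both the train and validation sets. A minimal sketch of a stratified split with made-up toy labels (using scikit-learn's `train_test_split`, not my real data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy labels: 5 classes with a long-tailed count distribution
y = np.repeat(np.arange(5), [100, 50, 20, 8, 2])
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# stratify=y keeps the per-class proportions (roughly) equal in both splits,
# so even the 2-record class lands one example in each half
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
```

Without `stratify`, a random split could easily drop a rare class from one side entirely, which would make its metrics undefined.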

Looking forward to answers from the talented community we have here.

Many thanks in advance.