Reduce output dimensions of BERT

BERT-base was pre-trained with a 768-dimensional hidden size (BERT-large uses 1024), so if you use a pre-trained model, its output embeddings will have that dimensionality. Note that these are hidden states, not logits. However, you can always take those output embeddings and pass them through an extra linear layer that maps the 768 dimensions down to whatever dimension you need; that layer is then trained along with (or on top of) the encoder.
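A minimal PyTorch sketch of this idea, using a stand-in encoder module instead of a real BERT checkpoint so it runs without downloading weights (the class and dimension names are illustrative, not from any library):

```python
import torch
import torch.nn as nn

class EncoderWithProjection(nn.Module):
    """Wraps any encoder that emits 768-dim vectors and projects
    its output down to a smaller custom dimension."""
    def __init__(self, encoder: nn.Module, hidden_dim: int = 768, out_dim: int = 128):
        super().__init__()
        self.encoder = encoder
        # Trainable linear map: 768 -> out_dim
        self.projection = nn.Linear(hidden_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(x)        # (batch, 768)
        return self.projection(hidden)  # (batch, out_dim)

# Stand-in for a pre-trained BERT encoder; in practice you would use
# the pooled output or [CLS] hidden state of a real model here.
dummy_encoder = nn.Linear(32, 768)

model = EncoderWithProjection(dummy_encoder, hidden_dim=768, out_dim=128)
out = model(torch.randn(4, 32))
print(out.shape)  # torch.Size([4, 128])
```

With a real pre-trained model you would typically freeze or fine-tune the encoder and train the projection layer on your downstream task, since its weights start out random.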