Important note: @nielsr’s approach is definitely reasonable, but I would argue that you should use a separate optimizer (or at least a separate learning rate) for the POS embeddings when you finetune. The reason is that the main model (and its embeddings) is already pretrained, whereas the POS embeddings are not, so you’d likely need a larger lr for those new embeddings.
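A minimal sketch of what I mean, assuming a BERT backbone and a freshly initialized POS embedding table (the model name, tag count, and learning rates below are just placeholders). A single optimizer with two parameter groups gives the same effect as two literal optimizers and keeps the training loop simpler:

```python
import torch
from torch.optim import AdamW
from transformers import AutoModel

# Pretrained backbone + a new, randomly initialized POS embedding table.
model = AutoModel.from_pretrained("bert-base-uncased")  # placeholder checkpoint
pos_embeddings = torch.nn.Embedding(num_embeddings=18, embedding_dim=768)

optimizer = AdamW(
    [
        # Pretrained weights: small lr so we don't destroy what's already learned.
        {"params": model.parameters(), "lr": 2e-5},
        # New POS embeddings: larger lr since they start from random init.
        {"params": pos_embeddings.parameters(), "lr": 1e-3},
    ]
)
```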
An alternative approach is to add layers on top of the model that concatenate POS features (e.g. one-hot encoded) with the model’s hidden states and pass them to an RNN, for instance.
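Something along these lines, as a sketch rather than a definitive implementation (the class name, tag/label counts, and RNN size are all assumptions):

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class PosAugmentedTagger(nn.Module):
    """Hypothetical head: concatenates one-hot POS features with the encoder's
    hidden states and feeds them to a BiLSTM before token classification."""

    def __init__(self, num_pos_tags=18, num_labels=9, rnn_hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-uncased")  # placeholder
        hidden = self.encoder.config.hidden_size
        self.num_pos_tags = num_pos_tags
        self.rnn = nn.LSTM(
            input_size=hidden + num_pos_tags,  # hidden states + one-hot POS
            hidden_size=rnn_hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * rnn_hidden, num_labels)

    def forward(self, input_ids, attention_mask, pos_ids):
        # Contextual embeddings from the pretrained encoder.
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # One-hot encode the per-token POS ids and concatenate along features.
        pos_onehot = nn.functional.one_hot(
            pos_ids, num_classes=self.num_pos_tags
        ).float()
        features = torch.cat([hidden_states, pos_onehot], dim=-1)
        # Let the RNN mix contextual and POS information before classifying.
        rnn_out, _ = self.rnn(features)
        return self.classifier(rnn_out)
```

The upside of this variant is that the pretrained encoder is untouched by the new features; the downside is that the POS signal only enters at the top, rather than being mixed in throughout the transformer layers.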