I’ve often thought about use cases where you have word- or sentence-level features that you know should help the system, the kind of features you would typically feed to an SVM or a shallow network. I’d like to know whether those features can still add to the performance of a pretrained language model. So rather than just fine-tuning the language model, what are good ways to integrate custom features into an LM without pretraining from scratch?
My guess is that you can take the output of the LM and add a custom head on top that also takes in these other features, so the LM output basically serves as one more set of features. That doesn’t seem ideal, though, since the final connections might be too shallow. I imagine a better approach is possible that still fine-tunes the LM alongside training the network the custom features feed into (I’ve put a rough sketch of what I mean below). Any thoughts or best “tried and true” methods out there?
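
For concreteness, here is a minimal sketch of the kind of setup I’m imagining, assuming a Hugging Face-style encoder; the model name, feature count, and layer sizes are just placeholders and not a recommendation:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class LMWithCustomFeatures(nn.Module):
    """Concatenate a pooled LM representation with hand-crafted features,
    then classify with a small MLP head. The LM is fine-tuned jointly
    with the head, since everything sits in one computation graph."""

    def __init__(self, lm_name="bert-base-uncased", n_custom_features=16,
                 n_classes=2, hidden_size=256):
        super().__init__()
        self.lm = AutoModel.from_pretrained(lm_name)
        lm_dim = self.lm.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(lm_dim + n_custom_features, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size, n_classes),
        )

    def forward(self, input_ids, attention_mask, custom_features):
        out = self.lm(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool token embeddings over non-padding positions.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        # Concatenate the pooled LM representation with the extra features.
        combined = torch.cat([pooled, custom_features], dim=-1)
        return self.head(combined)
```

In a setup like this I’d probably give the LM a smaller learning rate than the head and standardize the custom features first, but that’s exactly the kind of detail I’m hoping people with experience can weigh in on.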