Hmm, have you looked at spacy-transformers? That might be a good fit for your project… Here’s also a paper I read. They tried fine-tuning BERT on the task of predicting the meaning of the ambiguous word.
That might be a bit too in-depth for what your supervisor wanted, though!
The only problem I have with all this is that, for my WSD-in-IR work, I already have an existing unsupervised process (built around W2V); the goal is just to measure the impact of other word-embedding models (especially BERT, since it’s supposed to produce better results than W2V).
And to clarify: the process (with W2V) has already been validated against its results, so to use BERT I can only adapt it to that process, by embedding context words.
I think it could be useful for WSD. But, as I described, I want to embed words, not sentences. Using it would force me to change my approach, which I can’t do, because the whole idea here is to compare the impact of BERT and W2V on this same approach.
Yes, that’s true and thank you.
But what I’d like to know is how to train BERT on a new dataset, rather than just using the already pre-trained BERT.
And note that my data is annotated neither for NSP nor for MLM.
That’s why I’m asking whether it’s possible and, if so, how to do it.
I think there might be a bit of confusion about what BERT is. BERT was pretrained with MLM and next sentence prediction (NSP). For simplicity’s sake, you can fine-tune using MLM alone; importantly, MLM needs no annotations, since the training targets are created automatically by randomly masking tokens in your raw text. Once you have finished fine-tuning, all you have to do is grab the embeddings from the model before they’re passed into the MLM head. You can do this by specifying output_hidden_states=True when calling the model.
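To make that concrete, here’s a minimal sketch using the Hugging Face transformers library. The sentences, the model checkpoint, and the mlm_probability value are just illustrative; a real run would loop an optimizer over your own corpus (e.g. with the Trainer API) instead of doing a single toy batch:

```python
# Sketch: domain-adapt BERT with masked language modelling (no labels needed;
# the masking itself creates the training targets), then pull per-word
# contextual embeddings from the hidden states.
import torch
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased",
                                             output_hidden_states=True)

# --- Fine-tuning step: the collator randomly masks ~15% of tokens and
# builds the MLM labels from the raw text itself.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
batch = collator([tokenizer("The bank approved the loan.")])
loss = model(**batch).loss
loss.backward()  # in a real loop: optimizer.step() etc. over your corpus

# --- Embedding step: take the last hidden layer, one vector per
# (sub)word token, instead of the MLM head's output.
with torch.no_grad():
    enc = tokenizer("He sat on the river bank.", return_tensors="pt")
    hidden = model(**enc).hidden_states[-1]   # shape (1, seq_len, 768)
    bank_idx = enc.input_ids[0].tolist().index(
        tokenizer.convert_tokens_to_ids("bank"))
    bank_vec = hidden[0, bank_idx]            # contextual vector for "bank"
print(bank_vec.shape)
```

Because the vector for “bank” is taken from its position in the sentence, it is a context-dependent word embedding, which is what lets it slot into a W2V-style per-word pipeline rather than a sentence-embedding one.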