I am fine-tuning a CamemBERT model for a text classification task.
Here is what I am doing:
I load a TFCamembertModel from the pretrained base architecture.
I call a function that builds the model as follows:
The transformer layer in that model essentially outputs the CLS token representation, which is then used for classification.
My question is the following: how do I save the whole CamemBERT model, i.e. not just the layer, which would force me to use the CLS token for the downstream task? I want the per-word context vectors, which I would average into a single vector representing the whole text.
Please tell me whether that approach makes sense, or whether I should use the CLS token instead.
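
To make the two options concrete, here is a minimal sketch of what I mean by averaging the per-word vectors versus taking the CLS vector. It uses plain NumPy on toy data rather than the real model, and the names `mean_pool`, `cls_pool`, `token_vectors` and `attention_mask` are my own placeholders. I am assuming the transformer returns one vector per token with shape (batch, seq_len, hidden), and that padding positions are marked with 0 in the attention mask.

```python
import numpy as np


def mean_pool(token_vectors, attention_mask):
    """Average per-token vectors over real (non-padding) tokens only.

    token_vectors:  array of shape (batch, seq_len, hidden)
    attention_mask: array of shape (batch, seq_len), 1 = real token, 0 = padding
    """
    # Add a trailing axis so the mask broadcasts over the hidden dimension.
    mask = attention_mask[..., np.newaxis].astype(token_vectors.dtype)
    summed = (token_vectors * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def cls_pool(token_vectors):
    """Take the vector of the first token (the CLS-style position)."""
    return token_vectors[:, 0, :]


# Toy stand-in for the transformer output: 2 texts, 3 tokens, hidden size 2.
token_vectors = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]],  # last position is padding
])
attention_mask = np.array([
    [1, 1, 1],
    [1, 1, 0],
])

text_vectors = mean_pool(token_vectors, attention_mask)  # one vector per text
cls_vectors = cls_pool(token_vectors)                    # first-token vectors
```

With the real model, I assume `token_vectors` would be the `last_hidden_state` of the TFCamembertModel output and `attention_mask` would come from the tokenizer. The mask matters because padded positions would otherwise pull the average toward arbitrary values.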