I would like to obtain sentence embeddings for the texts in my dataset. To do this, I am using the Sentence Transformer model "bert-base-nli-mean-tokens". This pre-trained model was trained with mean pooling as its pooling method, but I would like to use it with max pooling instead, without any additional training. I have written the following code, which runs without errors, but I am unsure whether it is a valid approach. Could you help me with this?
from sentence_transformers import SentenceTransformer, models

model_name = "sentence-transformers/bert-base-nli-mean-tokens"
word_embedding_model = models.Transformer(model_name)

# Replace the model's original mean pooling with max pooling
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=False,
    pooling_mode_cls_token=False,
    pooling_mode_max_tokens=True,
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
sentence_embeddings = model.encode(my_dataset)
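For context, here is a minimal sketch of what max pooling over token embeddings does, written in plain PyTorch. The tensor values and shapes are made up for illustration; the idea is that each sentence embedding is the element-wise maximum over the embeddings of its real (non-padding) tokens, with padded positions masked out via the attention mask.

```python
import torch

# Hypothetical token embeddings: batch of 2 sentences, 4 token positions, dim 3
token_embeddings = torch.tensor(
    [
        [[1.0, 2.0, 3.0], [4.0, 0.0, 1.0], [2.0, 5.0, 0.0], [9.0, 9.0, 9.0]],
        [[0.0, 1.0, 2.0], [3.0, 2.0, 1.0], [7.0, 7.0, 7.0], [8.0, 8.0, 8.0]],
    ]
)
# 1 = real token, 0 = padding (the padded positions above must not count)
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])

# Set padded positions to -inf so they never win the max
mask = attention_mask.unsqueeze(-1).bool()
masked = token_embeddings.masked_fill(~mask, float("-inf"))

# Element-wise max over the token dimension -> one vector per sentence
sentence_embedding = masked.max(dim=1).values
print(sentence_embedding)
```

Note how the large values in the padded positions (9.0 and 8.0) do not appear in the result, because the attention mask excludes them before the max is taken.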