Is BERT a document embedding model?

Not directly. Models like BERT output a contextual representation for each token, conditioned on the tokens to its left and right. To get a single vector for a whole document, you need to aggregate these token representations somehow; a common approach is mean pooling, i.e. averaging the token vectors. For this purpose I'd suggest using sentence transformers, which are fine-tuned so that the pooled vector works well as a sentence or document embedding.
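To make the mean-pooling idea concrete, here's a minimal sketch in NumPy. It assumes you already have a `(seq_len, hidden)` array of contextual token vectors and the matching attention mask (the toy numbers below are made up for illustration); padding positions are excluded from the average:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors into one document vector, ignoring padding.

    token_embeddings: (seq_len, hidden) array of contextual token vectors
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding)
    """
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum over real tokens only
    count = mask.sum()                              # number of real tokens
    return summed / count                           # (hidden,) document vector

# Toy example: 4 tokens with hidden size 3; the last row is padding
emb = np.array([[1., 0., 2.],
                [3., 0., 0.],
                [2., 3., 1.],
                [9., 9., 9.]])   # padding row, excluded by the mask
mask = np.array([1, 1, 1, 0])
print(mean_pool(emb, mask))      # -> [2. 1. 1.]
```

In practice you rarely write this yourself: the sentence-transformers library bundles the tokenizer, the encoder, and the pooling step, so something like `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)` returns one vector per input text directly.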