Such models output representations for each token in context of other tokens to the left and to the right of it. You need to aggregate these representations somehow to obtain a single vector representing a document. A common approach is to average vectors of each token, for example. I’d suggest using sentence transformers for this purpose.
1 Like