Combining vectors when using contextual word embeddings with large datasets

I’m interested in using contextual word embeddings generated by a transformer-based model to explore the similarity of certain words in a large dataset.

As my dataset is far larger than the maximum sequence length allowed by most transformer models, presumably I would need to break it down into individual sentences & feed those into the model one at a time. That would give me a list of contextual word embeddings per sentence. What I’m struggling to understand is how I could then best translate this list into a meaningful embedding per word across the whole dataset.
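For concreteness, here is a minimal sketch of what I mean by the per-sentence step, assuming a Hugging Face BERT model (`bert-base-uncased`) — the sentences and the choice of `last_hidden_state` as the embedding layer are my own assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# two toy sentences using "bank" in different senses
sentences = [
    "I deposited the cheque at the bank.",
    "We had a picnic on the river bank.",
]
enc = tokenizer(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = model(**enc).last_hidden_state  # shape: (batch, seq_len, 768)

# collect one contextual vector per occurrence of "bank"
bank_vectors = []
for i in range(len(sentences)):
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][i])
    for pos, tok in enumerate(tokens):
        if tok == "bank":
            bank_vectors.append(hidden[i, pos])

# the two occurrences get different vectors because their contexts differ
similarity = torch.nn.functional.cosine_similarity(
    bank_vectors[0], bank_vectors[1], dim=0
)
```

Repeating this over every sentence in the dataset is what produces the per-occurrence list I’d then need to aggregate somehow.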

The immediately obvious approach would be to average the embeddings for each word. However, the whole point of contextual embeddings is that they capture different uses/meanings of the same word. ‘Bank’ as a financial institution and ‘bank’ as the side of a river may have very different embeddings, so I’m not sure the average would carry much meaning. Is this a genuine concern? Is there a better approach?
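One alternative I’ve considered is clustering the occurrence vectors per word first and averaging within each cluster, so each sense gets its own vector. A toy sketch with synthetic vectors (the 4-d vectors and the choice of k-means with k=2 are purely illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# hypothetical: 10 occurrences of "bank" near a "finance" direction
# and 10 near a "river" direction, in a toy 4-d embedding space
finance = rng.normal(loc=[1.0, 0.0, 0.0, 0.0], scale=0.05, size=(10, 4))
river = rng.normal(loc=[0.0, 1.0, 0.0, 0.0], scale=0.05, size=(10, 4))
occurrences = np.vstack([finance, river])

# naive approach: one averaged vector for the word,
# which lands between the two senses and resembles neither
avg = occurrences.mean(axis=0)

# sense-aware approach: cluster the occurrences, then
# average within each cluster to get one vector per sense
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(occurrences)
sense_vectors = np.array(
    [occurrences[km.labels_ == k].mean(axis=0) for k in range(2)]
)
```

The catch, of course, is that the number of senses per word isn’t known in advance, so k would need to be chosen per word somehow.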

In such a use case, is there any value in using a contextual transformer model over a static one?