Correct interpretation of the model embeddings output

Hello. Using a DeepPavlov transformer, I was surprised to get different embeddings for the same word ‘шагать’. Here is a simplified example showing the essence of the question.

from transformers import AutoTokenizer, AutoModel

MODEL_NAME = 'DeepPavlov/rubert-base-cased-sentence'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

# one hidden-state vector per input token
model(**tokenizer('шагать шагать', return_tensors='pt', truncation=True, max_length=512)).last_hidden_state.detach().squeeze()

As I can see, the tokenizer splits the word ‘шагать’ into two tokens: ‘шага’ and ‘##ть’ (and it also adds the special [CLS] and [SEP] tokens, which is why the output below has six rows rather than four). A minimal sketch to inspect the actual token sequence, using the same tokenizer as above:
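enc = tokenizer('шагать шагать', return_tensors='pt')
print(tokenizer.convert_ids_to_tokens(enc['input_ids'][0]))
# expected: ['[CLS]', 'шага', '##ть', 'шага', '##ть', '[SEP]']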

The output embeddings are:
tensor([[-0.5780,  0.0937, -0.3210,  ..., -0.3401,  0.0203,  0.4830],
        [-0.6516,  0.0278, -0.3610,  ..., -0.4095,  0.0527,  0.5094],
        [-0.6018,  0.1147, -0.2739,  ..., -0.4194,  0.0580,  0.4853],
        [-0.6632,  0.0110, -0.3995,  ..., -0.3953,  0.0823,  0.4497],
        [-0.6711,  0.1017, -0.2829,  ..., -0.3797,  0.0994,  0.4285],
        [-0.6337,  0.0572, -0.3519,  ..., -0.3553,  0.0126,  0.4479]])

I expected vector 1 ([-0.6516, 0.0278, -0.3610, ..., -0.4095, 0.0527, 0.5094], which I guess corresponds to the first ‘шага’) to be equal to vector 3, but the values differ. The same holds for the pair of vectors 2 and 4 (‘##ть’).
I guess this comes from my misunderstanding of how the model works. Please explain what is wrong in my understanding…

hey @Roman, the reason you don’t get the same embedding for the same word in a sequence is that transformers like BERT produce context-sensitive representations, i.e. each token’s vector depends on the context in which it appears in the sequence. even for identical tokens, the model adds position embeddings before the transformer layers, and self-attention then mixes in information from every other token, so the two occurrences of ‘шага’ end up with different vectors.
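as a quick check — a minimal sketch reusing your snippet — you can measure how close the two ‘шага’ vectors actually are; they should be similar, but not identical:

import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = 'DeepPavlov/rubert-base-cased-sentence'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

hidden = model(**tokenizer('шагать шагать', return_tensors='pt')).last_hidden_state.detach().squeeze()
# rows: [CLS], шага, ##ть, шага, ##ть, [SEP]
print(torch.cosine_similarity(hidden[1], hidden[3], dim=0))  # expect a value close to, but below, 1.0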

the advantage of such representations is that you can deal with tricky examples like “Time flies like an arrow; fruit flies like a banana”, where the word “flies” has two very different meanings 🙂
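to make this concrete, here's a minimal sketch — assuming an English checkpoint like bert-base-uncased, in whose vocabulary ‘flies’ is (as far as I know) a single token — comparing the two ‘flies’ vectors in that sentence:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

enc = tokenizer('Time flies like an arrow; fruit flies like a banana', return_tensors='pt')
tokens = tokenizer.convert_ids_to_tokens(enc['input_ids'][0])
hidden = model(**enc).last_hidden_state.detach().squeeze()

# find both occurrences of the token 'flies' and compare their contextual vectors
idx = [i for i, t in enumerate(tokens) if t == 'flies']
print(torch.cosine_similarity(hidden[idx[0]], hidden[idx[1]], dim=0))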

you can find a nice description of contextual embeddings in “The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)” by Jay Alammar: https://jalammar.github.io/illustrated-bert/