Correct interpretation of the model embeddings output

Hello. Using a DeepPavlov transformer, I was surprised to get different embeddings for the same word ‘шагать’. Here is a simplified example showing the essence of the question.

from transformers import AutoTokenizer, AutoModel

MODEL_NAME = 'DeepPavlov/rubert-base-cased-sentence'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

# one hidden-state vector per input token
model(**tokenizer('шагать шагать', return_tensors='pt', truncation=True, max_length=512)).last_hidden_state.detach().squeeze()

As I can see, the tokenizer splits the word ‘шагать’ into two tokens: ‘шага’ and ‘##ть’ (and it also adds the special [CLS] and [SEP] tokens, which is why the output below has six rows rather than four). A minimal sketch to inspect the actual token sequence, using the same tokenizer as above:
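enc = tokenizer('шагать шагать', return_tensors='pt')
print(tokenizer.convert_ids_to_tokens(enc['input_ids'][0]))
# expected: ['[CLS]', 'шага', '##ть', 'шага', '##ть', '[SEP]']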

The output embeddings are:
tensor([[-0.5780,  0.0937, -0.3210,  ..., -0.3401,  0.0203,  0.4830],
        [-0.6516,  0.0278, -0.3610,  ..., -0.4095,  0.0527,  0.5094],
        [-0.6018,  0.1147, -0.2739,  ..., -0.4194,  0.0580,  0.4853],
        [-0.6632,  0.0110, -0.3995,  ..., -0.3953,  0.0823,  0.4497],
        [-0.6711,  0.1017, -0.2829,  ..., -0.3797,  0.0994,  0.4285],
        [-0.6337,  0.0572, -0.3519,  ..., -0.3553,  0.0126,  0.4479]])

I expected vector 1 ([-0.6516, 0.0278, -0.3610, ..., -0.4095, 0.0527, 0.5094], which I guess corresponds to the first ‘шага’) to be equal to vector 3, but the values differ. The same holds for the pair of vectors 2 and 4 (‘##ть’).
I guess this comes from my misunderstanding of how the model works. Please explain what is wrong in my understanding…

hey @Roman, the reason you don’t get the same embedding for the same word in a sequence is that transformers like BERT produce context-sensitive representations, i.e. each token’s vector depends on the context in which it appears in the sequence. even for identical tokens, the model adds position embeddings before the transformer layers, and self-attention then mixes in information from every other token, so the two occurrences of ‘шага’ end up with different vectors.
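as a quick check — a minimal sketch reusing your snippet — you can measure how close the two ‘шага’ vectors actually are; they should be similar, but not identical:

import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = 'DeepPavlov/rubert-base-cased-sentence'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

hidden = model(**tokenizer('шагать шагать', return_tensors='pt')).last_hidden_state.detach().squeeze()
# rows: [CLS], шага, ##ть, шага, ##ть, [SEP]
print(torch.cosine_similarity(hidden[1], hidden[3], dim=0))  # expect a value close to, but below, 1.0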

the advantage of such representations is that you can deal with tricky examples like “Time flies like an arrow; fruit flies like a banana”, where the word “flies” has two very different meanings 🙂
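to make this concrete, here's a minimal sketch — assuming an English checkpoint like bert-base-uncased, in whose vocabulary ‘flies’ is (as far as I know) a single token — comparing the two ‘flies’ vectors in that sentence:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

enc = tokenizer('Time flies like an arrow; fruit flies like a banana', return_tensors='pt')
tokens = tokenizer.convert_ids_to_tokens(enc['input_ids'][0])
hidden = model(**enc).last_hidden_state.detach().squeeze()

# find both occurrences of the token 'flies' and compare their contextual vectors
idx = [i for i, t in enumerate(tokens) if t == 'flies']
print(torch.cosine_similarity(hidden[idx[0]], hidden[idx[1]], dim=0))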

you can find a nice description of contextual embeddings in “The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)” by Jay Alammar: https://jalammar.github.io/illustrated-bert/