Embeddings in yieldBERT

Hello! I have a problem with getting embeddings out of SmilesClassificationModel. This model inherits from BaseModel in simpletransformers. I want to extract embeddings from the last hidden state, but I can't, because the model output has no last_hidden_state attribute.
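
For context, this is roughly the access pattern I expected to work, sketched for a plain transformers model ('model' and 'inputs' here are placeholders, not code from my script):

with torch.no_grad():
    outputs = model(**inputs)                # placeholder model / inputs
last_hidden = outputs.last_hidden_state      # this is what fails for me: no such attribute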

Instead, I tried to get the embeddings the following way:

1. Load model

import torch
from transformers import AutoTokenizer
# SmilesClassificationModel comes from the yield-BERT code base (rxn_yields);
# adjust this import to wherever the class lives in your setup.
from rxn_yields.core import SmilesClassificationModel

trained_yield_bert = SmilesClassificationModel(
    'bert', model_path,
    num_labels=1,
    args={'regression': True,
          'config': {'output_hidden_states': True}},
    use_cuda=False,
)

tokenizer1 = AutoTokenizer.from_pretrained(model_path)
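
As a sanity check (this is my assumption about how the 'config' arg propagates), I verified that the flag reached the underlying Hugging Face config:

# The wrapper exposes the underlying HF config (I already use it below for
# max_position_embeddings); if the arg propagated, this should print True.
print(trained_yield_bert.config.output_hidden_states)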

2. Inputs

Here test_df.head(1).labels.values is just a row holding an ordinary SMILES string (see the side note after the tensors below about how I pass it to the tokenizer).

bert_inputs = tokenizer1.batch_encode_plus(
    str(test_df.head(1).labels.values),
    max_length=trained_yield_bert.config.max_position_embeddings,
    padding='max_length',   # replaces the deprecated pad_to_max_length=True
    truncation=True,
    return_tensors='pt',
)

bert_inputs
{'input_ids': tensor([[12, 11, 13,  ...,  0,  0,  0],
        [12, 11, 13,  ...,  0,  0,  0],
        [12, 24, 13,  ...,  0,  0,  0],
        ...,
        [12, 43, 13,  ...,  0,  0,  0],
        [12, 98, 13,  ...,  0,  0,  0],
        [12, 11, 13,  ...,  0,  0,  0]]), 'token_type_ids': tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]])}
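
Side note: I'm not sure the str(...) wrapper above is right. As far as I know, batch_encode_plus expects a list of strings, and str() on the NumPy array gives its repr, which would be iterated character by character and could explain the unexpected batch of sequences. A sketch of what I think the intended call looks like (smiles_list is a hypothetical variable):

# Pass an explicit Python list of SMILES strings instead of str(array)
smiles_list = test_df.head(1).labels.values.tolist()
bert_inputs = tokenizer1.batch_encode_plus(
    smiles_list,
    max_length=trained_yield_bert.config.max_position_embeddings,
    padding='max_length',
    truncation=True,
    return_tensors='pt',
)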

3. Outputs

with torch.no_grad():
    output = trained_yield_bert.model(**bert_inputs)

# first element of the output tuple, flattened to a plain Python list
embeddings = output[0].squeeze().cpu().numpy().tolist()
embeddings
[0.672431230545044,
 0.672431230545044,
 0.8746748566627502,
 0.6140751242637634,
 0.5577840805053711,
 0.522050142288208,
 0.6576945781707764,
 0.6140751242637634,
 0.5635161995887756,
 0.5149366855621338,
 0.5635161995887756,
 0.672431230545044]
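
For what it's worth, the shape of this first element seems to match the batch size, which makes me suspect these are the per-sequence regression predictions rather than embeddings (num_labels=1):

print(output[0].shape)   # I expect torch.Size([12, 1]): one value per input sequence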

4. Questions

The output is a tuple of length 2. The first element is shown above (12 values, one per sequence in the batch); the second element has shape 13 x 12 x 512 x 256.
I want to understand which of these values are the embeddings.
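
My current guess (unverified, please correct me): the second element is the hidden_states tuple that output_hidden_states=True turns on, i.e. 13 tensors (embedding layer plus 12 transformer layers), each of shape (batch_size=12, seq_len=512, hidden_size=256). If that's right, the last hidden state and per-sequence embeddings would come out like this:

hidden_states = output[1]                  # tuple of 13 tensors, each (12, 512, 256)
last_hidden_state = hidden_states[-1]      # activations after the final layer

# Two common pooling choices for per-sequence embeddings:
cls_embeddings = last_hidden_state[:, 0, :]                   # [CLS] token, shape (12, 256)

mask = bert_inputs['attention_mask'].unsqueeze(-1).float()    # (12, 512, 1)
mean_embeddings = (last_hidden_state * mask).sum(1) / mask.sum(1)  # masked mean, (12, 256)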