Using vectors instead of input_ids in BERT

Hi there!

I’ve been struggling with this question for a while now; any help would be appreciated!

My problem is this: instead of passing token ids in a list (input_ids) to, say, the BERT base model, is there a way I can directly give the model one-hot vectors in a tensor? And if so, is there a problem if the vectors are not exactly one-hot but, more generally, probability distributions over the vocabulary?
Also, I’d rather not use inputs_embeds, since that would require me to implement the positional encoding myself and to map everything to the model’s hidden size. I just want to use the basic BERT model with vectors instead of raw ids. Is there a way?

As an illustration, I would want to be able to input [[0, 0, 0.1, 0, 0, …, 0.9, 0, 0]], where the length of the vector equals the vocabulary size. I don’t want to keep just one of the non-zero indices, as would be the case with a one-hot vector.

What would that “mean”? How do you want the embedded value to be calculated from this? Do you want to take the weighted average over the vocabulary embeddings and use those vectors as input to the encoder?

Yes, instead of one word I want to be able to represent a weighted average of two words as a single “hybrid word”, if you will. As I said, I’m looking for a way that doesn’t change anything else that happens at the lowest level of the model’s embedding (the positional encoding, etc.).
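For instance, here is a minimal sketch of the mixing I have in mind, using PyTorch and transformers (the token ids and weights below are made up just for illustration):

```python
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
E = model.get_input_embeddings().weight  # (vocab_size, hidden_size)

# A "hybrid word": 10% of one token and 90% of another (arbitrary example ids)
probs = torch.zeros(E.size(0))
probs[2103] = 0.1
probs[7592] = 0.9

# Weighted average of the two word vectors, shape (hidden_size,)
hybrid_embedding = probs @ E
```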

This is not possible out of the box, but you can subclass the model you are interested in and override the forward method to do the calculations you need.
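For example, a rough sketch of such a subclass could look like the following. The class name and the input_probs argument are just placeholders, not part of the library; also note that, at least in recent versions of transformers, BertEmbeddings adds the position and token-type embeddings to inputs_embeds for you, so you only have to compute the soft word embeddings:

```python
import torch
from transformers import BertModel

class SoftInputBert(BertModel):
    """BertModel variant that accepts a probability distribution over the
    vocabulary at each position instead of integer input_ids."""

    def forward(self, input_probs=None, **kwargs):
        # input_probs: (batch, seq_len, vocab_size), each row a distribution
        embedding_matrix = self.get_input_embeddings().weight   # (vocab, hidden)
        soft_embeds = input_probs @ embedding_matrix             # (batch, seq_len, hidden)
        # Position and token-type embeddings are still added inside
        # BertEmbeddings when inputs_embeds is used, so nothing else changes.
        return super().forward(inputs_embeds=soft_embeds, **kwargs)

model = SoftInputBert.from_pretrained("bert-base-uncased")
probs = torch.zeros(1, 4, model.config.vocab_size)
probs[..., 101] = 1.0          # dummy one-hot distributions; mix ids as needed
outputs = model(input_probs=probs)
```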

Yeah, I guess I’ll have to do that. Thanks a lot for putting in the time, I really appreciate it :slight_smile: