I’ve been struggling with this question for a while now, any help would be appreciated!
My problem is this: instead of passing token ids in a list (input_ids) to, say, a BERT base model, is there a way I can directly give the model one-hot vectors in a tensor? And if so, is there a problem if the vectors are not exactly one-hot but, more generally, probability distributions over the vocabulary?
Also, I’d rather not use inputs_embeds, since that would mean mapping the vectors to the model’s hidden size myself (and, I think, handling the positional encoding as well). I just want to use the basic BERT model with vectors instead of raw ids. Is there a way?
As an illustration, I would want to be able to input [[0, 0, 0.1, 0, 0, …, 0.9, 0, 0]], where the length of the vector equals the vocabulary size. I don’t want to collapse it to a single non-zero index, as happens with one-hot vectors.
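To make the shape I have in mind concrete, here is a tiny sketch of the tensor I’d like to feed the model (the non-zero indices are arbitrary, just for illustration; 30522 is the BERT base uncased vocabulary size):

```python
import torch

vocab_size = 30522  # BERT base (uncased) vocabulary size
batch_size, seq_len = 1, 1

# One "soft token": a probability distribution over the vocabulary,
# in place of a single integer id. Indices 2 and 7 are arbitrary.
soft_ids = torch.zeros(batch_size, seq_len, vocab_size)
soft_ids[0, 0, 2] = 0.1
soft_ids[0, 0, 7] = 0.9

# Each position sums to 1, i.e. it is a valid probability distribution,
# with a one-hot vector being the special case of a single 1.0 entry.
assert torch.allclose(soft_ids.sum(dim=-1), torch.ones(batch_size, seq_len))
```

So instead of input_ids of shape (batch_size, seq_len), I’d be passing a tensor of shape (batch_size, seq_len, vocab_size).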