Is it reasonable to pretrain by masking certain dimensions of each vector, rather than individual tokens?

Let’s say I want to adapt Transformers to a non-NLP task, like financial data or a multiplayer online video game. You can imagine that the high-dimensional vector for each input will contain information that pertains to different events. For example, the first 10 dimensions might describe player 1, and the next 10 dimensions might describe player 2.

If I were to extend the pre-training exercise to these non-NLP tasks, I think it could be reasonable to mask the actions of certain players and train the model to predict those actions back. This would essentially mean masking certain dimensions of a vector rather than masking the entire “input”.
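To make the idea concrete, here’s a minimal sketch of what I mean (the layout, shapes, and mask-flag trick are all hypothetical, just for illustration): zero out one player’s block of dimensions and keep the original values as the reconstruction target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each timestep is a 20-dim vector,
# dims 0-9 describe player 1, dims 10-19 describe player 2.
seq_len, dim = 8, 20
player1_dims = slice(0, 10)

x = rng.normal(size=(seq_len, dim))

# Mask player 1's dimensions: zero them out and append a mask flag
# so the model can tell "masked" apart from a genuine zero value.
masked = x.copy()
masked[:, player1_dims] = 0.0
mask_flag = np.ones((seq_len, 1))  # player 1 is masked at every timestep here
model_input = np.concatenate([masked, mask_flag], axis=1)

# The reconstruction loss would only cover the masked dimensions,
# e.g. mean squared error against the original player-1 block.
target = x[:, player1_dims]
print(model_input.shape)  # (8, 21)
print(target.shape)       # (8, 10)
```

In practice you’d presumably sample which player to mask per example, analogous to sampling which tokens to mask in masked language modeling.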

My question is: is this reasonable to do and is this even the right approach?

I don’t know what kind of input embeddings you’d be working with in that case, but the problem you’ll probably run into is that latent embeddings are usually not as nicely disentangled as you’ve described here. We sometimes talk about them that way for illustrative purposes, but in reality the information describing “player 1” is probably distributed across the entire vector rather than confined to some subset of vector positions.

Hi Joeddav,

Naive question here, but rather than using learned embeddings as in the case of words, if I directly construct the input vector so that player1’s actions are described by dimensions 1-5, player2’s actions by dimensions 6-10, and so on, does that mean that each player’s information is disentangled by design?
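As a sketch of what I mean by “disentangled by design” (the specific features are made up), each player’s features would be written into a fixed slice of the vector:

```python
import numpy as np

# Hypothetical per-player features: 5 numbers each,
# e.g. x position, y position, health, action id, score.
def encode_state(player1_feats, player2_feats):
    vec = np.zeros(10)
    vec[0:5] = player1_feats    # dims 1-5: player 1
    vec[5:10] = player2_feats   # dims 6-10: player 2
    return vec

v = encode_state([1.0, 2.0, 0.9, 3.0, 10.0],
                 [4.0, 5.0, 0.5, 1.0, 7.0])

# With this layout, "mask player 1" is just zeroing vec[0:5];
# the positions themselves carry the meaning.
print(v)
```

By construction the slices don’t overlap in the input, even if the model’s internal representations might still entangle them.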

If so, would my question become a reasonable one, or is there another way to encode multi-player information in a Transformer model?

Thanks so much

Sure, that sounds like a reasonable thing to try. Let us know how it goes – I’m sure we’d all learn something :slight_smile: