There are a lot of good articles and forum posts about custom tokenizers that let you train a different vocabulary/language. I cannot find information, though, on customizing the position part as well. Is there documentation on this? If not, can someone point me in the right direction? My goal is to train a transformer on a dataset where the 2-D position of the glyph in the document is just as important as the glyph itself.
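To make the goal concrete, here is roughly the kind of input embedding I have in mind (just a sketch; the names and the quantized x/y coordinates are made up by me):

```python
import torch.nn as nn

class GlyphWith2DPosition(nn.Module):
    """Token embedding plus embeddings of the glyph's (x, y) position
    on the page, so layout matters as much as the glyph itself."""

    def __init__(self, vocab_size, max_x, max_y, hidden_size):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, hidden_size)
        self.x_embed = nn.Embedding(max_x, hidden_size)  # quantized x coordinate
        self.y_embed = nn.Embedding(max_y, hidden_size)  # quantized y coordinate

    def forward(self, token_ids, x_ids, y_ids):
        return self.token_embed(token_ids) + self.x_embed(x_ids) + self.y_embed(y_ids)
```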
Thanks!
Note: A similar question was asked before by @bengul, but there has been no response since July 2021.
Hi,
I have figured out a way to edit the position embedding. In the Hugging Face BERT model, the position embedding is defined in the BertEmbeddings class; other models have a similar Embeddings class. If your task is aligned with what they already provide, you might be able to get away with changing only that Embeddings class. Look Here. Alternatively, if you want to supply your own position information along with the input, then you have to change every class that uses the position embedding. I am not an expert by any means, but feel free to ask if you need more help with this.
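To illustrate the difference between the two cases (rough sketch, not tested on your task), in the current transformers implementation the learned position table is just an nn.Embedding held by BertEmbeddings:

```python
import torch.nn as nn
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# The learned absolute position table lives inside BertEmbeddings as a
# plain nn.Embedding (shape: max_position_embeddings x hidden_size).
old = model.embeddings.position_embeddings
print(type(model.embeddings).__name__, tuple(old.weight.shape))  # BertEmbeddings (512, 768)

# Case 1: keep the same interface (position ids in, vectors out) and only
# swap the module, e.g. for a freshly initialized table of the same size.
model.embeddings.position_embeddings = nn.Embedding(old.num_embeddings, old.embedding_dim)

# Case 2: if you want to pass your own 2-D positions with the input, the extra
# ids have to be threaded through BertModel.forward and BertEmbeddings.forward,
# i.e. you end up subclassing or patching every class on that path.
```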
Thanks, @bengul, that looks like a very promising route. Will give it a try tomorrow!
Hey!
I’m trying to train a BART model with customized positional embeddings, similar to what you have been doing, and I have a few questions that you might be able to help me with. First, say that I want to change BART’s positional embeddings to sinusoidal embeddings, just like you did @bengul: is that even possible? My intuition is that so many parts of the Transformer would have to be re-learned that it might not be worth doing, or am I wrong here? Second, assuming it actually works, i.e. that the positional embeddings can be changed, what kind of computing resources are needed for the model to adapt to the new way it treats positions?
My understanding is that you have to pretrain the model from scratch. That is definitely time- and resource-consuming (depending on the amount of data and the model’s complexity). I don’t think using the new positional embedding during just fine-tuning would be useful, but I have not tested that.
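For what it’s worth, the swap itself is mechanically simple; the expensive part is the training that has to follow. A rough sketch for BART (assuming the learned tables sit at model.model.encoder.embed_positions and model.model.decoder.embed_positions, as in the current transformers implementation, and ignoring BART’s small position-id offset):

```python
import math
import torch
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def make_sinusoidal_(embedding: torch.nn.Embedding) -> None:
    """Overwrite a learned positional embedding table with fixed sinusoidal values."""
    n, dim = embedding.weight.shape
    position = torch.arange(n, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, dim, 2, dtype=torch.float) * (-math.log(10000.0) / dim))
    with torch.no_grad():
        embedding.weight[:, 0::2] = torch.sin(position * div_term)
        embedding.weight[:, 1::2] = torch.cos(position * div_term)
    embedding.weight.requires_grad = False  # keep the table fixed during training

for emb in (model.model.encoder.embed_positions, model.model.decoder.embed_positions):
    make_sinusoidal_(emb)
```

After a change like this the rest of the network still has to adapt to the new position signal, which is why I would expect pretraining from scratch rather than a short fine-tune to be needed.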
My idea has been to fine-tune a pre-trained BART model with a length penalty in the positional embeddings of the model. However, my results so far more or less confirm the hypothesis that it is not possible to only fine-tune with new embeddings. I guess I would have to pre-train from scratch, but I want to avoid that because of the resources such a task requires.