I am using `BartModel`, but my project needs two things it does not support: a 3-dimensional attention mask (I already tried passing a mask of shape `[batch, seq_length, seq_length]` and it raises errors) and a `position_ids` argument, which `BartModel`'s `forward` does not accept. To make these work, I modified the `_expand_mask()` function, the `BartLearnedPositionalEmbedding` class, and the `BartModel` class. Since the `BartEncoder` and `BartDecoder` classes use the `_expand_mask` function and the `BartLearnedPositionalEmbedding` class, I overrode them to replace the `self.embed_positions` instance with the class I fixed, as below:
```python
class My_BartEncoder(BartEncoder):
    def __init__(self, config: BartConfig, embed_tokens: Optional[nn.Embedding] = None):
        super().__init__(config, embed_tokens)
        embed_dim = config.d_model
        self.embed_positions = BartLearnedPositionalEmbedding(
            config.max_position_embeddings,
            embed_dim,
        )
```
I also added a `position_ids` argument to the `forward` function, because my new `BartLearnedPositionalEmbedding` needs it (I did not change the class name, I just modified the code). I did the same with the `BartDecoder` class. Finally, I replaced `self.encoder` and `self.decoder` in the `BartModel` class with `My_BartEncoder` and `My_BartDecoder`, as above.
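Concretely, my modified positional embedding looks roughly like this. It is a simplified sketch: I renamed the class here only for clarity, and the `input_shape`/`position_ids` signature is my own choice, not the library's:

```python
import torch
import torch.nn as nn


class MyLearnedPositionalEmbedding(nn.Embedding):
    """Sketch of my modified positional embedding: same idea as
    BartLearnedPositionalEmbedding, but with an explicit position_ids path."""

    def __init__(self, num_embeddings: int, embedding_dim: int):
        # BART reserves 2 extra positions, so keep the same offset trick
        self.offset = 2
        super().__init__(num_embeddings + self.offset, embedding_dim)

    def forward(self, input_shape, position_ids=None, past_key_values_length=0):
        if position_ids is None:
            # fall back to the default sequential positions
            seq_len = input_shape[1]
            position_ids = torch.arange(
                past_key_values_length,
                past_key_values_length + seq_len,
                dtype=torch.long,
                device=self.weight.device,
            )
        return super().forward(position_ids + self.offset)
```

When `position_ids` is given, the embedding is looked up at exactly those positions; otherwise it behaves like the stock sequential version.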
I want to ask whether I am doing this right. I read that `self.post_init()` is executed at the end of `__init__`; however, I inherited `BartModel` and then replaced the two instances like this:
```python
class My_BartModel(BartModel):
    def __init__(self, config: BartConfig):
        super().__init__(config)
        self.encoder = My_BartEncoder(config, self.shared)
        self.decoder = My_BartDecoder(config, self.shared)
```
Is that legal? And second, since I use class names other than `BartEncoder` and `BartDecoder`, will initializing the model with `from_pretrained` raise errors, or fail to match the pretrained weights to my classes? I am a newbie and would be grateful for your help.