Trying to process longer documents with BERT-based models

I’m trying to get BERT models to work with longer documents, and I found this relatively simple hack:

            # Copy the 512 pretrained position embeddings, then fill the
            # remaining positions by repeating the last pretrained vector.
            my_pos_embeddings = nn.Embedding(args.max_pos, self.bert.model.config.hidden_size)
            my_pos_embeddings.weight.data[:512] = self.bert.model.embeddings.position_embeddings.weight.data
            my_pos_embeddings.weight.data[512:] = self.bert.model.embeddings.position_embeddings.weight.data[-1][None,:].repeat(args.max_pos-512,1)
            self.bert.model.embeddings.position_embeddings = my_pos_embeddings

When I tried this with Transformers 1.2.0 it worked out of the box, but I wanted to use the newer models, so I updated my package to 4.3.2.

I’ve localised the difference between the two versions, which lies in how BertEmbeddings computes the embeddings.

In 1.2.0, the embeddings were calculated as follows:

words_embeddings = self.word_embeddings(input_ids)
position_embeddings = self.position_embeddings(position_ids)
token_type_embeddings = self.token_type_embeddings(token_type_ids)

embeddings = words_embeddings + position_embeddings + token_type_embeddings

While in 4.3.2 (inside BertEmbeddings.forward), it has changed to:

    if position_ids is None:
        position_ids = self.position_ids[:, past_key_values_length : seq_length + past_key_values_length]

    if token_type_ids is None:
        token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=self.position_ids.device)

    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    token_type_embeddings = self.token_type_embeddings(token_type_ids)

    embeddings = inputs_embeds + token_type_embeddings
    if self.position_embedding_type == "absolute":
        position_embeddings = self.position_embeddings(position_ids)
        embeddings += position_embeddings
    embeddings = self.LayerNorm(embeddings)
    embeddings = self.dropout(embeddings)
    return embeddings

Is there any way to change my code to make it work with transformers 4.3.2?
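For reference, here is my current attempt at porting the hack. It is an untested sketch: a small randomly initialised BertConfig stands in for the real pretrained checkpoint, and max_pos = 1024 is just an example target length. The extra step compared to 1.2.0 seems to be extending the position_ids buffer that the new forward() slices from.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

max_pos = 1024  # example target length; any value > 512 works

# A small randomly initialised model stands in for the pretrained
# checkpoint here; with a real model use BertModel.from_pretrained(...).
config = BertConfig(hidden_size=64, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=128)
model = BertModel(config)

# Step 1: the original hack -- copy the 512 pretrained vectors and
# repeat the last one for every position beyond 512.
old = model.embeddings.position_embeddings.weight.data
my_pos_embeddings = nn.Embedding(max_pos, config.hidden_size)
my_pos_embeddings.weight.data[:old.size(0)] = old
my_pos_embeddings.weight.data[old.size(0):] = old[-1][None, :].repeat(max_pos - old.size(0), 1)
model.embeddings.position_embeddings = my_pos_embeddings

# Step 2 (new in 4.x): position_ids is a buffer created in __init__ with
# length config.max_position_embeddings, and forward() slices it
# (self.position_ids[:, :seq_length]), so it must be extended as well.
model.embeddings.register_buffer(
    "position_ids", torch.arange(max_pos).expand((1, -1))
)
model.config.max_position_embeddings = max_pos

# Some 4.x versions also keep a token_type_ids buffer of the same
# length; extend it too when present (4.3.2 builds zeros on the fly).
if hasattr(model.embeddings, "token_type_ids"):
    model.embeddings.register_buffer(
        "token_type_ids", torch.zeros((1, max_pos), dtype=torch.long)
    )

# A 700-token input now runs past the old 512-position limit.
input_ids = torch.randint(0, config.vocab_size, (1, 700))
out = model(input_ids=input_ids)
print(out.last_hidden_state.shape)  # torch.Size([1, 700, 64])
```

This runs for me shape-wise, but I don’t know whether replacing the buffers like this is the intended way in 4.3.2, or whether something else downstream still assumes 512 positions.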