Padded sequences with a language model (like BERT) and an LSTM on top

I have seen several papers that use a language model like BERT in combination with an LSTM. I am wondering whether, and how, padded sequences are handled in that approach.

As a small example (omitting the full torch module and training loop):

import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
lstm = torch.nn.LSTM(768, 256, batch_first=True)  # BERT outputs are (batch, seq_len, hidden)

# pad/truncate to a fixed length of 10 tokens
inputs = tokenizer(['this is a text'], max_length=10, padding='max_length', truncation=True, return_tensors='pt')
outputs = model(**inputs)
lstm_out, (h_n, c_n) = lstm(outputs.last_hidden_state)

Will the embeddings of padding tokens then “distort” the output of the LSTM layer? If so, how can I avoid that?
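
One idea I had (just a sketch, I am not sure it is the right approach) is to compute the real sequence lengths from the attention mask and use torch.nn.utils.rnn.pack_padded_sequence so the LSTM only runs over the non-padding positions:

from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# number of real (non-padding) tokens per sequence, taken from the attention mask
lengths = inputs['attention_mask'].sum(dim=1).cpu()

packed = pack_padded_sequence(
    outputs.last_hidden_state, lengths, batch_first=True, enforce_sorted=False
)
packed_out, (h_n, c_n) = lstm(packed)  # the LSTM never steps over padding positions
lstm_out, _ = pad_packed_sequence(packed_out, batch_first=True)  # back to (batch, seq_len, 256)

Would that be the correct way to handle it, or is there a better alternative?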