Padded sequences in a language model (like BERT) with an LSTM on top

elya5 · September 9, 2022, 10:20am

I have seen several papers that use a language model like BERT in combination with an LSTM. I am wondering whether, and how, padded sequences are handled in that approach.

As a small example (without full torch module and training loop):

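import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
lstm = torch.nn.LSTM(768, 256, batch_first=True)  # BERT output is (batch, seq, hidden)

inputs = tokenizer(['this is a text'], max_length=10, padding='max_length', return_tensors='pt')
outputs = model(**inputs)
lstm(outputs.last_hidden_state)  # padded positions are fed to the LSTM as well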

Will the embeddings of padding tokens then “distort” the output of the LSTM layer? If so, how can I avoid that?
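
For reference, one approach I am aware of is to pack the sequences with PyTorch's pack_padded_sequence, so that the LSTM stops at the last real token of each sequence. The following is only a minimal sketch, not something I have confirmed against the papers: it assumes right-side padding, uses a random tensor as a stand-in for outputs.last_hidden_state and a hand-written attention mask as a stand-in for inputs['attention_mask'] (so it runs without downloading BERT), and the variable names are my own.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Stand-in (assumption) for outputs.last_hidden_state: (batch=2, seq_len=10, hidden=768).
hidden_states = torch.randn(2, 10, 768)
# Stand-in (assumption) for inputs['attention_mask']: 1 = real token, 0 = padding.
attention_mask = torch.tensor([
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
])

lstm = torch.nn.LSTM(768, 256, batch_first=True)

# Number of real tokens per sequence; pack_padded_sequence expects lengths on the CPU.
lengths = attention_mask.sum(dim=1).cpu()

# Packing assumes the padding sits at the end of each sequence (right padding).
packed = pack_padded_sequence(
    hidden_states, lengths, batch_first=True, enforce_sorted=False
)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack to a padded tensor again; positions past each length are filled with zeros.
lstm_out, out_lengths = pad_packed_sequence(
    packed_out, batch_first=True, total_length=hidden_states.size(1)
)

# h_n[-1] holds the hidden state at the last real token of each sequence,
# so it is not affected by the padded positions.
sentence_repr = h_n[-1]
print(lstm_out.shape, sentence_repr.shape)
```

With a real model, lengths would come from inputs['attention_mask'].sum(dim=1). For a bidirectional LSTM the final states of both directions would have to be combined, which this sketch does not cover.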
