Padding in Decision Transformers Inference

Hi!

I was reading the blog post on Decision Transformers and came across an aspect of the implementation that I’m trying to understand better.

In the provided implementation, there’s a padding operation that pads any sequence shorter than 20 timesteps up to a length of 20, even though the model only ever processes a batch with a single sample (which is expected in a reinforcement learning rollout and is evidenced by the hardcoded “1” in the batch dimension). This got me wondering about its purpose, since transformers can handle sequences of varying lengths, as long as all samples within a batch share the same length.
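
For reference, the kind of operation I mean looks roughly like the sketch below (this is my own simplified paraphrase, not the exact code from the post; the function name, `context_length`, and `state_dim` values are made up for illustration):

```python
import torch

context_length = 20   # fixed context window assumed here for illustration
state_dim = 11        # hypothetical observation size

def pad_left(states):
    """Left-pad a (1, T, state_dim) tensor with zeros up to context_length
    and return an attention mask marking which positions are real timesteps."""
    seq_len = states.shape[1]
    pad_len = context_length - seq_len
    padding = torch.zeros((1, pad_len, state_dim), dtype=states.dtype)
    padded_states = torch.cat([padding, states], dim=1)
    # 0 for padded positions, 1 for real timesteps
    attention_mask = torch.cat(
        [torch.zeros(1, pad_len), torch.ones(1, seq_len)], dim=1
    ).to(dtype=torch.long)
    return padded_states, attention_mask

# Example: only 5 real timesteps collected so far in the episode
states = torch.randn(1, 5, state_dim)
padded_states, mask = pad_left(states)
print(padded_states.shape)  # torch.Size([1, 20, 11])
print(mask)                 # 15 zeros followed by 5 ones
```

Since the batch size is always 1 at inference time, I would have expected the model to simply consume the 5-step sequence directly, which is why the padding to a fixed length of 20 surprised me.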

Could someone clarify the purpose of this padding operation in this context?

Thank you!