Padding in Decision Transformers Inference

Hi!

I was reading the blog post on Decision Transformers and came across an aspect of the implementation that I’m trying to understand better.

In the provided implementation, there’s a padding operation that pads any sequence shorter than 20 timesteps up to a length of 20, even though the model only ever processes a batch with a single sample (which is expected in a reinforcement learning rollout and is evidenced by the hardcoded “1” in the batch dimension). This got me wondering about its purpose, since transformers can handle sequences of varying lengths, as long as all samples within a batch share the same length.
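
For reference, the kind of operation I mean looks roughly like the sketch below (this is my own simplified paraphrase, not the exact code from the post; the function name, `context_length`, and `state_dim` values are made up for illustration):

```python
import torch

context_length = 20   # fixed context window assumed here for illustration
state_dim = 11        # hypothetical observation size

def pad_left(states):
    """Left-pad a (1, T, state_dim) tensor with zeros up to context_length
    and return an attention mask marking which positions are real timesteps."""
    seq_len = states.shape[1]
    pad_len = context_length - seq_len
    padding = torch.zeros((1, pad_len, state_dim), dtype=states.dtype)
    padded_states = torch.cat([padding, states], dim=1)
    # 0 for padded positions, 1 for real timesteps
    attention_mask = torch.cat(
        [torch.zeros(1, pad_len), torch.ones(1, seq_len)], dim=1
    ).to(dtype=torch.long)
    return padded_states, attention_mask

# Example: only 5 real timesteps collected so far in the episode
states = torch.randn(1, 5, state_dim)
padded_states, mask = pad_left(states)
print(padded_states.shape)  # torch.Size([1, 20, 11])
print(mask)                 # 15 zeros followed by 5 ones
```

Since the batch size is always 1 at inference time, I would have expected the model to simply consume the 5-step sequence directly, which is why the padding to a fixed length of 20 surprised me.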

Could someone clarify the purpose of this padding operation in this context?

Thank you!