Hello, I’m relatively new to working with transformers, and I’ve been exploring the implementation of a Time Series Transformer. However, I’m struggling to grasp how time features align with lagged values in the training phase.
Examining the code for `create_network_inputs()` in `TimeSeriesTransformerModel` alongside the `generate()` function in `TimeSeriesTransformerForPrediction`, I noticed that `1` is a mandatory value in `config.lags_sequence`; otherwise inference would not work, because there would be missing values between the observed context and the forecast. It is also worth mentioning that no value in `config.lags_sequence` should fall below one (zero or negative lags are invalid). The first function calls `get_lagged_subsequences()` with `shift=0`, while the second uses `shift=1`.
This makes sense: during inference, the last observed value must serve as the initial token to start the incremental, step-by-step greedy forecast. Using `shift=0` there would require future values that are not yet available, which is the practical constraint that `shift=1` addresses.
```python
def get_lagged_subsequences(
    self, sequence: torch.Tensor, subsequences_length: int, shift: int = 0
) -> torch.Tensor:
    ...
    sequence_length = sequence.shape[1]
    indices = [lag - shift for lag in self.config.lags_sequence]
    ...
    lagged_values = []
    for lag_index in indices:
        begin_index = -lag_index - subsequences_length
        end_index = -lag_index if lag_index > 0 else None
        lagged_values.append(sequence[:, begin_index:end_index, ...])
    return torch.stack(lagged_values, dim=-1)
```
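To make the effect of `shift` concrete, here is a small self-contained sketch of the same slicing arithmetic (my own, using plain Python lists instead of tensors, with made-up numbers):

```python
# Standalone sketch of the slicing arithmetic in get_lagged_subsequences
# (plain Python lists instead of torch tensors; numbers are made up).
def lagged_subsequences(sequence, lags, subsequences_length, shift=0):
    indices = [lag - shift for lag in lags]
    out = []
    for lag_index in indices:
        begin = -lag_index - subsequences_length
        end = -lag_index if lag_index > 0 else None
        out.append(sequence[begin:end])
    return out

seq = list(range(10))  # each value equals its time index

# Training path (shift=0): lag 1 selects the window one step behind the end.
print(lagged_subsequences(seq, lags=[1], subsequences_length=3, shift=0))  # [[6, 7, 8]]

# Inference path (shift=1): lag 1 becomes 0, selecting the window that ends
# with the last observed value, which seeds the greedy forecast.
print(lagged_subsequences(seq, lags=[1], subsequences_length=3, shift=1))  # [[7, 8, 9]]
```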
During the training phase in `create_network_inputs()`, it appears that for `lag_index = 1` the values lagged by one step, `sequence[:, -self.config.context_length - max(self.config.lags_sequence) - 1 : -1, ...]`, are aligned with the time features of the last observed context, `past_time_features[:, -self.config.context_length - max(self.config.lags_sequence) :, ...]`. This seems to be misaligned.

Wouldn't it be more appropriate for the values to be aligned with the time features in sync? In cases where time steps are not equally spaced, the current alignment could omit relevant information from the features of the last observed value. Adjusting this alignment could improve the model's ability to capture temporal patterns.
```python
def create_network_inputs(...):
    # time feature
    time_feat = (
        torch.cat(
            (
                past_time_features[:, self._past_length - self.config.context_length :, ...],
                future_time_features,
            ),
            dim=1,
        )
        if future_values is not None
        else past_time_features[:, self._past_length - self.config.context_length :, ...]
    )
    ...
    # lagged features
    subsequences_length = (
        self.config.context_length + self.config.prediction_length
        if future_values is not None
        else self.config.context_length
    )
    lagged_sequence = self.get_lagged_subsequences(
        sequence=inputs, subsequences_length=subsequences_length
    )  # shift=0, default
    ...
    return transformer_inputs, loc, scale, static_feat
```
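To illustrate the pairing I am asking about, here is a toy example (my own simplification, not the library code) showing that with `shift=0` and a lag of 1, each time feature at step t ends up paired with the value observed at step t-1:

```python
# Toy illustration (my own simplification, not the library code) of the
# alignment in question: with shift=0 and lag 1, the input at decoder
# position t is the value from step t-1, paired with the time feature of t.
context_length = 3
lag = 1  # shift = 0 during training, so the effective lag index stays 1

values = list(range(10))                   # value observed at each step
time_feat = [f"t{i}" for i in range(10)]   # time feature for each step

lagged = values[-lag - context_length : -lag]  # values from steps t-1
feats = time_feat[-context_length:]            # features for steps t

pairs = list(zip(lagged, feats))
print(pairs)  # [(6, 't7'), (7, 't8'), (8, 't9')]
```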
This question also applies to the Informer and Autoformer, since they share the same code as the Time Series Transformer.

Thanks to anyone who can help.

David