Time Series Prediction: Inference Process

Hello everyone,
I'm fairly new to the machine learning world, but I'm trying to use the time series transformer by following the blog post Probabilistic Time Series Forecasting with :hugs: Transformers (huggingface.co) on data from Yahoo Finance.
Since my results look a bit suspicious, I'm analysing the code provided in the blog more closely, and I have some general questions:

  • What does the generate method return as results? Why do we take the median of the predictions? Do all the returned predictions have the same probability?
  • Is the inference step on the test_dataset performed only once, for prediction_length steps starting at the end of the train_dataset? I'm a bit lost on this part. For my problem, I would like to perform it on a rolling-window basis (re-using the last observations in the test dataset).
  • I can see that past_length = config.context_length + max(config.lags_sequence), but from the comment I would have written past_length = max(config.context_length, max(config.lags_sequence)).
  • In the blog, multiple time series are provided to the model, but how many models are then created? One for each time series, or a single one valid for all the provided time series? I ask because I provided 13 stocks, and I have the feeling the predictions always follow some kind of global trend.

If any of you can help me with these questions, it would be much appreciated.
I have other questions and grey areas in my understanding, but so far ChatGPT and HuggingChat have been impressively helpful.

Thanks for the questions. Do note that predicting financial data is tricky, not because of the models involved, but because the time series themselves are more or less random (the underlying dynamics of the data are driven by factors the model has no access to), so the raw time series has very little predictive signal. To answer your questions more specifically:

Recall that we are learning the parameters of some chosen probability distribution at each time point. At inference time we sample from the resulting distribution and pass those, say, 100 samples back into the transformer to obtain the distribution of the next time step, and so on. This is analogous to running 100 simulations of the future. We thus end up with 100 samples per time step, and from these we can calculate the empirical mean/median and uncertainties for plotting and comparing against the single ground truth (in the back-testing scenario).
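For concreteness, here is a minimal sketch of that sampling step, assuming `model` is a trained TimeSeriesTransformerForPrediction and `batch` comes from the test dataloader as in the blog (the exact feature keys depend on your config):

```python
import numpy as np
import torch

with torch.no_grad():
    outputs = model.generate(
        static_categorical_features=batch["static_categorical_features"],
        past_time_features=batch["past_time_features"],
        past_values=batch["past_values"],
        future_time_features=batch["future_time_features"],
        past_observed_mask=batch["past_observed_mask"],
    )

# outputs.sequences has shape (batch_size, num_parallel_samples, prediction_length),
# i.e. num_parallel_samples (100 by default) sampled trajectories per series.
forecasts = outputs.sequences.cpu().numpy()

# Every sample is an equally weighted draw from the learned distribution;
# the median is simply a robust point summary of the sample cloud.
point_forecast = np.median(forecasts, axis=1)        # (batch_size, prediction_length)
lo, hi = np.quantile(forecasts, [0.1, 0.9], axis=1)  # e.g. an 80% uncertainty band
```

So no, generate does not return predictions with different probabilities: each of the 100 trajectories is one simulation of the future, and the median/quantiles summarise them.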

The inference step loops prediction_length times. So to perform back-testing via a rolling window, you will need to create your back-testing dataset with larger and larger target arrays, as sketched below.
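Here is a minimal sketch of what building those rolling test sets could look like, assuming each series is a GluonTS-style dict with a "target" array as in the blog (make_backtest_sets is a hypothetical helper, not part of the blog code):

```python
prediction_length = 24  # stand-in value; use config.prediction_length

def make_backtest_sets(series: dict, num_windows: int):
    """Yield num_windows copies of `series`, each with a target array that is
    prediction_length longer than the previous one, so every forecast window
    re-uses the most recent observations as context."""
    total = len(series["target"])
    first_cutoff = total - (num_windows - 1) * prediction_length
    for k in range(num_windows):
        window = dict(series)  # shallow copy keeps "start", static features, ...
        window["target"] = series["target"][: first_cutoff + k * prediction_length]
        yield window
```

Each yielded dict is then fed through the same test transformation/dataloader as in the blog, so every window is forecast from the latest available observations.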

Regarding past_length: recall that for a univariate time series we only have a 1-d array of numbers, while transformers take as input a sequence of vectors. Thus we need to create a context_length sequence of vectors from this 1-d array (while keeping the temporal causal structure of the 1-d array intact). We do this via the lag operation, given by the indices in the lags_sequence array, which copies values from the past into each position. To end up with a context_length sequence of vectors, we therefore need a larger 1-d array; in particular, it has to be large enough that the very first vector can get the lag values it needs. If you draw this out, you will see that one needs an initial 1-d array of size config.context_length + max(config.lags_sequence).
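A tiny numeric example (toy numbers, not the model defaults) that shows why the earliest context position needs max(lags_sequence) extra points of history:

```python
# Toy numbers: a context of 3 vectors, lags [1, 2, 5].
context_length = 3
lags_sequence = [1, 2, 5]
past_length = context_length + max(lags_sequence)  # 3 + 5 = 8

values = list(range(past_length))  # the 1-d history: [0, 1, 2, ..., 7]

# One lag vector per context position, as the lag operation builds them:
# position t gets (values[t - 1], values[t - 2], values[t - 5]).
vectors = [
    [values[t - lag] for lag in lags_sequence]
    for t in range(past_length - context_length, past_length)
]
print(vectors)  # [[4, 3, 0], [5, 4, 1], [6, 5, 2]]
# The earliest position reaches back to values[0] via lag 5, so we need
# max(lags_sequence) extra points before the context, not
# max(context_length, max(lags_sequence)).
```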

Neural forecasters typically train a single model over the whole dataset, just as a single image classification model is trained over a collection of images rather than one model per image. As mentioned above, these models cannot magically predict the future; rather, they learn the distribution of the future given the distribution of the past data. If, as mentioned, the data is more or less random (as is the case with raw financial time series), then there is not much the model can learn. Recall that the movement of financial series is driven by many external factors the model has no access to.

Hopefully that clears up some of the confusion. Let me know!
