Problem loading .CSV for Time Series Transformer

I'm not an experienced programmer, but this should be simple, and I've been trying to hack out a fix for 20 of the last 36 hours, so help would be very much appreciated.
I'm trying to run my own dataset with code from the time-series-transformers.ipynb notebook and can't get it to integrate well.

My DatasetDict looks very similar to the one in the demo (load_dataset("monash_tsf", "tourism_monthly")), with two major exceptions:

  1. My "start" key values look like arrays, but they are actually strings: there are quotes around the brackets, like '[4,5,6,6,6]'.
  2. My "feat_static_cat" key values are int64 but don't have brackets around them like the same key in the demo. I've only figured out how to add the brackets using a lambda with split, but that requires converting to a string.
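For the record, here is a minimal stdlib sketch of the cleanup I'm after (the values are made-up examples in the same shape as my CSV): `ast.literal_eval` turns a stringified list back into a real list, and the bare int gets wrapped in a one-element list:

```python
import ast

# Hypothetical row as it comes out of the CSV: the list arrives as a string
# that merely *looks* like a list, and "feat_static_cat" is a bare int.
raw_row = {"target": "[4,5,6,6,6]", "feat_static_cat": 0}

def fix_row(row):
    """Parse stringified lists and wrap scalar categorical features."""
    fixed = dict(row)
    # ast.literal_eval safely turns "[4,5,6,6,6]" into [4, 5, 6, 6, 6]
    fixed["target"] = [float(x) for x in ast.literal_eval(row["target"])]
    # The demo dataset stores feat_static_cat as a list, not a scalar
    fixed["feat_static_cat"] = [row["feat_static_cat"]]
    return fixed

print(fix_row(raw_row))
# {'target': [4.0, 5.0, 6.0, 6.0, 6.0], 'feat_static_cat': [0]}
```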

The error in the notebook arises just before the "Forward Pass" section when using my DatasetDict.

This feels like such a simple problem. I could be wrong about the root cause, but it seems to just be a formatting error somewhere early in the data load.

I assigned features and dtypes early on, as you'll see in my notebook, but later in the script the "start" feature is still described as "str".

Link to the Colab notebook I'm struggling with: Google Colab

Link to the .csv file I am trying to use: testingdataload.csv - Google Drive

Like I said, any help would be very much appreciated.

This is the notebook I'm trying to iterate on.

Since posting this question, I've realized I was ignorant of two things:

  1. There is a datasets forum that this question should probably have been posted to.

  2. In the time-series-transformer blog post and demo notebook, the instructions point to GluonTS and to converting data from a pandas dataframe into a proper dataset.

I still haven't figured it out, but I am posting a clearer question in the Datasets section, with the promise that if someone gives me a solution, I will share 2-3 years of M5 price data for every equity in the S&P 500 to the Hub, public for anyone to use. I already have this data. It usually costs money to get, but I harvested it from different APIs, and it's in CSV format.

@Harrisonhi thanks for posting the CSV and Colab. Going to have a look at it now.

The main question is which column in the csv is the target you wish to predict?

I would like to predict the next day's close price, df['close'].

I know that the data should be normalized before use in a model, but I'm just trying to get a GluonTS-compatible dataset first.
My problem remains that, of the various ways I can make a GluonTS dataset, none of them creates a dataset that makes sense to me for use in a model.
I could be wrong, but I think what makes the most sense is a structure like:

Train: {
  "start": [timestamp, freq=freq],
  "target": [*close price on day of "start"],
  "feat_static_cat": [*static features of the current time series: exchange, ticker, asset_type],
  "feat_dynamic_cat": [*dynamic features of the current time series that change through time but should be used, e.g. sine, indicator, HLCV %chg, etc.],
  "item_id": [unique identifier of some sort]
}

Where there is a train_ds entry for every value of time in the time series, i.e. there should be as many entries in train_ds as there are rows in the original dataframe.
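To make the structure above concrete, here is a plain-Python sketch with made-up values (the row fields, category codes, and ticker are all hypothetical), mirroring the list-of-dicts shape that GluonTS datasets accept:

```python
# Hypothetical daily rows for one ticker, as parsed from the CSV.
rows = [
    {"date": "2023-01-02", "close": 101.5, "volume": 1200},
    {"date": "2023-01-03", "close": 102.1, "volume": 1350},
    {"date": "2023-01-04", "close": 100.9, "volume": 1100},
]

# One entry in the structure sketched above.
train_entry = {
    "start": rows[0]["date"],                       # first timestamp; freq would be daily
    "target": [r["close"] for r in rows],           # close-price series
    "feat_static_cat": [0, 1, 2],                   # e.g. encoded exchange, ticker, asset_type
    "feat_dynamic_real": [[r["volume"] for r in rows]],  # per-step features, one list per feature
    "item_id": "AAPL",                              # unique identifier of some sort
}

print(train_entry["target"])  # [101.5, 102.1, 100.9]
```

(Note: as far as I can tell, in GluonTS's list-dataset format there is usually one such entry per series, with the whole history in "target", though I may be misunderstanding how that maps to my per-row idea.)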

@kashif

This is actually the format of the data I am starting with.

Apologies for any incongruity in how I've asked for help. This is something I've been working on for a couple of weeks now, learning along the way, which has caused some variance in how I ask questions.

No problem! I will have a look at it now.
