Loading simple CSV data for time series transformer

I am following the time series forecasting blog post on HF and want to try it on my custom dataset, a simple CSV file with one timestamp column (i.e., ‘start’) and one column of values I want to predict (i.e., ‘target’).

To use my custom dataset, I followed the example linked at the end of the blog, but it doesn’t seem to work. I changed my column names and added an extra column with df['item_id'] = 'A' to match the example dataset, but this creates a dataset with only 1 row. I then tried the original dataset (given in the example), and that created only 10 rows (1 row per item_id). I cannot use this with the time series notebook, which assumes the dataset has multiple rows and is already split into train, validation, and test sets.

To summarize, my question is: how do I create an HF dataset (from my very standard CSV file) that can be used with the time_series notebook (from the HF blog linked above)?

I have been struggling with this for more than a day and cannot figure out the link that I am missing.

Here is the code I am using to create the dataset:

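import pandas as pd
from datasets import Dataset, Features, Sequence, Value
from gluonts.dataset.pandas import PandasDataset
from gluonts.itertools import Map

class ProcessStartField():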
    ts_id = 0

    def __call__(self, data):
        data["start"] = data["start"].to_timestamp()
        self.ts_id += 1

        return data

df = pd.read_parquet('filename.parquet')
df.to_csv('filename.csv')
df = pd.read_csv('filename.csv', index_col=0, parse_dates=True)
df['item_id'] = 'A'

ds = PandasDataset.from_long_dataframe(df, target="inverter_active_power", item_id="item_id")
process_start = ProcessStartField()
list_ds = list(Map(process_start, ds))

features = Features(
    {
        "start": Value("timestamp[s]"),
        "target": Sequence(Value("float32")),
        "item_id": Value("string"),
    }
)

dataset = Dataset.from_list(list_ds, features=features)
print(dataset)

Thank you!

You do not need to use the GluonTS PandasDataset if you know what the structure of the time series dataset should be.

Essentially, for each time series in your dataset you make a dict with the appropriate keys: start (the first date-time of the target), target (the array of time series values), and the optional item_id, which is not really used for training. You can build this list of dicts yourself, and then you should have all you need for the blog post. Let me know if that helps.
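
To make that concrete, here is a minimal sketch of building the list of dicts by hand. It assumes a single series, so the dataset has exactly one row per split; one row per item_id seems to be the expected layout rather than a bug. The helper names (make_record, make_splits, to_hf_dataset) and the prediction_length value are made up for illustration. The split rule (validation drops the last window, train drops the last two) is my reading of the convention the blog's dataset follows, so please check it against the notebook.

```python
from datetime import datetime


def make_record(start, values, item_id="A"):
    """Turn one time series into one dict, i.e. one row of the dataset."""
    return {
        "start": start,
        "target": [float(v) for v in values],
        "item_id": item_id,
    }


def make_splits(start, values, prediction_length, item_id="A"):
    """Build train/validation/test rows from a single series.

    Assumed convention: test holds the full series, validation drops the
    last prediction_length points, train drops the last two windows.
    """
    n = len(values)
    if n <= 2 * prediction_length:
        raise ValueError("series is too short for two held-out windows")
    return {
        "train": [make_record(start, values[: n - 2 * prediction_length], item_id)],
        "validation": [make_record(start, values[: n - prediction_length], item_id)],
        "test": [make_record(start, values, item_id)],
    }


def to_hf_dataset(split_records):
    """Convert the lists of dicts into a DatasetDict (needs `datasets`)."""
    from datasets import Dataset, DatasetDict, Features, Sequence, Value

    features = Features(
        {
            "start": Value("timestamp[s]"),
            "target": Sequence(Value("float32")),
            "item_id": Value("string"),
        }
    )
    return DatasetDict(
        {
            name: Dataset.from_list(rows, features=features)
            for name, rows in split_records.items()
        }
    )


# Toy data standing in for the CSV. With the DataFrame from the question
# this would be something like (assuming a DatetimeIndex):
#   start = df.index[0].to_pydatetime()
#   values = df["inverter_active_power"].tolist()
start = datetime(2023, 1, 1)
values = list(range(10))
splits = make_splits(start, values, prediction_length=2)
# dataset = to_hf_dataset(splits)  # uncomment once `datasets` is installed
```

With several series (several item_id values), you would call make_splits once per series and concatenate the resulting lists, which gives one row per series in each split.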