Loading simple csv data for time series transformer

saiverse · October 29, 2023, 12:30am

I am following the time series forecasting blog on HF and I want to try it on my custom dataset, which is a simple csv file with one timestamp column (i.e., ‘start’) and one column of values I want to predict (i.e., ‘target’).

To use my custom dataset, I followed the example linked at the end of the blog, but it doesn’t seem to work. I changed my column names and added an extra column with df['item_id'] = 'A' to match the example dataset. But it creates a dataset with only 1 row. I then tried the original dataset (given in the example), and that created only 10 rows (1 row for each item_id). I cannot use this with the time series notebook, which assumes the dataset has multiple rows and is already split into train, validation and test sets.

To summarize, my question is - How do I create a HF dataset (from my very standard csv file) that can be used with the time_series notebook (from HF blog linked above)?

I have been struggling with this for more than a day and cannot figure out the link that I am missing.

Here is the code that I am using to create the dataset:

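import pandas as pd
from datasets import Dataset, Features, Sequence, Value
from gluonts.dataset.pandas import PandasDataset
from gluonts.itertools import Map

class ProcessStartField():
    # Counts the series processed so far; not otherwise used here.
    ts_id = 0

    def __call__(self, data):
        # gluonts stores "start" as a pandas Period; convert it to a Timestamp.
        data["start"] = data["start"].to_timestamp()
        self.ts_id += 1

        return data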

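df = pd.read_parquet('filename.parquet')
df.to_csv('filename.csv')
# The first csv column becomes the index and is parsed as dates.
df = pd.read_csv('filename.csv', index_col=0, parse_dates=True)
df['item_id'] = 'A'

# from_long_dataframe groups rows by item_id: one entry per time series.
ds = PandasDataset.from_long_dataframe(df, target="inverter_active_power", item_id="item_id")
process_start = ProcessStartField()
# One dict per series, so a single item_id gives a single-element list.
list_ds = list(Map(process_start, ds))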

features  = Features(
    {
        "start": Value("timestamp[s]"),
        "target": Sequence(Value("float32")),
        "item_id": Value("string"),
    }
)

dataset = Dataset.from_list(list_ds, features=features)
print(dataset)

Thank you!

kashif · October 30, 2023, 4:19pm

You do not need to use the gluonts PandasDataset if you know what the structure of the time series dataset should be.

Essentially, for each time series in your dataset you make a dict with the appropriate keys: start (the date-time of the first value of the target), target (the array of time series values), and the optional item_id, which is not really used for training. You can make this list of dicts yourself, and then you should have all you need for the blog post. Let me know if that helps.
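
A minimal sketch of that idea, using only the standard library so it runs without gluonts. The inline CSV_TEXT, the helper names load_series and make_splits, and prediction_length=2 are all hypothetical and chosen for illustration. The split convention (validation is the training series extended by prediction_length steps, test by another prediction_length) is my reading of how the blog's example dataset is laid out, so treat it as an assumption and check it against your data.

```python
import csv
import io
from datetime import datetime

# Hypothetical stand-in for the csv file: a 'start' timestamp column and a
# 'target' value column, one row per time step.
CSV_TEXT = """start,target
2023-01-01 00:00:00,1.0
2023-01-01 01:00:00,2.0
2023-01-01 02:00:00,3.0
2023-01-01 03:00:00,4.0
2023-01-01 04:00:00,5.0
2023-01-01 05:00:00,6.0
2023-01-01 06:00:00,7.0
2023-01-01 07:00:00,8.0
"""


def load_series(csv_file, item_id="A"):
    """Collapse a long-format csv (one row per timestamp) into one dict."""
    rows = list(csv.DictReader(csv_file))
    # ISO-formatted timestamps sort chronologically as plain strings.
    rows.sort(key=lambda r: r["start"])
    return {
        "start": datetime.fromisoformat(rows[0]["start"]),  # first timestamp only
        "target": [float(r["target"]) for r in rows],        # the whole series
        "item_id": item_id,
    }


def make_splits(series, prediction_length):
    """Assumed convention: each later split sees prediction_length more steps."""
    n = len(series["target"])
    if n <= 2 * prediction_length:
        raise ValueError("series is too short for the requested prediction_length")

    def truncated(length):
        # Same start and item_id, target cut to the first `length` values.
        return {**series, "target": series["target"][:length]}

    return {
        "train": [truncated(n - 2 * prediction_length)],
        "validation": [truncated(n - prediction_length)],
        "test": [truncated(n)],
    }


series = load_series(io.StringIO(CSV_TEXT))
splits = make_splits(series, prediction_length=2)
for name, rows in splits.items():
    print(name, len(rows), len(rows[0]["target"]))
```

Each list in splits can then be passed to Dataset.from_list(rows, features=features) with the features defined in the question. With a single series every split has exactly one row, which is expected: a row holds one whole time series, not one time step.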
