I am following the time series forecasting blog on HF and I want to try it on my custom dataset, which is a simple CSV file with one timestamp column (i.e., 'start') and one column of values I want to predict (i.e., 'target').
To use my custom dataset, I followed the example linked at the end of the blog, but it doesn't seem to work. I changed my column names and added an extra column with df['item_id'] = 'A' to match the example dataset. However, this creates a dataset with only 1 row. I then tried the original dataset (given in the example), and that created only 10 rows (1 row for each item_id). I cannot use this with the time series notebook, which assumes the dataset has multiple rows and is already split into train, validation and test sets.
To summarize, my question is: how do I create a HF dataset (from my very standard CSV file) that can be used with the time_series notebook (from the HF blog linked above)?
I have been struggling with this for more than a day and cannot figure out the link that I am missing.
Here is the code that I am using to create the dataset:
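import pandas as pd
from datasets import Dataset, Features, Sequence, Value
from gluonts.dataset.pandas import PandasDataset
from gluonts.itertools import Map

class ProcessStartField():
    ts_id = 0

    def __call__(self, data):
        # Convert the pd.Period start produced by GluonTS into a pd.Timestamp
        data["start"] = data["start"].to_timestamp()
        self.ts_id += 1
        return data

# Round-trip the parquet file through CSV, using the first column as a datetime index
df = pd.read_parquet('filename.parquet')
df.to_csv('filename.csv')
df = pd.read_csv('filename.csv', index_col=0, parse_dates=True)
df['item_id'] = 'A'  # single series, so every row gets the same id

# Group the long dataframe by item_id: one entry per time series
ds = PandasDataset.from_long_dataframe(df, target="inverter_active_power", item_id="item_id")
process_start = ProcessStartField()
list_ds = list(Map(process_start, ds))

features = Features(
    {
        "start": Value("timestamp[s]"),
        "target": Sequence(Value("float32")),
        "item_id": Value("string"),
    }
)
dataset = Dataset.from_list(list_ds, features=features)
print(dataset)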
Thank you!
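For reference, here is a minimal sketch of the split layout the blog's dataset appears to use, assuming each row is one whole time series (so a single item_id giving 1 row would be expected) and that the validation and test rows are the same series extended by one prediction window each. The helper name make_splits, the prediction_length value and the toy values are hypothetical, for illustration only, and the sketch uses plain Python rather than any library API.

```python
from datetime import datetime


def make_splits(start, values, prediction_length, item_id="A"):
    """Build train/validation/test rows for ONE series by truncating its target.

    Hypothetical helper: assumes validation = train + one window,
    and test = validation + one window.
    """
    if len(values) <= 2 * prediction_length:
        raise ValueError("series is too short for two hold-out windows")

    def row(target):
        # Same keys as the features in the code above: start, target, item_id
        return {"start": start, "target": list(target), "item_id": item_id}

    return {
        "train": [row(values[: -2 * prediction_length])],
        "validation": [row(values[:-prediction_length])],
        "test": [row(values)],
    }


# Toy data: 10 observations of a single series
values = [float(i) for i in range(10)]
splits = make_splits(datetime(2023, 1, 1), values, prediction_length=2)

for name, rows in splits.items():
    # Each split has one row per series; only the target length differs
    print(name, len(rows), len(rows[0]["target"]))
```

If this assumption about the layout holds, each list in splits could be passed to Dataset.from_list with the same features as above, and the three resulting datasets collected into a DatasetDict with the keys train, validation and test.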