Hello,
I want to train an Informer from scratch, but I'm running into a language barrier on top of the complexity.
I understand the individual words I read, but I can't make sense of the sentences.
Or I might be a moron… I don't know.
I started this project several weeks ago to see if I could create an indicator for FOREX pairs to help with decision making, but I keep hitting wall after wall.
I finally found this forum and hope it can lift the fog before my eyes about AI training.
Right now I have the dataset ready and need to create the model.
As I understand the Hugging Face docs, I need to create a config for my Informer model:
import torch
from torch.utils.data import DataLoader
from datasets import Dataset, DatasetDict
from gluonts.time_feature import get_lags_for_frequency
from transformers import InformerConfig, InformerForPrediction

dataset = Dataset.from_pandas(data[["open", "high", "low", "close", "open_change", "high_change", "low_change", "close_change", "day_of_year", "day_of_month", "day_of_week", "hour", "minute"]])
# 80% train, 10% test, 10% validation, kept in chronological order
train_test = dataset.train_test_split(test_size=0.2, shuffle=False)
test_val = train_test["test"].train_test_split(test_size=0.5, shuffle=False)
dataset = DatasetDict({
    "train": train_test["train"],
    "test": test_val["train"],
    "val": test_val["test"],
})
# series configuration
history_length = 50
prediction_length = 15
# the lag sequence (I don't really understand what it does)
# my assumption is that it takes history_length datapoints starting from each given lag
# not sure if it is needed for my application
lags_sequence = get_lags_for_frequency("15min")
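# checking what this returns: from the docs it should be a list of integer
# lag offsets in 15-minute steps (e.g. 96 would be the value from 24 h back),
# which the model feeds back in as extra lagged features -- my reading, at least
print(lags_sequence)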
config = InformerConfig(
    # number of steps to forecast
    prediction_length=prediction_length,
    # amount of past data to predict from
    context_length=history_length,
    # as I understand it, these should be my "day_of_year", "day_of_month", "day_of_week", "hour" and "minute";
    # I might add a position metric relative to the "age" of the datapoint
    num_time_features=5,
    # should be the rest of my fields: "open", "high", "low", "close", "open_change", "high_change", "low_change", "close_change"
    num_dynamic_real_features=8,
    # complexity of the model
    d_model=128,
    # not sure if these parameters have rules to follow relative to the input/output size or the prediction length
    encoder_layers=4,
    decoder_layers=4,
    encoder_attention_heads=8,
    decoder_attention_heads=8,
    # since I don't have static or categorical features, I don't think caching is worth the trouble;
    # or maybe, since inferences done in real life will share 99% of the time series, it would help?
    use_cache=False,
    sampling_factor=2,
)
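# one rule I did find: d_model has to divide evenly across the attention heads
# (here 128 / 8 = 16 per head), so I added this check for myself
assert config.d_model % config.encoder_attention_heads == 0
assert config.d_model % config.decoder_attention_heads == 0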
# this seems straightforward
model = InformerForPrediction(config)
criterion = torch.nn.MSELoss()  # I might change it depending on the performance
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# using CUDA, of course
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
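model.to(device)  # I assume the model itself also has to be moved onto the GPU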
load_train = DataLoader(
    dataset=dataset["train"],
    batch_size=history_length,  # not sure this is right: I just reused the history length as the batch size
    shuffle=False,  # keep the time series ordered, obviously, for time-dependent data
    num_workers=0,  # number of loader subprocesses, I think; 0 should mean "load in the main process"
)
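# peeking at one batch: with a datasets.Dataset each item is a dict of
# {column -> value}, so I expect the default collate to give me a dict of
# {column -> tensor} -- which matches the error I describe below
first_batch = next(iter(load_train))
print(first_batch.keys())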
# and here I am struggling with yet another seemingly simple thing
# I can't figure out how to proceed from here... assuming the rest of the code is OK
for epoch in range(10):
    model.train()
    total_loss = 0
    for batch in load_train:
        optimizer.zero_grad()
        print(batch)
In the comments I made some notes on how I think things work; they need a lot of clarification.
Now for the main question: how can I train my model?
I tried to move the batch into video memory, but I'm kindly reminded that a dict doesn't have a "to" member.
I might need to decompose the batch and send every tensor to device memory individually, but I feel like I'm missing something huge.
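From generic PyTorch training loops, I imagine it should look roughly like the sketch below. The dict comprehension is my guess for moving the batch to the GPU, and the past_values / past_time_features / past_observed_mask / future_values / future_time_features names are only my assumption of what InformerForPrediction expects, going by its docs; my DataLoader as written doesn't produce those keys, so this doesn't actually run yet:

for epoch in range(10):
    model.train()
    total_loss = 0
    for batch in load_train:
        optimizer.zero_grad()
        # a dict has no .to(), so move every tensor over individually?
        batch = {k: v.to(device) for k, v in batch.items()}
        # my assumption of the expected inputs, going by the docs
        outputs = model(
            past_values=batch["past_values"],
            past_time_features=batch["past_time_features"],
            past_observed_mask=batch["past_observed_mask"],
            future_values=batch["future_values"],
            future_time_features=batch["future_time_features"],
        )
        loss = outputs.loss  # the model seems to return its own loss when future_values is given
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

If the model really computes its own loss, I guess my criterion becomes useless, but I'm not sure.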
Can anyone help me?