I’m studying the Decision Transformer, following the “Train your first Decision Transformer” blog post.
In the post, the example is “halfcheetah” (whose action space is continuous), and
the following model code is used.
I’m trying to apply this to a discrete action space.
I added a logit layer for the discrete actions and changed the loss function as below
(red: removed, blue: added).
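In text form, the change is roughly the following (a sketch on top of the blog’s `TrainableDT` class; `logit_layer` is my name for the added layer, and I’m assuming the actions are stored one-hot so the collator shapes stay the same):

```python
import torch
import torch.nn.functional as F
from transformers import DecisionTransformerModel

class TrainableDT(DecisionTransformerModel):
    def __init__(self, config):
        super().__init__(config)
        # added (blue): linear layer mapping the action output to logits,
        # with config.act_dim = number of discrete actions
        self.logit_layer = torch.nn.Linear(config.act_dim, config.act_dim)

    def forward(self, **kwargs):
        output = super().forward(**kwargs)
        # added (blue): turn the action predictions into logits
        action_preds = self.logit_layer(output[1])
        action_targets = kwargs["actions"]
        attention_mask = kwargs["attention_mask"]
        act_dim = action_preds.shape[2]

        mask = attention_mask.reshape(-1) > 0
        action_preds = action_preds.reshape(-1, act_dim)[mask]
        action_targets = action_targets.reshape(-1, act_dim)[mask]

        # removed (red): MSE loss for continuous actions
        # loss = torch.mean((action_preds - action_targets) ** 2)
        # added (blue): cross-entropy over the discrete-action logits,
        # recovering class indices from the one-hot targets
        loss = F.cross_entropy(action_preds, action_targets.argmax(dim=-1))

        return {"loss": loss}

    def original_forward(self, **kwargs):
        return super().forward(**kwargs)
```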
Is this the right approach?
Hey,
I’m also trying out something similar recently. Your implementation looks very similar to mine, except that I did not use an additional linear layer and directly used the outputs as the logits. Also, I had encoded my actions as one-hot vectors, so I had to do some reshaping with the action targets, but otherwise I think this is pretty much spot on.
I usually print out the shapes and the intermediate variables at least once as a sanity check, to make sure everything looks right and nothing is broadcast incorrectly.
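For example, something like this is what I mean (a standalone snippet with made-up shapes, just to show the reshaping of the one-hot targets and the kind of shape check I do):

```python
import torch
import torch.nn.functional as F

# made-up shapes, purely for a sanity check
batch, seq, num_actions = 4, 20, 6
action_preds = torch.randn(batch, seq, num_actions)  # outputs used directly as logits
action_targets = F.one_hot(
    torch.randint(0, num_actions, (batch, seq)), num_actions
).float()                                            # one-hot-encoded actions
attention_mask = torch.ones(batch, seq)

mask = attention_mask.reshape(-1) > 0
logits = action_preds.reshape(-1, num_actions)[mask]
# reshape the one-hot targets back into class indices for cross_entropy
targets = action_targets.reshape(-1, num_actions)[mask].argmax(dim=-1)

print(logits.shape, targets.shape)  # torch.Size([80, 6]) torch.Size([80])
loss = F.cross_entropy(logits, targets)
```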
Hey!
Also, in original_forward, why did you define the action_targets? As I understand it, the logits are the action_preds, no?
Yeah, it does seem like the action_targets are technically not needed in the original_forward() function, which is used at test time, and I don’t see them being used in the code snippet either.
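At test time the logits just get turned into an action directly, so there is nothing to compare against. A rough sketch of what that looks like (since original_forward() is just the unmodified base forward, the plain model behaves the same here; all sizes and the random inputs below are placeholders):

```python
import torch
from transformers import DecisionTransformerConfig, DecisionTransformerModel

# placeholder config; action_tanh=False so the raw outputs can serve as logits
config = DecisionTransformerConfig(state_dim=17, act_dim=6, action_tanh=False)
model = DecisionTransformerModel(config)
model.eval()

batch, seq = 1, 20
with torch.no_grad():
    output = model(
        states=torch.randn(batch, seq, config.state_dim),
        actions=torch.zeros(batch, seq, config.act_dim),
        rewards=torch.zeros(batch, seq, 1),
        returns_to_go=torch.ones(batch, seq, 1),
        timesteps=torch.arange(seq).unsqueeze(0),
        attention_mask=torch.ones(batch, seq),
        return_dict=False,
    )

# no action_targets anywhere: output[1] holds the action logits,
# and we just act greedily on the last timestep
action = torch.argmax(output[1][0, -1]).item()
```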
Thank you for your quick response! I’m using DT for my bachelor’s thesis and I’m a bit lost. Since you’ve been using it for longer, I hope you don’t mind if I ask you some questions.
I have a case very similar to the one you mentioned, with discrete actions and using one-hot encoding. So, does this code look good in your opinion?
I’m also at a bit of a loss as to which activation function best fits this type of problem, and about certain model configuration parameters. What’s your take on these?
Thanks in advance!
Thanks for sharing. It helps me a lot.