I’m trying to implement my own Decision Transformer for a reinforcement learning application.
Although the Hugging Face tutorial helped me a lot in creating a first running example with my own data, I’m stuck on the values returned by the data collator class (DecisionTransformerGymDataCollator) from the tutorial linked above.
Basically, the collator returns the following values:
return {
    "states": s,
    "actions": a,
    "rewards": r,
    "returns_to_go": rtg,
    "timesteps": timesteps,
    "attention_mask": mask,
}
But which “token” does the DT model use for predicting the next action?
In the case of NLP you have words, which are tokenized to integer values. Let’s keep it simple and say: 1 word → 1 token. So you have one unique integer value per word.
But what’s the equivalent in the case of the Decision Transformer? Considering the states, the rewards, and the returns-to-go, you have at least three different kinds of values, which cannot be tokenized into one integer value as in the case of NLP.
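For what it’s worth, my current guess (not verified against the actual Hugging Face internals, and the dimensions below are made up) is that each modality gets its own linear projection into the hidden space, so the “tokens” are continuous embedding vectors rather than integers, and one timestep contributes three of them (return-to-go, state, action):

```python
import torch
import torch.nn as nn

# Hypothetical dimensions, just for illustration
state_dim, act_dim, hidden = 17, 6, 128
batch, seq_len = 2, 20

# My guess: one linear embedding per modality instead of an integer vocabulary
embed_return = nn.Linear(1, hidden)
embed_state = nn.Linear(state_dim, hidden)
embed_action = nn.Linear(act_dim, hidden)

rtg = torch.randn(batch, seq_len, 1)
states = torch.randn(batch, seq_len, state_dim)
actions = torch.randn(batch, seq_len, act_dim)

g_tok = embed_return(rtg)      # (batch, seq_len, hidden)
s_tok = embed_state(states)    # (batch, seq_len, hidden)
a_tok = embed_action(actions)  # (batch, seq_len, hidden)

# Interleave as (R_1, s_1, a_1, R_2, s_2, a_2, ...),
# giving 3 * seq_len continuous "tokens" per trajectory
tokens = torch.stack([g_tok, s_tok, a_tok], dim=2).reshape(batch, 3 * seq_len, hidden)
print(tokens.shape)  # torch.Size([2, 60, 128])
```

Is that roughly what happens inside the model, i.e. the transformer attends over these interleaved embeddings and the action is predicted from the state token’s position?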
Any help?
Thanks