How to decode GPT2

What’s the proper way to decode the output of GPT2?

from transformers import GPT2Tokenizer, TFGPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = TFGPT2Model.from_pretrained('gpt2')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
tokenizer.decode(output)

The line tokenizer.decode(output) gives me this error

I don’t think the code you’ve written will give you anything a tokenizer can decode. By calling TFGPT2Model without a task head (e.g., TFGPT2LMHeadModel for causal language modeling), you’re just getting back a TFBaseModelOutputWithPastAndCrossAttentions object containing GPT-2’s 768-dimensional hidden states for each input token in output.last_hidden_state.

In your case, output.last_hidden_state is a tensor with shape (1, 10, 768) because you have one input with 10 tokens, and GPT-2 uses 768 embedding dimensions.

The HuggingFace pattern is to add a “modelling head” on top of the base model to perform whatever NLP task you’re after. If you’re looking for tokens you can decode, that’s probably causal language modelling.

A simple TensorFlow example for causal language modelling might look like:

from transformers import GPT2Tokenizer, TFGPT2LMHeadModel


def main():
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='tf')
    model = TFGPT2LMHeadModel.from_pretrained('gpt2')
    output = model.generate(**encoded_input)
    decoded = tokenizer.decode(output[0])
    print(decoded)


if __name__ == '__main__':
    main()

In this example, model.generate() is doing a lot of heavy lifting for you compared with calling model(encoded_input) directly, and most of that behaviour is controllable through its enormous number of optional parameters.
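As a rough illustration of the loop generate() runs for you, here’s a toy sketch of greedy decoding with a stand-in next_token_logits function (hypothetical, not a real transformers API). The real generate() adds batching, KV caching, sampling strategies, stopping criteria, and much more:

```python
import numpy as np

def next_token_logits(token_ids):
    # Stand-in for a real model call over a tiny 5-token vocabulary.
    # With a real model this would be model(input_ids).logits[:, -1, :].
    rng = np.random.default_rng(sum(token_ids))
    return rng.standard_normal(5)

def greedy_generate(prompt_ids, max_new_tokens=5, eos_token_id=0):
    # The core of what model.generate() does with default (greedy) settings:
    # repeatedly pick the argmax token and append it to the sequence.
    token_ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(token_ids)
        next_id = int(np.argmax(logits))
        token_ids.append(next_id)
        if next_id == eos_token_id:  # stop early on end-of-sequence
            break
    return token_ids

print(greedy_generate([1, 2, 3]))
```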


For the sake of completeness, here’s a minimal example that does call the model directly (i.e. without generate()):

import tensorflow as tf
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel


def main():
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    text = "When it comes to making transformers easy, HuggingFace is the"
    encoded_input = tokenizer(text, return_tensors='tf')
    model = TFGPT2LMHeadModel.from_pretrained('gpt2')
    output = model(encoded_input)
    logits = output.logits[0, -1, :]
    softmax = tf.math.softmax(logits, axis=-1)
    argmax = tf.math.argmax(softmax, axis=-1)
    print(text, "[", tokenizer.decode(argmax), "]")


if __name__ == '__main__':
    main()

It generates exactly one token, which in this case should be “best”, since it deterministically picks the highest-probability token in the output:

2022-03-28 15:45:30.783765: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
When it comes to making transformers easy, HuggingFace is the [  best ]

It seems the softmax is redundant when only the best token is required, since softmax is monotonic and doesn’t change the ordering of the logits.
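Right — softmax never changes which index is largest, so tf.math.argmax(logits) alone would suffice there. A quick check with NumPy standing in for the TensorFlow calls above:

```python
import numpy as np

logits = np.array([1.5, -0.3, 4.2, 0.7])

# Numerically stable softmax: shift by the max before exponentiating.
exp = np.exp(logits - logits.max())
softmax = exp / exp.sum()

# Softmax is monotonic, so the argmax is unchanged.
print(np.argmax(logits), np.argmax(softmax))  # prints: 2 2
```

The softmax is only needed if you want actual probabilities, e.g. for sampling or for reporting the model’s confidence.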