I want to use GPT2 from Hugging Face transformers inside a TensorFlow Keras model definition.
import tensorflow as tf
from transformers import TFGPT2LMHeadModel

input_ids = tf.keras.layers.Input(
    shape=(max_len,), dtype=tf.int32, name="input_ids"
)
attention_masks = tf.keras.layers.Input(
    shape=(max_len,), dtype=tf.int32, name="attention_masks"
)

gpt2 = TFGPT2LMHeadModel.from_pretrained("gpt2")
gpt2.trainable = True

output_sequences = gpt2.generate(
    input_ids=input_ids,
    attention_mask=attention_masks,
    max_length=max_len * 2,
    temperature=1,
    top_k=0,
    top_p=0.9,
    repetition_penalty=1,
    do_sample=True,
    num_return_sequences=num_return_sequences,
)

model = tf.keras.Model(inputs=[input_ids, attention_masks], outputs=output_sequences)
However, gpt2.generate cannot accept the symbolic input_ids and attention_masks tensors as inputs.
The error:

TypeError: Keras symbolic inputs/outputs do not implement `__len__`. You may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model. This error will also get raised if you try asserting a symbolic input/output directly.
How can I use the generate process of GPT2 inside the model?
The final goal is to compute the loss externally, based on output_sequences, and update the parameters of the model that contains GPT2.