Running a TensorFlow T5 model with the Hugging Face generate() function returns a bad reply

I'm using a notebook from this GitHub repository: flogothetis/Abstractive-Summarization-T5-Keras (abstractive text summarization with a Transformer encoder-decoder, T5, in Keras).

notebook link: Abstractive-Summarization-T5-Keras/AbstractiveSummarizationT5.ipynb at main · flogothetis/Abstractive-Summarization-T5-Keras · GitHub

I just have one problem: I want to load the saved model and run it with the Hugging Face generate() function, since generate() provides options like num_return_sequences and the maximum length of the generated output.

Here is the code I use to run the saved model with the Hugging Face generate() function, but it returns a garbage reply rather than a true response. For example, when summarizing the text below:

getSummary("With your permission we and our partners may use precise geolocation data and identification through device scanning. You may click to consent to our and our partners' processing as described above. Alternatively you may access more detailed information and change your preferences before consenting or to refuse consenting.")
getSummary() returns something like this:
We may use geolocation data through device scanning

but when I use the Hugging Face generate() function, the code is as below:

text = '''With your permission we and our partners may use precise geolocation data and identification through device scanning. You may click to consent to our and our partners' processing as described above. Alternatively you may access more detailed information and change your preferences before consenting or to refuse consenting.'''

```python
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

# load the pretrained tokenizer and the fine-tuned model saved in to_directory
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model0 = TFT5ForConditionalGeneration.from_pretrained(to_directory)

inputs = tokenizer([text], return_tensors="tf")
generated = model0.generate(**inputs, decoder_start_token_id=tokenizer.pad_token_id, do_sample=True)

print("Sampling output: ", tokenizer.decode(generated[0]))
```

the output is garbage, like this: `Sampling output: kurz Upholster Month citoyenjohnpointviousgren suppression awful Tommy Partners animaux Certain temptationanischadenCenterani FUN awful partager Lexington Üb`

It generates output, but the output is garbage with no meaning. If I try model.generate() without decoder_start_token_id=tokenizer.pad_token_id, i.e. `generated = model.generate(**inputs, do_sample=True)`, it returns this error: `decoder_start_token_id or bos_token_id has to be defined for encoder-decoder generation`.

Is there any solution to make model.generate() return a true reply?

By the way, the start and end tokens are as below:

`print(end_token, tokenizer.eos_token_id)  # 1`
`print(start_token, tokenizer.pad_token_id)  # 1`

Any solutions? Much obliged.


I tried multiple approaches, but none of them worked at all. Does anyone have an idea?

Hi @saratayylor :wave:

In general, a model returning garbage output is a sign of a distribution shift, i.e., you are trying to make the model do something it was not trained for. It can be due to other reasons, but this would be the most common cause by far.

In the script you shared, the model is loaded from a directory (`model0 = TFT5ForConditionalGeneration.from_pretrained(to_directory)`), so I can't confirm whether distribution shift is the problem here.
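One quick thing worth checking first: T5's decoder normally starts from the pad token (id 0), while the eos token is id 1, so it's useful to compare the tokenizer's special-token ids with what the saved model's config expects. A minimal sketch, assuming `to_directory` points at your saved checkpoint:

```python
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model0 = TFT5ForConditionalGeneration.from_pretrained(to_directory)

# For the stock t5-small checkpoint: pad = 0, eos = 1, decoder_start = 0.
print("tokenizer pad_token_id:", tokenizer.pad_token_id)
print("tokenizer eos_token_id:", tokenizer.eos_token_id)
print("config decoder_start_token_id:", model0.config.decoder_start_token_id)
print("config pad_token_id:", model0.config.pad_token_id)
```

If the decoder start id doesn't match what the model was trained with, sampling can easily produce gibberish like the output you posted.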

Nevertheless, I'd recommend starting with baby steps! :hugs: The pretrained T5 models are already set up to do summarization (see here for an example). Try using a pretrained model with your inputs and, if needed, fine-tune it. If the pretrained model works fine and the fine-tuned one does not, then you can be sure the problem lies in the fine-tuning procedure.
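Here is a minimal sketch of that first baby step, assuming the stock `t5-small` checkpoint. T5 was pretrained with task prefixes, so the input gets a `"summarize: "` prefix, and beam search is used instead of `do_sample=True` to keep the sanity check deterministic (the decoding parameters here are just illustrative):

```python
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = TFT5ForConditionalGeneration.from_pretrained("t5-small")

text = (
    "With your permission we and our partners may use precise geolocation data "
    "and identification through device scanning. You may click to consent to our "
    "and our partners' processing as described above. Alternatively you may access "
    "more detailed information and change your preferences before consenting or to "
    "refuse consenting."
)

# T5 expects a task prefix; without it the model may not know which task to perform.
inputs = tokenizer("summarize: " + text, return_tensors="tf")

outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=60,
    num_beams=4,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If this produces a reasonable summary while your saved checkpoint still produces gibberish on the same input, the issue almost certainly lies in how the model was fine-tuned or saved rather than in `generate()` itself.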

Hi, thanks for your reply, but it didn't solve anything.
Could you offer a solution that works?
Thanks so much.