<unk> token in the output instead of curly braces

MihoZaki · March 25, 2023, 6:49pm

Hello everyone! Hope you are all well. I need some help because, to be honest, I'm a complete beginner and this got a bit confusing.
I'm fine-tuning T5 to generate MongoDB queries as output. I finished training, and when I loaded the model to test it, the output was missing curly braces.


import nltk

# tokenizer, model and max_input_length are defined earlier (not shown)
inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")
output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=64)
# skip_special_tokens=False keeps <pad>, </s> and <unk> visible in the decoded text
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=False)[0]
print(decoded_output)
predicted_Query = nltk.sent_tokenize(decoded_output.strip())[0]

print(predicted_Query)

The code above outputs this:

<pad> db.movies.find(<unk>"title": "The Poor Little Rich Girl"<unk>, <unk>"writers": 1<unk>)</s>
<pad> db.movies.find(<unk>"title": "The Poor Little Rich Girl"<unk>, <unk>"writers": 1<unk>)</s>

The <unk> token is supposed to be “{” or “}”. Is there any way to fix that? Also, does that mean that during training the model was also outputting <unk> tokens?
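
For anyone hitting the same symptom, the likely mechanism is that the tokenizer's fixed vocabulary has no piece for "{" or "}", so both characters are encoded as the unknown id and decode back as <unk>. If the training targets were encoded with the same tokenizer, they would have contained that id too. Below is a minimal sketch of that behaviour using a toy character-level tokenizer. ToyTokenizer and its methods are hypothetical stand-ins written for illustration, not the real T5 SentencePiece tokenizer.

```python
UNK = "<unk>"


class ToyTokenizer:
    """Hypothetical character-level tokenizer with a fixed vocabulary."""

    def __init__(self, vocab):
        # Id 0 is reserved for the unknown token.
        self.id_to_piece = [UNK]
        self.piece_to_id = {UNK: 0}
        self.add_tokens(vocab)

    def add_tokens(self, pieces):
        # Append pieces that are not in the vocabulary yet.
        for piece in pieces:
            if piece not in self.piece_to_id:
                self.piece_to_id[piece] = len(self.id_to_piece)
                self.id_to_piece.append(piece)

    def encode(self, text):
        # Any character missing from the vocabulary collapses to id 0.
        return [self.piece_to_id.get(ch, 0) for ch in text]

    def decode(self, ids):
        return "".join(self.id_to_piece[i] for i in ids)


query = 'find({"a": 1})'

# Build a vocabulary that covers everything except the curly braces.
tok = ToyTokenizer(sorted(set(query) - {"{", "}"}))
round_trip = tok.decode(tok.encode(query))
print(round_trip)

# After registering the braces, the same text survives the round trip.
tok.add_tokens(["{", "}"])
fixed = tok.decode(tok.encode(query))
print(fixed)
```

With Hugging Face Transformers, the analogous step would be tokenizer.add_tokens(["{", "}"]) followed by model.resize_token_embeddings(len(tokenizer)). The new embeddings start untrained, so the model would need to be fine-tuned again after that change.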

Thanks for your time!
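
One possible workaround that leaves the vocabulary untouched is to swap the braces for placeholder strings before training and swap them back after generation. This is a sketch under assumptions: the placeholders "(LB)" and "(RB)" and the helper names are made up for illustration, they must not occur in real queries, and it is worth checking that the tokenizer round-trips them unchanged. The model also has to be retrained on targets encoded this way.

```python
# Hypothetical placeholders built only from characters that already
# appear intact in the model output shown above (parentheses, letters).
LBRACE, RBRACE = "(LB)", "(RB)"


def encode_braces(query: str) -> str:
    """Replace curly braces with placeholders before tokenizing targets."""
    return query.replace("{", LBRACE).replace("}", RBRACE)


def decode_braces(text: str) -> str:
    """Restore curly braces in the decoded model output."""
    return text.replace(LBRACE, "{").replace(RBRACE, "}")


query = 'db.movies.find({"title": "The Poor Little Rich Girl"}, {"writers": 1})'

safe = encode_braces(query)      # what the model would be trained to emit
restored = decode_braces(safe)   # what gets shown to the user

print(safe)
print(restored)
```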
