Hello everyone! I hope you are all well. I need some help because, to be honest, I'm a complete beginner and this got a bit confusing.
I'm fine-tuning T5 to generate MongoDB queries as output. I finished training, and when I loaded the model to test it, the output was missing the curly braces.
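import nltk  # sent_tokenize needs NLTK's Punkt tokenizer data

# tokenizer, model, and max_input_length are defined earlier
inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")
output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=64)
# Keep special tokens so <pad>, <unk>, and </s> stay visible in the output
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=False)[0]
print(decoded_output)
predicted_Query = nltk.sent_tokenize(decoded_output.strip())[0]
print(predicted_Query)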
The code above outputs this:
<pad> db.movies.find(<unk>"title": "The Poor Little Rich Girl"<unk>, <unk>"writers": 1<unk>)</s>
<pad> db.movies.find(<unk>"title": "The Poor Little Rich Girl"<unk>, <unk>"writers": 1<unk>)</s>
Each <unk> token is supposed to be “{” or “}”. Is there any way to fix that? Also, does that mean the model also output <unk> tokens during training?
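One workaround I'm considering, as a minimal sketch: assuming the T5 vocabulary simply has no pieces for curly braces, the braces could be swapped for placeholder strings before tokenizing (both for training targets and at inference) and restored after decoding. The helper names and the "[LB]" / "[RB]" placeholders below are hypothetical, and I have not verified that these placeholders survive a round trip through the tokenizer.

```python
# Hypothetical placeholders (an assumption, not part of any library).
# They must be strings the tokenizer can represent and that never
# appear in real queries.
BRACE_TO_PLACEHOLDER = {"{": "[LB]", "}": "[RB]"}


def encode_braces(query: str) -> str:
    """Replace curly braces with placeholders before tokenization."""
    for brace, placeholder in BRACE_TO_PLACEHOLDER.items():
        query = query.replace(brace, placeholder)
    return query


def decode_braces(text: str) -> str:
    """Restore curly braces in decoded model output."""
    for brace, placeholder in BRACE_TO_PLACEHOLDER.items():
        text = text.replace(placeholder, brace)
    return text


query = 'db.movies.find({"title": "The Poor Little Rich Girl"}, {"writers": 1})'
encoded = encode_braces(query)    # what the tokenizer would see
restored = decode_braces(encoded)  # what I would print after decoding
print(encoded)
print(restored == query)
```

With this approach the model would need to be retrained on targets passed through encode_braces. The other option I have seen mentioned is adding the braces with tokenizer.add_tokens and then calling model.resize_token_embeddings(len(tokenizer)), which I assume also means retraining.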
Thanks for your time!