Sequence to Sequence Modelling - Repetition and uncontrolled generation

Hi guys,

I am interested in building a description/caption generator using a template-based approach.

I have a rule-based method which converts an input sentence into a template, as shown below.

| Input | Template |
| --- | --- |
| Shaped to a shirt silhouette, it’s easy to wear | Shaped to a SILHOUETTE, it’s easy to wear |
| The paisley print will look and make you feel amazing | The PRINT will look and make you feel amazing |

Now, I can fill in these templates to obtain sentences by replacing placeholders like SILHOUETTE with tags pertaining to that attribute, e.g. shift, aline, bodycon, etc.

There are 215 templates, and after filling in the placeholders I have 215 * 3 = 645 sentences in total. The goal is now to formulate this as a sequence-to-sequence problem with the following input and output:

Input Sequence: <start> <tag> <tag> <tag> <end>

Output Sequence: <start> sentence with tags <end>

A few examples are as follows:

| Input | Filled Template |
| --- | --- |
| <start> shirt <end> | <start> Shaped to a shirt silhouette, it’s easy to wear <end> |
| <start> paisley <end> | <start> The paisley print will look and make you feel amazing <end> |
| <start> thin puff <end> | <start> This dress features thin straps and puff sleeves <end> |
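
To make the pair construction concrete, this is roughly how I build the training pairs; a minimal sketch, where `TAGS` and `fill_template` are illustrative stand-ins, not my actual code or full tag vocabulary:

```python
import itertools

# Illustrative placeholder -> tag mapping (not my full vocabulary)
TAGS = {
    "SILHOUETTE": ["shift", "aline", "bodycon"],
    "PRINT": ["paisley", "floral", "geometric"],
}

def fill_template(template, placeholders):
    """Yield (input_sequence, output_sequence) pairs for every tag combination."""
    for combo in itertools.product(*(TAGS[p] for p in placeholders)):
        sentence = template
        for placeholder, tag in zip(placeholders, combo):
            sentence = sentence.replace(placeholder, tag)
        yield ("<start> " + " ".join(combo) + " <end>",
               "<start> " + sentence + " <end>")

pairs = list(fill_template("Shaped to a SILHOUETTE silhouette, it's easy to wear",
                           ["SILHOUETTE"]))
# e.g. ('<start> shift <end>',
#       '<start> Shaped to a shift silhouette, it's easy to wear <end>')
```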

I trained a T5-base model with simpletransformers using the following model hyper-parameters:

{
    # training settings
    "reprocess_input_data": True,
    "overwrite_output_dir": True,
    "max_seq_length": 128,
    "train_batch_size": 8,
    "num_train_epochs": 5,
    "save_eval_checkpoints": True,
    "save_steps": -1,
    "use_multiprocessing": False,
    "evaluate_during_training": True,
    "evaluate_during_training_steps": 15000,
    "evaluate_during_training_verbose": True,
    "fp16": True,
    # decoding settings: pure top-k / nucleus sampling, no beam search
    "num_beams": None,
    "do_sample": True,
    "max_length": 50,
    "top_k": 50,
    "top_p": 0.95,
    "num_return_sequences": 3,
}
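
For context, the training and prediction calls look roughly like this; a minimal sketch, where `train_df` and the "describe" prefix are stand-ins for my actual data:

```python
import pandas as pd
from simpletransformers.t5 import T5Model

# simpletransformers' T5Model expects a DataFrame with the columns
# "prefix", "input_text" and "target_text"; two illustrative rows:
train_df = pd.DataFrame([
    {"prefix": "describe",
     "input_text": "<start> shirt <end>",
     "target_text": "<start> Shaped to a shirt silhouette, it's easy to wear <end>"},
    {"prefix": "describe",
     "input_text": "<start> paisley <end>",
     "target_text": "<start> The paisley print will look and make you feel amazing <end>"},
])

model = T5Model("t5", "t5-base", args=model_args)  # model_args = the dict above
model.train_model(train_df, eval_data=train_df)    # proper eval split elided for brevity

# Predictions are requested as "prefix: input_text" strings
preds = model.predict(["describe: <start> blue mini holiday <end>"])
```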

However, in the results I observed that some outputs are alright, as follows:

https://imgur.com/nWlyieO

but in many results, there’s a lot of repetition:

https://imgur.com/cZaSVB0

and in some other results, unintended tags are included, i.e. tags that were not present in the input still somehow appear in the output:

[['Covered in a blue color, this mini length dress will brighten up your summer season wardrobe, styled in a holiday silhouette and SHAPE with a holiday sleeve of a minimum of 10 inch',
  'Printed all over with blue color mini length dress, this holiday dress is designed in a holiday silhouette that falls to a HEM',
  'Covered in a blue color, this mini length dress will brighten up your summer season wardrobe, styled in a holiday silhouette and SHAPE with a holiday sleeve and sleeve holiday']]

If we look at the above prompt, we provided the color of the dress, its length, and the occasion; however, a season was added in along with other tags just out of the blue. Why would the generation be uncontrolled when we’re providing the model with properly curated data, without extraneous tags in the outputs?

Also, in some results one attribute is rampantly repeated and the description then makes no sense. For example, in the following sentences, “length” is predicted as a word many times over in the description:

[['Put some swing in this mini length length length length length length length length dress',
'Put some swing in this mini length length length length length length length length length length formal dress']]
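
One thing I’ve been meaning to try is adding repetition-oriented decoding constraints. A sketch of what I mean, calling the underlying Hugging Face generate() API directly on the saved model (the output path and parameter values are illustrative guesses, not tuned):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# "outputs/" is wherever simpletransformers saved the fine-tuned model
tokenizer = T5Tokenizer.from_pretrained("outputs/")
model = T5ForConditionalGeneration.from_pretrained("outputs/")

inputs = tokenizer("describe: <start> mini formal <end>", return_tensors="pt")
generated = model.generate(
    **inputs,
    max_length=50,
    num_beams=4,             # beam search instead of pure sampling
    no_repeat_ngram_size=2,  # forbid any repeated bigram, e.g. "length length"
    repetition_penalty=1.5,  # down-weight tokens that were already generated
    early_stopping=True,
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

But I’m not sure whether these are band-aids over a data/formulation issue, hence the question.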

I am just beginning my journey with transformer-based NLP and would appreciate it if someone could clarify why this might be happening and how it could be circumvented.

Thanks & Regards,
Vinayak.