Advice on model type

Hi all,
I would like to know which base model I could use to train a model capable of generating lists of numbers from a natural-language prompt.

Examples:
“Create a list of 3 prime numbers” → 2, 3, 5
“Create a list of 5 random numbers” → 5, 7, 19, 22, 24
“Create a list of 2 negative numbers” → -3, -5
etc.

From what I’ve read, the recommended models would be those tagged “text-generation”, but before proceeding I would like further confirmation, and to understand whether e.g. a GPT-2 model would be fine or whether there are other, even lighter, options.

Thank you so much :slight_smile:

Hey @00nc

Based on the examples you gave, where there is a clear separation between the input and the output, you can also consider “encoder-decoder” language models, like T5. These models have a text2text-generation tag.

One more note: for tasks where you want the model to follow an instruction, consider starting from an instruction-tuned model like Flan-T5 or BLOOMZ :slight_smile: These models have been fine-tuned to follow instructions and tend to behave better on this kind of task.
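To make that concrete, here is a minimal sketch of querying an instruction-tuned encoder-decoder model through the text2text-generation pipeline. I'm assuming the Hub id `google/flan-t5-small` here; double-check the exact model name on the Hub before using it.

```python
from transformers import pipeline

# Load an instruction-tuned encoder-decoder model (assumed Hub id).
generator = pipeline("text2text-generation", model="google/flan-t5-small")

# The prompt is the instruction itself; the model returns the generated text.
result = generator("Create a list of 3 prime numbers", max_new_tokens=20)
print(result[0]["generated_text"])
```

Note that a base (non-instruction-tuned) T5 checkpoint would need task prefixes, while the Flan variants are trained to take the instruction directly.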

1 Like

Thank you very much!

Hi Joao,
I was able to load and query the “Flan-T5” model both locally and through the API.
I was also able to write code that queries the model locally from a CSV file built like this:
input_text;target_text
"Write a list of numbers";"1, 2, 3"
"Write 5 numbers";"3, 6, 8, 2, 1"
etc.
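The loader I wrote is roughly this sketch (the column names are just the ones from my header row, and `load_pairs` is my own helper name):

```python
import csv

def load_pairs(path):
    """Read the semicolon-delimited CSV into (input_text, target_text) pairs."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=";")
        return [(row["input_text"], row["target_text"]) for row in reader]
```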

But I wasn’t able to find any sample code for training the model; is there any?
I am using Python on Windows 10.

Thank you.

Hey @00nc – have a look at our summarization transformers example, which can train a T5 model.

1 Like

Thank you again, I was able to run the script and test it with the sample command:

python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path t5-small \
    --do_train \
    --do_eval \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir /tmp/tst-summarization \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --overwrite_output_dir \
    --predict_with_generate

Now my question is: how should I change this command to read the CSV file and use the FLAN-T5-SMALL model?
Of course I will change the model name to flan-t5-small, but how do I load the CSV, and is a source prefix needed?
Thank you very much for your help!

Ok, I was able to train the model using the following command:

python run_summarization.py \
    --model_name_or_path flan-t5-small \
    --do_train \
    --do_eval \
    --train_file C:/mypath/train.csv \
    --validation_file C:/mypath/valid.csv \
    --source_prefix "summarize: " \
    --output_dir c:/myoutputpath \
    --overwrite_output_dir \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --predict_with_generate

I have created a CSV training file of 15 lines like this:

"write 3 numbers"; "6,12,18"
"write 3 numbers"; "4,8,12"
"write 3 numbers"; "17,34,51"
etc.

The validation file is the same, just with different 3-number lists.
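For completeness: I'm not sure the script reads the semicolon-delimited, headerless layout above as-is, so I also keep a small converter to a comma-delimited CSV with a header row, which is what its dataset loader seems to expect. The column names "text"/"summary" here are an assumption on my part (the script has --text_column / --summary_column flags to point at them):

```python
import csv

def normalize_csv(src, dst):
    """Rewrite a ;-delimited, headerless CSV as a comma-delimited CSV
    with a header row naming the input and target columns."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        writer.writerow(["text", "summary"])  # assumed column names
        # skipinitialspace handles the blank after each ";" in my files
        for row in csv.reader(fin, delimiter=";", skipinitialspace=True):
            if len(row) == 2:
                writer.writerow([cell.strip() for cell in row])
```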

The training finishes correctly but when I try to query the created model, this is what happens:

Input: write 3 numbers
Expected output:  "7,14,21"
Generated output: <pad>a slam dunk a slam dunk a slam dunk a slam dunk a slam dunk a sl

:joy: :joy: :joy: :joy: :joy:

Why? Should I just train it on many more lists, or am I doing something wrong?

Thank you!