How does the text-generation pipeline know the special stop token?

I use the Llama2 model currently which has the stop token </s>.

I tried to change the stop token so that the pipeline would continue generating even when the model predicts the </s> token.

I tried to change:

model.config.eos_token_id = <someRandomNumber>
# or
tokenizer.eos_token = "somethingelse"

but the pipeline still seems to know that it should stop at </s>.

So I’m asking myself:
“How does the text-generation pipeline know what the correct stop token is, and where does it get it from?”

Hey! You have to change it in the model’s generation config as well. Try the code below:

model.generation_config.eos_token_id = None
model.config.eos_token_id = None
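To the original question of where the stop token comes from: my understanding (an assumption, not verified against the transformers source) is that per-call generation kwargs win over `model.generation_config`, which in turn wins over `model.config`. A pure-Python sketch of that precedence, with a hypothetical `resolve_eos` helper:

```python
# Hypothetical helper mirroring (as an assumption) how the effective EOS id
# is chosen: per-call kwarg > model.generation_config > model.config.
def resolve_eos(call_kwarg=None, generation_config_eos=None, model_config_eos=None):
    for candidate in (call_kwarg, generation_config_eos, model_config_eos):
        if candidate is not None:
            return candidate
    return None  # no EOS at all -> generation only stops at the length limit

print(resolve_eos(generation_config_eos=2, model_config_eos=2))  # 2 (Llama-2's </s>)
print(resolve_eos())  # None, i.e. both configs cleared as in the snippet above
```

This is why setting only `model.config.eos_token_id` has no visible effect: the generation config shadows it.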

Thank you for your response! I had no idea about the generation config.
The thing is, my pipeline actually does not stop generating.

The following print statements all print </s> (the correct stop token):

print("1. ", tokenizer.decode(tokenizer.eos_token_id))
print("2. ", tokenizer.decode(model.config.eos_token_id))
print("3. ", tokenizer.decode(model.generation_config.eos_token_id))

However, when I create and call the pipeline:

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

# Run the text-generation pipeline with our fine-tuned model
while True:
    user_input = "write an sql statement"
    prompt = f"<s>[INTS]{user_input}[/INST]"
    result = pipe(prompt)

It outputs:

<s>[INTS]write an sql statement[/INST] SELECT * FROM table_name WHERE name = "mario"</s>\n\nWhat does the SQL statement mean?\n\nIn this SQL...

You can see that the correct stop token is set in every single config, yet the pipeline still ignores </s> and keeps generating.

Ah, sorry, I thought you wanted to keep generating even after the stop token. In that case you shouldn’t set them to None as I showed above.

If the model is still generating beyond the EOS token set in model.generation_config.eos_token_id, can you check whether the generation config has forced_eos_token_id or min_length set? Also, it would be easier if you provide reproducible code with the model checkpoint you’re using 🙂


No, I’m sorry, this was on me!
The script is about a fine-tuned LLM (with LoRA). It tries to predict the SQL statement from a user question and the database table.

Here is the entire script: Google Colab

Let me give you a quick overview:

  1. Loading the base model and merging it with my trained LoRA adapter.
  2. Showing all model configs
  3. Showing all eos token ids (decoded)
  4. Trying out the pipeline with the config files
  5. Trying out the pipeline while explicitly setting the </s> stop token (also had no effect)

Thanks a lot for sharing the notebook. Unfortunately I still can’t reproduce it on main. I am using an arbitrary adapter since I don’t have access to the one you had in Colab, so my wild guess is that there’s something in the adapter config, as the version in Colab is the latest one.

Let me know if you can provide the whole script with the adapter, and I will try to find the bug!


No, thank you so much for even trying to help me.
Please don’t spend too much time on it, but if you can’t reproduce it with this script, I will go crazy.

My trained lora adapter: SQLFormerLlama2_Adapter - Google Drive

The entire training script: Google Colab
(If you want to try out the training script you just have to let it run for maybe 3-5min and then you can stop the cell. By then the model will have learned the rough desired structure and will already fail to stop at the stop token)


@VitalContribution hey, I ran the notebook with your adapter and found the error. The generated “</s>” tokens are not the actual EOS token: the model is generating the ids [829, 29879, 29958], which together decode to the string “</s>”.
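To make the diagnosis concrete, here is a minimal sketch using the three ids from above and Llama-2’s real EOS id (2); the rest of the `generated_tail` is hypothetical:

```python
# Llama-2's actual EOS token has id 2; the fine-tuned model instead emitted
# the three ids below, which merely *decode* to the text "</s>".
EOS_TOKEN_ID = 2
look_alike_ids = [829, 29879, 29958]  # together decode to the string "</s>"

generated_tail = [3087, 29871] + look_alike_ids  # hypothetical end of an output
print(EOS_TOKEN_ID in generated_tail)  # False -> no EOS stopping criterion fires
```

Since id 2 never appears in the output, the stopping criterion comparing generated ids against eos_token_id never triggers, no matter which config it was read from.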

I can’t say right now what went wrong with training, or how to train it properly, since I’m prioritizing other tasks. A workaround here is to pass stop_strings as a kwarg to the pipeline, which stops generation when a custom string is produced. See the docs for more details 🙂


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.