Unnecessary output generated from Llama-3-Instruct model (CausalLM)

After fine-tuning a Llama-3-Instruct model, I use this code to generate text:

example = prompt.format("You are mimicking a linux server. Respond with what the terminal would respond when a command given. I want you to only reply with the terminal outputs inside one unique code block and nothing else. Do not write any explanations. Do not type any commands unless I instruct you to do so.", "echo 'Hellow world'","")
input_ids = tokenizer.encode(example, return_tensors="pt")
output = trainer.model.generate(
    input_ids, 
    max_length=128,
    num_return_sequences=1,  # Adjust for multiple generations per input if needed
    no_repeat_ngram_size=2,  # To prevent repetition
    top_k=50,  # Top-k sampling
    top_p=0.95,  # Nucleus sampling
    do_sample=True,  # Sampling for diversity
    temperature=0.7  # Control creativity
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)[len(example):]
print('text ', example)
print('generated text', generated_text)

Generated text output:
generated text ```
Helloworld


### End. ###

### New Command:
###? ### 

Please enter a new command. You can give a general command like `ls`, `cd`, or `mkdir`, a system command `echo`, etc. or a

How do I remove those unnecessary lines from the generated text?
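
One minimal cleanup, assuming the trailing `###` sections are just the model continuing its fine-tuning prompt format after the real answer, is to cut the decoded text at the first `###` marker and strip the code fences. The `clean_output` helper below is only an illustration, not part of the original code:

def clean_output(text):
    # Keep only the part before the first "###" marker; those sections come
    # from the prompt format rather than from the simulated terminal output.
    text = text.split("###")[0]
    # Drop the code-fence characters and surrounding whitespace.
    return text.replace("```", "").strip()

print(clean_output(generated_text))

A more robust fix is to make generation stop on its own, for example by ending every training response with an explicit end marker (such as Llama-3's <|eot_id|> token) and passing that token's ID as eos_token_id to generate.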

I received the following warning: ‘The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input’s attention_mask to obtain reliable results. Setting pad_token_id to eos_token_id:128001 for open-end generation.’

I fixed it by passing the attention mask and explicitly setting the pad token ID:

import re
example = prompt.format("You are mimicking a linux server. Respond with what the terminal would respond when a command given. I want you to only reply with the terminal outputs inside one unique code block and nothing else. Do not write any explanations. Do not type any commands unless I instruct you to do so.", "echo 'Hellow world'","")
inputs = tokenizer(
    example,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=256  # Adjust max_length as needed
)
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

# Generate output
output = trainer.model.generate(
    input_ids, 
    attention_mask=attention_mask,  # Pass attention mask
    max_length=256,  # Total length including the prompt; use max_new_tokens to cap only the generated tokens
    num_return_sequences=1,  # Adjust for multiple generations per input if needed
    no_repeat_ngram_size=2,  # To prevent repetition
    top_k=50,  # Top-k sampling
    top_p=0.95,  # Nucleus sampling
    do_sample=True,  # Sampling for diversity
    temperature=0.7,  # Control creativity
    pad_token_id=tokenizer.eos_token_id  # Explicitly set the pad token ID
)

# Decode and print the result
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)[len(example):]
print('text:', example)
print('generated text:', generated_text)
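
As a side note, the character-level slice [len(example):] can drift if decoding does not reproduce the prompt string exactly; a safer sketch is to slice at the token level and decode only the newly generated tokens:

# Decode only the tokens that come after the prompt (token-level slice,
# instead of cutting the decoded string by character count)
prompt_length = input_ids.shape[-1]
new_tokens = output[0][prompt_length:]
generated_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
print('generated text:', generated_text)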