Unnecessary output generated from Llama-3-Instruct model (CausalLM)

After fine-tuning a Llama-3-Instruct model, I use this code to generate text:

example = prompt.format("You are mimicking a linux server. Respond with what the terminal would respond when a command given. I want you to only reply with the terminal outputs inside one unique code block and nothing else. Do not write any explanations. Do not type any commands unless I instruct you to do so.", "echo 'Hellow world'","")
input_ids = tokenizer.encode(example, return_tensors="pt")
output = trainer.model.generate(
    input_ids, 
    max_length=128,
    num_return_sequences=1,  # Adjust for multiple generations per input if needed
    no_repeat_ngram_size=2,  # To prevent repetition
    top_k=50,  # Top-k sampling
    top_p=0.95,  # Nucleus sampling
    do_sample=True,  # Sampling for diversity
    temperature=0.7  # Control creativity
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)[len(example):]
print('text ', example)
print('generated text', generated_text)

Generated text output:
generated text ```
Helloworld


### End. ###

### New Command:
###? ### 

Please enter a new command. You can give a general command like `ls`, `cd`, or `mkdir`, a system command `echo`, etc. or a

How do I remove those unnecessary lines from the generated text?
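
One minimal cleanup, assuming the trailing `###` sections are just the model continuing its fine-tuning prompt format after the real answer, is to cut the decoded text at the first `###` marker and strip the code fences. The `clean_output` helper below is only an illustration, not part of the original code:

def clean_output(text):
    # Keep only the part before the first "###" marker; those sections come
    # from the prompt format rather than from the simulated terminal output.
    text = text.split("###")[0]
    # Drop the code-fence characters and surrounding whitespace.
    return text.replace("```", "").strip()

print(clean_output(generated_text))

A more robust fix is to make generation stop on its own, for example by ending every training response with an explicit end marker (such as Llama-3's <|eot_id|> token) and passing that token's ID as eos_token_id to generate.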

I received the following warning: ‘The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input’s attention_mask to obtain reliable results. Setting pad_token_id to eos_token_id:128001 for open-end generation.’

I fixed it by passing the attention mask and explicitly setting the pad token ID:

import re
example = prompt.format("You are mimicking a linux server. Respond with what the terminal would respond when a command given. I want you to only reply with the terminal outputs inside one unique code block and nothing else. Do not write any explanations. Do not type any commands unless I instruct you to do so.", "echo 'Hellow world'","")
inputs = tokenizer(
    example,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=256  # Adjust max_length as needed
)
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

# Generate output
output = trainer.model.generate(
    input_ids, 
    attention_mask=attention_mask,  # Pass attention mask
    max_length=256,  # Total length including the prompt; use max_new_tokens to cap only the generated tokens
    num_return_sequences=1,  # Adjust for multiple generations per input if needed
    no_repeat_ngram_size=2,  # To prevent repetition
    top_k=50,  # Top-k sampling
    top_p=0.95,  # Nucleus sampling
    do_sample=True,  # Sampling for diversity
    temperature=0.7,  # Control creativity
    pad_token_id=tokenizer.eos_token_id  # Explicitly set the pad token ID
)

# Decode and print the result
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)[len(example):]
print('text:', example)
print('generated text:', generated_text)
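
As a side note, the character-level slice [len(example):] can drift if decoding does not reproduce the prompt string exactly; a safer sketch is to slice at the token level and decode only the newly generated tokens:

# Decode only the tokens that come after the prompt (token-level slice,
# instead of cutting the decoded string by character count)
prompt_length = input_ids.shape[-1]
new_tokens = output[0][prompt_length:]
generated_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
print('generated text:', generated_text)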