I fine-tuned a llama 7b on a custom dataset. The response from inference generation starts out good, then words start to connect without spaces.

**Here is my generation code below:**

```python
!pip install -q -U transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q -U bitsandbytes einops sentencepiece

from peft import PeftModel
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig, BitsAndBytesConfig

model_id = "huggyllama/llama-7b"

# 4-bit NF4 quantization with nested quantization and bfloat16 compute
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# load_in_4bit is already set via quantization_config, so it isn't repeated here
model = LlamaForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    quantization_config=config,
    device_map="auto",
)

# Attach the fine-tuned LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "0sunfire0/Llama_7B_Test04")

tokenizer = LlamaTokenizer.from_pretrained(model_id)

custom_prompt = "In what scenarios is creating a Treasury Notes ladder a waste?"

PROMPT = f'''Below is an instruction that describes a task. Write a response in detail that appropriately completes the request.

Instruction:

{custom_prompt}

Response:

'''
```

```python
%%time

inputs = tokenizer(PROMPT, return_tensors="pt")
input_ids = inputs["input_ids"].cuda()

generation_config = GenerationConfig(
    temperature=0,
    top_p=1.0,
    typical_p=1.0,
    repetition_penalty=2.0,
    encoder_repetition_penalty=2.0,
    top_k=40,
)

print("Generating...")
generation_output = model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    return_dict_in_generate=True,
    output_scores=False,
    max_new_tokens=700,
)
for s in generation_output.sequences:
    print(tokenizer.decode(s))
```
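One setting I'm suspicious of (a guess on my part, not a confirmed diagnosis): `repetition_penalty=2.0` is very aggressive. LLaMA's SentencePiece vocabulary bakes the leading space into its tokens (`▁waste` vs. `waste`), so strongly penalizing already-seen tokens can push decoding toward the space-less variants of words that were already used, which would look exactly like words running together. Note also that since `do_sample` is not set, decoding is greedy, so `temperature`/`top_p`/`typical_p`/`top_k` have no effect here. A gentler configuration to compare against:

```python
# Sketch: milder decoding settings for comparison (values are guesses, not a confirmed fix)
generation_config = GenerationConfig(
    do_sample=False,         # explicit greedy decoding; sampling knobs only apply when do_sample=True
    repetition_penalty=1.1,  # far gentler than 2.0
)
```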

**Here is the response below:**

Instruction:

In what scenarios is creating a Treasury Notes ladder a waste?

Response:

Creating a Treasury Notes (T-bill) ladder can be a waste of time and money if certain conditions are not met. Here’s when creating a T‑ bill ladder may NOT lead to desirable results:

  1. Incorrect Assumptions: If the investor makes incorrect assumptions about the returns or interest rates in the future, creating a Treasury Notes ladder could prove ineffective. For example:

Assuming that short‐term Treasury Notes will always provide higher yields than longer term Treasury Notes is an inaccurate assumption.

  2. Unrealistic Savings Goals: The savings goals for which a Treasury Notes ladder is created should align with the actual needsand risk toleranceof the invested party. Otherwise, creating a ladder that includes unnecessary sectors
    (such as long– Term Treasury Notes), while still meeting the stated goalssignificantly increases the cost without providing any additional value.

  3. Overlapping Ladder Structure: Having multiple Treasury Notes ladder structurescan result in wasted resources. It is betterto focus on creating one well designed ladder ratherthan attemptingmultiple strategies that overlap each other.

  4. Mismatched Liquidity Needs: When creating a Treasury Notes ladder, it is important toreflect upon the liquiditiessupporting the financial objectives. A ladder structurethat relies heavilyon illiquidthirdpartyfinancinginstruments suchas certificatesofthe United States Treasury (“COMT”)maynotalignwith the specificliquiddownloadneededfor the portfolio.

  5. Lack Of Diversification: An excessively concentratedtreasurysectordiverts theinvesteecurrently in a single sector. This runs therisksoflossif a significant event affectstheinvestment. To mitigatethis riskexposureit is recommended thatincreased diversificationshouldbe consideredwhen building a Treasury Notes ladder.

  6. Ignoring Credibility Issues: Treasury Securitiesare often perceivedas having a low default rate due tot he fact that theyreissued by the U.S. government. However,crediblenegative eventsrelatedtothesurysectorshaveoccasionaly occurred in the past. Whilethe likelihoodoffutureeventsremainlowcomparedtothewider credit marketpracticesuchascorporationsandsmall businessesespecially in times off economic stress.

  7. Failures in Risk Management: Treasury Notes ladderstrategiestypically incorporatemultiple factors related tomoney managementincludinginterestratevoladexpectationscostmanagement. Withoutappropriatelyconsidersafetycapitalconcernsin the formulation stageaswellas regular reviewsassuring the desiredlevelsoftomarketuncertaintyrelief.

  8. Conflicting Objectves: Creating a Treasury Notes ladder thatincludes both fixed­ incomeassetsasweelastheldasavariousotherassetclasseswithoutclearobjectiveschallengeshavingconflictsonal financemanagement prioritisestrategy.

I tried the `add_prefix_space=True` parameter in the tokenizer; it didn't work.
I tried the `clean_up_tokenization_spaces=False` parameter in the tokenizer; it didn't work either.
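Concretely, those attempts looked like this (a sketch of what was tried; neither changed the output):

```python
# Attempt 1: keep the leading-space marker when tokenizing (didn't help)
tokenizer = LlamaTokenizer.from_pretrained(model_id, add_prefix_space=True)

# Attempt 2: disable space cleanup when decoding (didn't help either)
text = tokenizer.decode(generation_output.sequences[0],
                        clean_up_tokenization_spaces=False)
```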
This problem happens only with llama 7b. I used the same code for fine-tuning and generation on llama 13b and falcon 7b, and there is no problem at all in their responses.

The issue has been fixed.

How?
I have a model that just repeats what I say to it.
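If the output just echoes the prompt, one thing to check: `generate()` returns the prompt tokens followed by the continuation, so decoding the whole sequence reprints your input. A minimal sketch (assuming `input_ids` and `tokenizer` as in the code above) that prints only the newly generated part:

```python
seq = generation_output.sequences[0]
new_tokens = seq[input_ids.shape[-1]:]  # slice off the echoed prompt tokens
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```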


Could you share your generation notebook, please?

Is the shared code right?