I’m encountering an issue with a fine-tuned unsloth/Qwen2.5-1.5B model, where the output includes unexpected exclamation marks (!) during text generation.
Process Followed:
- Fine-Tuning:
After fine-tuning, I received the following files:
merges, tokenizer, training_args.bin, vocab, adapter_config, adapter_model.safetensors, added_tokens, README, special_tokens_map, tokenizer_config.
- Error on Generation:
When I attempted to generate text, I encountered an error because config.json and model.safetensors were missing. I solved this by renaming adapter_model.safetensors to model.safetensors and adding a config.json taken from a Hugging Face model repo.
Content of config.json:
{
  "_name_or_path": "/home/azureuser/CodeReview/Qwen2.5-finetuned_without_BSB/Qwen2.5-finetuned_without_BSB/",
  "architectures": ["Qwen2ForCausalLM"],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 8960,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 12,
  "num_hidden_layers": 28,
  "num_key_value_heads": 2,
  "quantization_config": {
    "_load_in_4bit": false,
    "_load_in_8bit": false,
    "quant_method": "bitnet"
  },
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "vocab_size": 151936
}
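I have not yet tried loading the adapter with peft on top of the original base model instead of renaming files, but my understanding is that it would look roughly like the untested sketch below (the base model name is my assumption about the checkpoint used for fine-tuning):
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Untested sketch: apply the LoRA adapter to the base model instead of
# renaming adapter_model.safetensors to model.safetensors.
base_model_name = "unsloth/Qwen2.5-1.5B"  # assumed base checkpoint
adapter_path = "/home/azureuser/CodeReview/Qwen2.5-finetuned_without_BSB/Qwen2.5-finetuned_without_BSB/"

tokenizer = AutoTokenizer.from_pretrained(adapter_path)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
# PeftModel reads adapter_config.json and adapter_model.safetensors from adapter_path
model = PeftModel.from_pretrained(base_model, adapter_path)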
Code:
Here’s my current code for generating responses:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
app = FastAPI()
# Define the input structure
class CodeReviewInput(BaseModel):
    diff: str

# Load your locally saved fine-tuned model and tokenizer
model_path = "/home/azureuser/CodeReview/Qwen2.5-finetuned_without_BSB/Qwen2.5-finetuned_without_BSB/"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float32, device_map="cpu"
)
torch.set_num_threads(4)
@app.post("/code_review/")
async def code_review(input_data: CodeReviewInput):
    diff = input_data.diff.strip()
    if not diff:
        raise HTTPException(status_code=400, detail="Input 'diff' cannot be empty.")

    alpaca_prompt = (
        "### Instruction:\n{0}\n\n"
        "### Input:\n{1}\n\n"
        "### Response:\n"
    )
    prompt = alpaca_prompt.format(
        "Review the code changes and provide feedback.",
        diff
    )
    inputs = tokenizer([prompt], return_tensors="pt", truncation=True, max_length=512)

    try:
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            no_repeat_ngram_size=2,
            temperature=0.7,
        )
        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

        response_start = "### Response:"
        if response_start in generated_text:
            response = generated_text.split(response_start, 1)[-1].strip()
        else:
            response = "Error: Model output incomplete or malformed."

        return {"response": response}
    except torch.cuda.OutOfMemoryError:
        raise HTTPException(
            status_code=500, detail="Model ran out of memory. Reduce input size."
        )
    except Exception as e:
        raise HTTPException(
            status_code=500, detail=f"Unexpected error: {str(e)}"
        )
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
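For context, a request to the endpoint looks like this (the diff content is just a placeholder):
import requests

# Example request against the running FastAPI server (assumes port 8000 as above)
resp = requests.post(
    "http://localhost:8000/code_review/",
    json={"diff": "diff --git a/app.py b/app.py\n- old line\n+ new line"},
)
print(resp.json()["response"])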
- How can I stop the model from generating these unexpected exclamation marks?
I would appreciate any insights on how to fix this issue.
Thank you