I first fine-tune a model using QLoRA, similar to this notebook here: Google Colab
I then save it using trainer.push_to_hub()
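Roughly, the fine-tuning setup looks like this (a simplified sketch; the dataset, LoRA config, and hyperparameters below are placeholders along the lines of that notebook, not necessarily the exact values I used):

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

base_model = "tiiuae/falcon-7b"

# 4-bit quantization config (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# LoRA adapters on Falcon's fused attention projection
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
    task_type="CAUSAL_LM", target_modules=["query_key_value"],
)

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="falcon-7b-openassistant",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=500,
    ),
)
trainer.train()
trainer.push_to_hub()  # uploads the trained adapter and config to my Hub repo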
Next, I open a new notebook with this code here:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model from the Hub
model_name = "Leon68/falcon-7b-openassistant"
# "tiiuae/falcon-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)

# Load the tokenizer from the base instruct model
model_name = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

input_text = "teach me how to fly"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
next_input = input_ids
max_length = 80  # Change this to your desired output length; too long could cause an out-of-memory (OOM) error!
current_length = input_ids.shape[1]

# Greedy decoding loop: keep appending the most likely next token
# until the length limit or the EOS token is reached.
while True:
    if current_length >= max_length:  # Check if we've reached the length limit
        break
    output = model(next_input)
    next_token_logits = output.logits[:, -1, :]
    next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(0)
    print(tokenizer.decode(next_token_id[0].cpu().tolist(), skip_special_tokens=True), end="", flush=True)
    next_input = torch.cat([next_input, next_token_id.to("cuda")], dim=-1)
    current_length += 1
    if next_token_id[0].item() == tokenizer.eos_token_id:
        break
And when I run inference, it spits out nonsense:
Vel Educational这样"… Fond visitors bangs ClassesINSenciasoney Bills analyzed ll quere Fond表 Fond QB lips Sociology asegur betray Killer birthplace geb"…表 Fond表
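For what it's worth, my understanding is that the while loop above is just greedy decoding, so I'd expect it to behave roughly the same as generate() with sampling disabled (sketch below; I haven't verified that the two paths match token for token):

output_ids = model.generate(
    input_ids,
    max_length=80,                        # same limit as the loop above
    do_sample=False,                      # greedy: always take the argmax token
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))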
Any idea what is going on?