Uploading and Downloading Model Errors

I first fine-tune a model using QLoRA, similar to this Google Colab notebook.
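For context, a typical QLoRA setup looks roughly like the following. This is a minimal sketch that assumes the linked notebook follows the usual recipe: 4-bit NF4 quantization through bitsandbytes plus LoRA adapters from peft. The helper name `build_qlora_model` and the LoRA hyperparameters are illustrative assumptions, not values taken from the notebook. The point to notice is that the object being trained is a peft `PeftModel` wrapping a frozen base model.

```python
def build_qlora_model(base_model_id: str = "tiiuae/falcon-7b-instruct"):
    """Load a base model in 4-bit and wrap it with LoRA adapters (QLoRA-style).

    Hypothetical helper: the hyperparameters below are illustrative assumptions.
    Imports are local so this file can be imported without a GPU or the libraries.
    """
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # 4-bit NF4 quantization with double quantization and bfloat16 compute
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
    )
    model = prepare_model_for_kbit_training(model)

    # LoRA on Falcon's fused attention projection (assumed target module)
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["query_key_value"],
        bias="none",
        task_type="CAUSAL_LM",
    )
    # Only the small LoRA matrices are trainable; the 4-bit base weights stay frozen
    return get_peft_model(model, lora_config)
```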

I then save it using trainer.push_to_hub().
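What ends up in the repo depends on what the trainer was holding. With recent transformers and peft versions, saving a `PeftModel` (which `trainer.push_to_hub()` does before uploading) typically writes only the LoRA adapter, i.e. `adapter_config.json` plus `adapter_model.bin` or `adapter_model.safetensors`, rather than full base-model weights. A quick way to see what actually landed on the Hub is to list the repo's files with `huggingface_hub.list_repo_files`. The classifier below is a hypothetical heuristic sketch that only looks at root file names.

```python
from typing import Iterable


def checkpoint_kind(repo_files: Iterable[str]) -> str:
    """Classify a Hub repo by its root file names (hypothetical heuristic).

    "adapter": peft adapter files, meant to be loaded on top of a base model
    "full":    a standalone transformers checkpoint (config.json, no adapter config)
    "unknown": neither marker file was found
    """
    files = set(repo_files)
    if "adapter_config.json" in files:
        return "adapter"
    if "config.json" in files:
        return "full"
    return "unknown"


# Usage (needs network access to the Hub):
# from huggingface_hub import list_repo_files
# print(checkpoint_kind(list_repo_files("Leon68/falcon-7b-openassistant")))
```

If this reports "adapter", the repo does not contain a standalone model, and loading it the same way as a full checkpoint is worth double-checking.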

Next, I open a new notebook and run this code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Leon68/falcon-7b-openassistant"
# "tiiuae/falcon-7b-instruct"

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)

model_name = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

input_text = "teach me how to fly"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

next_input = input_ids
max_length = 80  # Change this to your desired output length. Too long could cause an OOM (out of memory) error!
current_length = input_ids.shape[1]

# Greedy decoding: feed the whole sequence each step and pick the most likely next token
with torch.no_grad():  # inference only, so don't build autograd graphs
    while current_length < max_length:  # stop once the length limit is reached
        output = model(next_input)
        next_token_logits = output.logits[:, -1, :]  # logits at the last position
        next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)  # shape (batch, 1)
        print(tokenizer.decode(next_token_id[0].cpu().tolist(), skip_special_tokens=True), end="", flush=True)

        next_input = torch.cat([next_input, next_token_id.to(next_input.device)], dim=-1)
        current_length += 1

        if next_token_id[0].item() == tokenizer.eos_token_id:  # stop at end-of-sequence
            break

And when I run inference, it spits out nonsense:

Vel Educational这样"… Fond visitors bangs ClassesINSenciasoney Bills analyzed ll quere Fond表 Fond QB lips Sociology asegur betray Killer birthplace geb"…表 Fond表

Any idea what is going on?
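For reference, when a pushed repo holds only a LoRA adapter, the usual pattern is to load the base model first and attach the adapter with peft's `PeftModel.from_pretrained`. This is a minimal sketch that assumes the adapter in `Leon68/falcon-7b-openassistant` was trained on `tiiuae/falcon-7b-instruct` (the repo's `adapter_config.json` records the real base under `base_model_name_or_path`). The helper name `load_base_with_adapter` and the bfloat16 dtype are assumptions.

```python
def load_base_with_adapter(
    base_model_id: str = "tiiuae/falcon-7b-instruct",
    adapter_id: str = "Leon68/falcon-7b-openassistant",
    merge: bool = False,
):
    """Load a base model and attach a peft (LoRA) adapter on top of it.

    Hypothetical helper; imports are local so this file imports without a GPU.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    # Wraps the base model and loads the adapter weights from the Hub repo
    model = PeftModel.from_pretrained(base, adapter_id)
    if merge:
        # Folds the LoRA deltas into the base weights, returning a plain transformers model
        model = model.merge_and_unload()
    model.eval()

    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token
    return model, tokenizer
```

Usage would be `model, tokenizer = load_base_with_adapter()` followed by the same greedy loop as above. Merging is optional; it mainly saves a little per-token overhead and lets the result be saved as an ordinary checkpoint.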