Hello everyone,
I followed this tutorial to fine-tune Code Llama on my own dataset. At the end of training I added `trainer.push_to_hub()` to push the model to the Hugging Face Hub as a private model.
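For context, the end of my training script looks roughly like this (a sketch only: the LoRA values and the Trainer setup come from the tutorial, and the exact arguments here are illustrative, not my real ones):

```python
from peft import LoraConfig, get_peft_model

# Illustrative LoRA config (placeholder values, not my actual hyperparameters)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # base CodeLlama-7b wrapped with the LoRA adapter

# ... transformers.Trainer is set up on this PEFT-wrapped model as in the tutorial ...
trainer.train()
trainer.push_to_hub()  # pushes the result to my private repo on the Hub
```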
When I use the code below for inference, loading the adapter from the local output directory, it works:
```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

output_dir = "xxx"
base_model = "codellama/CodeLlama-7b-hf"

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    # load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

# Load the fine-tuned adapter from the local output directory
model = PeftModel.from_pretrained(model, output_dir)
# model.config.use_cache = True
model.eval()

validation = json.load(open("data/Validaton.json"))
output_list = list()
for item in validation:
    prompt = format_prompt(item)  # format_prompt is defined elsewhere in my script
    model_input = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        completion = tokenizer.decode(
            model.generate(**model_input, max_new_tokens=100)[0],
            skip_special_tokens=True,
        )
    # print(type(completion))
    output_list.append((item["question"], completion))
```
However, I need to be able to run inference from another machine that does not have the contents of `output_dir` stored locally, so I have to load the model directly from the Hub. So I tried the following:
```python
base_model = "Smarneh/XXXX"
tokenizer = AutoTokenizer.from_pretrained("Smarneh/XXXX", use_auth_token=True)
model = AutoModelForCausalLM.from_pretrained(base_model, use_auth_token=True)
model.eval()

# Same evaluation loop as above
for item in validation:
    prompt = format_prompt(item)
    model_input = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        completion = tokenizer.decode(
            model.generate(**model_input, max_new_tokens=100)[0],
            skip_special_tokens=True,
        )
    # print(type(completion))
    output_list.append((item["question"], completion))
```
and I got the following error:
```
raise EnvironmentError(
OSError: Smarneh/XXX does not appear to have a file named config.json. Checkout 'https://huggingface.co/Smarneh/XXX/main' for available files.
```
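In case it is relevant, this is how I would check which files `trainer.push_to_hub()` actually uploaded to the repo (a small sketch using `huggingface_hub`; the repo id is the same private one as above):

```python
from huggingface_hub import list_repo_files

# Requires being logged in (huggingface-cli login) or passing token=...,
# since the repo is private
print(list_repo_files("Smarneh/XXXX"))
```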
I googled the error message, but I could not find a case similar to mine. I would appreciate any help with this.
Thanks a lot in advance