Logging a finetuned model using the transformers MLflow flavor in Azure

I am working in Azure, trying to run a job that calls a training notebook. I can train and even evaluate my model just fine within that notebook, but when I try to log the model at the end it throws errors. The error I am seeing is:

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './models/finetuned_llama3/'. Use `repo_type` argument if needed.

From some research, it seems this means it is trying to pull straight from Hugging Face based on my artifact path. I know the model exists where I am referencing it, because I am logging the directory contents and can see it there. I have tried setting arguments and environment variables telling it not to look for a repo (a sketch of that is below), with no success.
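
Roughly the kind of thing I tried (just a sketch, not a verified fix; these are the standard Hugging Face offline variables):

import os

# Tell the Hub / Transformers not to reach out to huggingface.co at all
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

# Sanity check: list what the job actually sees at the model path
print(os.listdir("models/finetuned_llama3"))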

Here is what my logging logic looks like:

job_model_path = 'models/finetuned_llama3'

peft_model = AutoPeftModelForCausalLM.from_pretrained(
    job_model_path, 
    config=LoraConfig(
        r=lora_config_dict["r"],
        lora_alpha=lora_config_dict["lora_alpha"],
        target_modules=lora_config_dict["target_modules"],
        lora_dropout=lora_config_dict["lora_dropout"],
        bias=lora_config_dict["bias"],
        task_type=lora_config_dict["task_type"]
    ), 
    device_map="cuda"
)
peft_model.model.config.quantization_config.use_exllama = True
peft_model.model.config.quantization_config.exllama_config = {"version": 2}

mlflow.transformers.log_model(
    transformers_model={"model": peft_model, "tokenizer": tokenizer},
    artifact_path="finetuned_llama3",  # Ensure the artifact path is correct
    registered_model_name="huggingface-finetuned-model",
    task="text-generation"  # Specify the task type here
)

When I log the model in this manner in an ML Studio notebook it works as expected, so it must be something about how we configure the job.
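
One thing I am still checking (just a guess on my part) is whether the job's working directory matches the notebook's, since a relative path like 'models/finetuned_llama3' resolves against the current working directory:

import os

# A relative path resolves against the job's current working directory,
# which may not match the notebook's
print("cwd:", os.getcwd())
print("resolved model path:", os.path.abspath(job_model_path))
print("is a directory:", os.path.isdir(job_model_path))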

Since the transformers MLflow flavor is relatively new, it has been hard to find much written about it. I have looked for other posts and forum threads about this issue but haven't found anything helpful. GPT and Copilot don't seem to know how to solve it either.

I've seen people say that the artifact path cannot look like a full URL, so I have changed that variable several times, from full URLs to relative paths. I have also experimented with the transformers_model argument, passing both the in-memory objects and just the path.

I am expecting this to log a model to the Azure model registry.
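
Assuming the logging succeeds, my understanding is that I should then be able to load the registered model back out of the registry with something like this (the version number is just a placeholder):

import mlflow.transformers

# Load the registered model back from the model registry
loaded = mlflow.transformers.load_model("models:/huggingface-finetuned-model/1")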

For reference, this is the model we are finetuning: astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit on Hugging Face.

Like this?

#job_model_path = 'models/finetuned_llama3'
job_model_path = './models/finetuned_llama3'

peft_model = AutoPeftModelForCausalLM.from_pretrained(
    job_model_path, 
    local_files_only=True, # Added
    config=LoraConfig(
        # ... (the rest of the LoraConfig arguments are unchanged)

Appreciate the reply, but I am still getting the same error with the additional argument. I'm guessing it is an issue with where the model is being saved within the job; it isn't recognizing it in the directory for some odd reason. I tried updating the packages to the newest versions available, but that didn't help either. If this is more of an Azure-specific question, I can seek help on those forums instead.

If this is more of an Azure-specific question, I can seek help on those forums instead.

I think that's possible. I also run into a lot of errors on virtual machines like Colab and HF Spaces that I don't encounter locally.

In particular, there are a lot of cases where implicit cache-related behavior misbehaves (trying to write to a directory without the right permissions, etc.), so you can sometimes avoid this by explicitly setting environment variables like HF_HOME yourself. PyTorch, the backend for Transformers, has a number of similar environment variables as well…
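
For example, something like this at the top of the job script (just a sketch; "/tmp/hf_cache" is only an example of a location the job can write to):

import os

# Point the Hugging Face caches at a directory the job is allowed to write to
os.environ["HF_HOME"] = "/tmp/hf_cache"
os.environ["HF_HUB_CACHE"] = "/tmp/hf_cache/hub"

# PyTorch keeps its own cache and has a similar variable
os.environ["TORCH_HOME"] = "/tmp/torch_cache"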

Also, this is a common problem in Python in general: things tend to be more stable if you simply rename directories or files. If something else with the same name exists in scope, the library may misbehave.

Gonna mark this as solved because I figured out the solution.

The issue seems to be that an Azure job has trouble dealing with AutoPeftModelForCausalLM and, by association, I assume PEFT models in general. It struggles to use the variable you assign the PEFT model to, failing with the error I mentioned above. If you instead refer to the model's location on disk in the mlflow.transformers.log_model args, you can solve the problem with some extra steps. Code here:

import json

import mlflow.transformers
from peft import AutoPeftModelForCausalLM, LoraConfig

peft_model = AutoPeftModelForCausalLM.from_pretrained(
    'models/finetuned_llama3', 
    local_files_only=True,
    config=LoraConfig(
        r=lora_config_dict["r"],
        lora_alpha=lora_config_dict["lora_alpha"],
        target_modules=lora_config_dict["target_modules"],
        lora_dropout=lora_config_dict["lora_dropout"],
        bias=lora_config_dict["bias"],
        task_type=lora_config_dict["task_type"]
    ), 
    device_map="cuda"
)
peft_model.model.config.quantization_config.use_exllama = True
peft_model.model.config.quantization_config.exllama_config = {"version": 2}

# Write the PEFT model's config into the saved model directory so that
# log_model finds a config.json next to the finetuned weights
with open("models/finetuned_llama3/config.json", "w") as f:
    json.dump(peft_model.config.to_dict(), f, indent=4)

mlflow.transformers.log_model(
    transformers_model='models/finetuned_llama3',
    artifact_path="models/finetuned_llama3",
    registered_model_name="huggingface-finetuned-model",
    task="text-generation",
    save_pretrained=True
)

The extra step you need to take is adding the config file from your PEFT model to the directory your finetuned model is saved in. That config is an attribute of the PEFT model, but it is not present in the saved model folder, and the log_model call complains about it being missing, so you have to write it into that folder yourself (that is the json.dump in my code above).
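
As a side note, I think (though I haven't verified it in the job) you can get the same result without the manual json.dump by letting Transformers write the file for you, since the config object has a save_pretrained method:

# Should be equivalent to the json.dump above: write config.json into the model directory
peft_model.config.save_pretrained('models/finetuned_llama3')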

Hopefully, if someone else runs into this issue, they find this thread.
