ValueError: The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object

I am getting the following error and don't understand how to resolve it. My model is logged in MLflow, and I am trying to load it locally in my Colab.

Code that logs the model to MLflow:

import mlflow
from transformers import pipeline as hf_pipeline
from unsloth import FastLanguageModel

# Enable Unsloth's native 2x faster inference
FastLanguageModel.for_inference(model)
inference_pipeline = hf_pipeline("text-generation", model=model, tokenizer=tokenizer)

# Log the inference pipeline to MLflow
mlflow.transformers.log_model(
    transformers_model=inference_pipeline,
    artifact_path="lora_model_unsloth_PEFTllama3.2_3B",
    registered_model_name="lora_model_unsloth_PEFTllama3.2_3B"
)

Code that loads the model in Colab:

import mlflow

# Set DagsHub as the tracking URI
mlflow.set_tracking_uri("https://dagshub.com/kushwanthkc/lora_model_unsloth_PEFTllama3.2_3B.mlflow")

# Use the full run ID as the model URI
logged_model = 'runs:/e73e9aba105c47d2bfdd58ef135f1478/lora_model_unsloth_PEFTllama3.2_3B'  # Assuming this path is correct on DagsHub

# Load model as a PyFuncModel.
# Removing the 'device' argument as it's not supported by mlflow.pyfunc.load_model()
loaded_model = mlflow.pyfunc.load_model(logged_model)

Error when loading the model:

ValueError: The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object.
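For context, this error is raised by transformers itself, not by MLflow's tracking layer: once a model has been dispatched by accelerate (e.g. loaded with device_map="auto"), the pipeline constructor refuses an explicit device argument, and MLflow's pyfunc wrapper appears to pass one when it rebuilds the pipeline. A minimal sketch of the failing vs. working pattern (illustrative only; model and tokenizer stand in for an accelerate-loaded model and its tokenizer):

from transformers import pipeline as hf_pipeline

# Fails when `model` was loaded with accelerate (device_map="auto"):
# accelerate has already decided the device placement, so transformers
# raises the ValueError above.
# pipe = hf_pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)

# Works: omit `device` and let accelerate keep its placement.
pipe = hf_pipeline("text-generation", model=model, tokenizer=tokenizer)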

It seems to be an unresolved issue in MLflow.


Haha, I just commented over there… I am running into too many roadblocks :tired_face:


Debugged it, and I can finally confirm that PEFT models are not supported by MLflow:

Device set to use cuda:0
The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FalconMambaForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GitForCausalLM', 'GlmForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'JambaForCausalLM', 'JetMoeForCausalLM', 'LlamaForCausalLM', 'MambaForCausalLM', 'Mamba2ForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MllamaForCausalLM', 'MoshiForCausalLM', 'MptForCausalLM', 'MusicgenForCausalLM', 'MusicgenMelodyForCausalLM', 'MvpForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'Olmo2ForCausalLM', 'OlmoeForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhimoeForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RecurrentGemmaForCausalLM', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'WhisperForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM', 'ZambaForCausalLM'].
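One common workaround for that warning (my sketch, not from this thread) is to merge the LoRA adapter into the base weights before building the pipeline, so the pipeline sees a plain LlamaForCausalLM rather than a PeftModelForCausalLM. PEFT's merge_and_unload() does this; note that the merged model no longer has a separable adapter:

from transformers import pipeline as hf_pipeline

# Assumes `model` is a PeftModelForCausalLM (base model + LoRA adapter).
# merge_and_unload() folds the adapter weights into the base model and
# returns the underlying transformers model (e.g. LlamaForCausalLM),
# which the text-generation pipeline does recognize.
merged_model = model.merge_and_unload()

pipe = hf_pipeline("text-generation", model=merged_model, tokenizer=tokenizer)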

Just a question: what other kinds of LLMOps tools work well with PEFT? Any open-source ones?


Looks like I have a workaround: the 'PeftModelForCausalLM' is not supported for text-generation message is not related to MLflow. It comes from the task tag, text-generation, that I am passing to the pipeline for inference.

Do you know the right tag that I can pass here?

inference_pipeline = hf_pipeline("text-generation", model=model, tokenizer=tokenizer)

# Use the pipeline for inference
output = inference_pipeline(""" Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
complete the sequence 

### Input:
1, 1, 2, 3

### Response:""")
print(output)

Output response looks good:

[{'generated_text': ' Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\ncomplete the sequence \n\n### Input:\n1, 1, 2, 3\n\n### Response: \n1, 1, 2, 3, 5, 8, 13'}]
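As a small follow-up (my addition, not from the original post): since the pipeline echoes the whole Alpaca-style prompt back, you can slice out just the completion by splitting on the "### Response:" marker:

# Keep only the text after the "### Response:" marker from the prompt template.
generated = output[0]["generated_text"]
response = generated.split("### Response:")[-1].strip()
print(response)  # -> "1, 1, 2, 3, 5, 8, 13"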

Here is the final solution and workaround:

  • Inference code to load the model from MLflow:
import mlflow
from transformers import pipeline as hf_pipeline

# Set DagsHub as the tracking URI
mlflow.set_tracking_uri("https://dagshub.com/kushwanthkc/lora_model_unsloth_PEFTllama3.2_3B.mlflow")

# Use the full run ID as the model URI
logged_model = 'runs:/ba803f221c824fb59123e837e1377e70/lora_model_unsloth_PEFTllama3.2_3B'  # Assuming this path is correct on DagsHub

# Load the model components
loaded_components = mlflow.transformers.load_model(model_uri=logged_model, return_type="components")

# Create an inference pipeline
inference_pipeline = hf_pipeline("text-generation", model=loaded_components['model'], tokenizer=loaded_components['tokenizer'])

# Use the pipeline for inference
output = inference_pipeline(""" Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
complete the sequence 

### Input:
1, 1, 2, 3

### Response:""")
print(output)
  • Log the model to MLflow:
    # Assuming 'model' is your PeftModelForCausalLM instance and 'tokenizer' is your tokenizer
    components = {"model": model, "tokenizer": tokenizer}
    # Log the inference pipeline to MLflow
    mlflow.transformers.log_model(
        transformers_model=components,
        artifact_path="lora_model_unsloth_PEFTllama3.2_3B",
        registered_model_name="lora_model_unsloth_PEFTllama3.2_3B"
    )
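A small aside, not from the thread: because registered_model_name was set when logging, the same components can also be loaded through a model-registry URI instead of a run URI. The version number below is a placeholder; use whichever version you registered:

import mlflow

# Load by registry name and version rather than by runs:/<run_id>/... URI.
components = mlflow.transformers.load_model(
    model_uri="models:/lora_model_unsloth_PEFTllama3.2_3B/1",
    return_type="components",
)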
