ValueError: The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object

I am getting the following error and don't understand how to resolve it. My model is logged in MLflow, and I am trying to load it locally in my Colab.

Code that logs the model to MLflow:

import mlflow
from transformers import pipeline as hf_pipeline
from unsloth import FastLanguageModel

# Enable Unsloth's native 2x faster inference
FastLanguageModel.for_inference(model)
inference_pipeline = hf_pipeline("text-generation", model=model, tokenizer=tokenizer)

# Log the inference pipeline to MLflow
mlflow.transformers.log_model(
    transformers_model=inference_pipeline,
    artifact_path="lora_model_unsloth_PEFTllama3.2_3B",
    registered_model_name="lora_model_unsloth_PEFTllama3.2_3B"
)

Code that loads the model in Colab:

import mlflow

# Set DagsHub as the tracking URI
mlflow.set_tracking_uri("https://dagshub.com/kushwanthkc/lora_model_unsloth_PEFTllama3.2_3B.mlflow")

# Use the full run ID as the model URI
logged_model = 'runs:/e73e9aba105c47d2bfdd58ef135f1478/lora_model_unsloth_PEFTllama3.2_3B'  # Assuming this path is correct on DagsHub

# Load model as a PyFuncModel.
# Removing the 'device' argument as it's not supported by mlflow.pyfunc.load_model()
loaded_model = mlflow.pyfunc.load_model(logged_model)

Error when loading the model:

ValueError: The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object.
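For context, this error is raised by transformers itself, not by MLflow's tracking layer: once a model has been dispatched by accelerate (e.g. loaded with device_map="auto"), the pipeline constructor refuses an explicit device argument, and MLflow's pyfunc wrapper appears to pass one when it rebuilds the pipeline. A minimal sketch of the failing vs. working pattern (illustrative only; model and tokenizer stand in for an accelerate-loaded model and its tokenizer):

from transformers import pipeline as hf_pipeline

# Fails when `model` was loaded with accelerate (device_map="auto"):
# accelerate has already decided the device placement, so transformers
# raises the ValueError above.
# pipe = hf_pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)

# Works: omit `device` and let accelerate keep its placement.
pipe = hf_pipeline("text-generation", model=model, tokenizer=tokenizer)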

It seems to be an unresolved issue in MLflow.


Haha, I just commented over there… I am running into too many roadblocks :tired_face:


Debugged it, and I can finally confirm that PEFT models are not supported by MLflow:

Device set to use cuda:0
The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FalconMambaForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GitForCausalLM', 'GlmForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'JambaForCausalLM', 'JetMoeForCausalLM', 'LlamaForCausalLM', 'MambaForCausalLM', 'Mamba2ForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MllamaForCausalLM', 'MoshiForCausalLM', 'MptForCausalLM', 'MusicgenForCausalLM', 'MusicgenMelodyForCausalLM', 'MvpForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'Olmo2ForCausalLM', 'OlmoeForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhimoeForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RecurrentGemmaForCausalLM', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'WhisperForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM', 'ZambaForCausalLM'].
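One common workaround for that warning (my sketch, not from this thread) is to merge the LoRA adapter into the base weights before building the pipeline, so the pipeline sees a plain LlamaForCausalLM rather than a PeftModelForCausalLM. PEFT's merge_and_unload() does this; note that the merged model no longer has a separable adapter:

from transformers import pipeline as hf_pipeline

# Assumes `model` is a PeftModelForCausalLM (base model + LoRA adapter).
# merge_and_unload() folds the adapter weights into the base model and
# returns the underlying transformers model (e.g. LlamaForCausalLM),
# which the text-generation pipeline does recognize.
merged_model = model.merge_and_unload()

pipe = hf_pipeline("text-generation", model=merged_model, tokenizer=tokenizer)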

Just a question: what other kinds of LLMOps tools work well with PEFT? Any open-source ones?


Looks like I have a workaround: the 'PeftModelForCausalLM' is not supported for text-generation message is not related to MLflow. It comes from the task tag, text-generation, that I am passing to the pipeline for inference.

Do you know the right tag that I can pass here?

inference_pipeline = hf_pipeline("text-generation", model=model, tokenizer=tokenizer)

# Use the pipeline for inference
output = inference_pipeline(""" Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
complete the sequence 

### Input:
1, 1, 2, 3

### Response:""")
print(output)

Output response looks good:

[{'generated_text': ' Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\ncomplete the sequence \n\n### Input:\n1, 1, 2, 3\n\n### Response: \n1, 1, 2, 3, 5, 8, 13'}]
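As a small follow-up (my addition, not from the original post): since the pipeline echoes the whole Alpaca-style prompt back, you can slice out just the completion by splitting on the "### Response:" marker:

# Keep only the text after the "### Response:" marker from the prompt template.
generated = output[0]["generated_text"]
response = generated.split("### Response:")[-1].strip()
print(response)  # -> "1, 1, 2, 3, 5, 8, 13"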

Here is the final solution and workaround:

  • Inference code to load the model from MLflow:
import mlflow
from transformers import pipeline as hf_pipeline

# Set DagsHub as the tracking URI
mlflow.set_tracking_uri("https://dagshub.com/kushwanthkc/lora_model_unsloth_PEFTllama3.2_3B.mlflow")

# Use the full run ID as the model URI
logged_model = 'runs:/ba803f221c824fb59123e837e1377e70/lora_model_unsloth_PEFTllama3.2_3B'  # Assuming this path is correct on DagsHub

# Load the model components
loaded_components = mlflow.transformers.load_model(model_uri=logged_model, return_type="components")

# Create an inference pipeline
inference_pipeline = hf_pipeline("text-generation", model=loaded_components['model'], tokenizer=loaded_components['tokenizer'])

# Use the pipeline for inference
output = inference_pipeline(""" Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
complete the sequence 

### Input:
1, 1, 2, 3

### Response:""")
print(output)
  • Log the model to MLflow:
    # Assuming 'model' is your PeftModelForCausalLM instance and 'tokenizer' is your tokenizer
    components = {"model": model, "tokenizer": tokenizer}
    # Log the inference pipeline to MLflow
    mlflow.transformers.log_model(
        transformers_model=components,
        artifact_path="lora_model_unsloth_PEFTllama3.2_3B",
        registered_model_name="lora_model_unsloth_PEFTllama3.2_3B"
    )
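A small aside, not from the thread: because registered_model_name was set when logging, the same components can also be loaded through a model-registry URI instead of a run URI. The version number below is a placeholder; use whichever version you registered:

import mlflow

# Load by registry name and version rather than by runs:/<run_id>/... URI.
components = mlflow.transformers.load_model(
    model_uri="models:/lora_model_unsloth_PEFTllama3.2_3B/1",
    return_type="components",
)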
