HuggingFacePipeline Llama2 load_in_4bit from_model_id: the model has been loaded with `accelerate` and therefore cannot be moved to a specific device

Hello,
I am trying to load the Llama 2 model with HuggingFacePipeline
on an AWS g5.4xlarge instance (1 GPU, 16 vCPUs, 64 GB RAM, 24 GB GPU memory).
With the code below I get the error shown below.

I also tried:
- other instance types
- specifying 1 GPU explicitly
- removing device_map="auto"
and I get the same error.

I also tried loading the model directly with
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", device_map="auto")
and that loads the model fine.
So I think the issue is in the pipeline, or in the link between langchain and accelerate?
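For reference, this is roughly what that direct load looks like with the same 4-bit and bfloat16 settings as in the model_kwargs further down, expressed through a BitsAndBytesConfig. This is only a sketch and assumes bitsandbytes and accelerate are installed; the exact quantization options are illustrative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"

# 4-bit quantization roughly equivalent to load_in_4bit=True, with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",              # accelerate dispatches the weights
    quantization_config=bnb_config,
)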

Any idea how to use the pipeline?
Thanks,
JD

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
import tiktoken
from langchain import HuggingFacePipeline
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

llm = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Llama-2-7b-chat-hf",
    task="text-generation",
    model_kwargs={
        "temperature": 0,
        "max_length": 2048,
        "torch_dtype": torch.bfloat16,
        "device_map": "auto",
        "load_in_4bit": True,
    },
)

/home/ml-app/DATA_DESIGN/code-envs/python/py_310_sample_llm/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: do_sample is set to False. However, temperature is set to 0 – this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.31s/it]
WARNING:langchain.llms.huggingface_pipeline:Device has 1 GPUs available. Provide device={deviceId} to from_model_id to use available GPUs for execution. deviceId is -1 (default) for CPU and can be a positive integer associated with CUDA device id.

ValueError: The model has been loaded with accelerate and therefore cannot be moved to a specific device. Please discard the device argument when creating your pipeline object.

Just in case, here is more GPU info from the instance after executing the line above. Before that, the GPU memory is free.

Tue Aug 29 09:31:16 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    Off | 00000000:00:1E.0 Off |                    0 |
|  0%   30C    P0             60W / 300W  |  10134MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     16298      C   python/py_310_sample_llm/bin/python        7492MiB |
|    0   N/A  N/A     19772      C   python/py_310_sample_llm/bin/python        2624MiB |
+---------------------------------------------------------------------------------------+


It's probably the pipeline on the langchain side. Just like you said, loading with model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", device_map="auto") works. There is nothing we can do on our side.
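For anyone hitting the same ValueError, a commonly used workaround (a sketch, not an official fix from either library) is to build the transformers pipeline yourself and hand it to HuggingFacePipeline via its pipeline argument, so langchain never passes a device for a model that accelerate has already dispatched. This assumes bitsandbytes and accelerate are installed; temperature is dropped here because do_sample is not enabled (see the UserWarning above).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain import HuggingFacePipeline

model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # accelerate places the weights, so no device= below
    torch_dtype=torch.bfloat16,
    load_in_4bit=True,
)

# Build the pipeline without a device argument: the model is already dispatched.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=2048,
)

# Wrap the ready-made pipeline instead of calling from_model_id.
llm = HuggingFacePipeline(pipeline=pipe)

The point is that device placement is decided once, by device_map="auto" at load time, and the wrapped pipeline simply reuses it, which is exactly what the error message asks for.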