What is the difference?
Which method is better?
pipeline = transformers.pipeline(
    "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
)
and
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    device_map="auto",
    use_triton=False,
    quantize_config=None,
)
return model, tokenizer
def load_full_model(model_id, model_basename, device_type, logging):
    """
    Load a full model using either LlamaTokenizer or AutoModelForCausalLM.

    This function loads a full model based on the specified device type.
    If the device type is 'mps' or 'cpu', it uses LlamaTokenizer and LlamaForCausalLM.
    Otherwise, it uses AutoModelForCausalLM.

    Parameters:
    - model_id (str): The identifier for the model on HuggingFace Hub.
    - model_basename (str): The base name of the model file.
    - device_type (str): The device to load the model on ('cuda', 'mps' or 'cpu').
    - logging: Logger used to report progress.
    """
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        # quantization_config=quantization_config,
        # low_cpu_mem_usage=True,
        # torch_dtype="auto",
        torch_dtype=torch.bfloat16,
        device_map="auto",
        cache_dir="./models/",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="./models/")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=MAX_NEW_TOKENS,
    temperature=0.2,
    # top_p=0.95,
    repetition_penalty=1.15,
    generation_config=generation_config,
)
local_llm = HuggingFacePipeline(pipeline=pipe)
Hi @alice86,
It’s a question of the degree of abstraction; there is no good or bad, it’s about convenience.
pipeline() takes care of some of the details under the hood for you, so it’s better to stick with the simple one until you reach its limits.
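For illustration, here is a minimal sketch of the two approaches side by side; the checkpoint name is only an example, and the pipe_simple / pipe_explicit names are made up for this sketch. The explicit version just spells out the steps the one-liner does for you (in practice you would pick one, not load the model twice):

import torch
import transformers

model_id = "meta-llama/Meta-Llama-3-8B"  # example checkpoint; any causal LM works

# High-level: pipeline() loads the model and tokenizer for you.
pipe_simple = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Lower-level: load the model and tokenizer yourself, then hand them to pipeline().
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
pipe_explicit = transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)

# Both objects are used the same way afterwards.
print(pipe_simple("Hello, my name is", max_new_tokens=20)[0]["generated_text"])

The explicit form is only worth the extra lines when you need control the one-liner doesn’t expose, e.g. a quantization config, a non-default tokenizer, or wrapping the pipeline in HuggingFacePipeline as in your second snippet.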
I already downloaded the model and use the local path directly.
model_id = "./Llama3"
pipeline = transformers.pipeline(
"text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
)
It shows this error:
ValueError: You are trying to offload the whole model to the disk. Please use the `disk_offload` function instead.
Q1. What does “You are trying to offload the whole model to the disk” mean?
If I use model_id = "meta-llama/Meta-Llama-3-8B", it automatically downloads the folder models--meta-llama--Meta-Llama-3-8B into ./cache.
So is downloading the model also a case of offloading the whole model to the disk?
Q2. Since the error is about offloading the whole model, what would the code be to load the model online (without offloading it to disk)?