AutoModelForCausalLM and transformers.pipeline

What is the difference?
Which method is better?

pipeline = transformers.pipeline(
    "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
)
and

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline  # langchain.llms in older versions

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # quantization_config=quantization_config,
    # low_cpu_mem_usage=True,
    # torch_dtype="auto",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    cache_dir="./models/",
)

tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="./models/")

# generation_config and MAX_NEW_TOKENS are defined elsewhere in my script
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=MAX_NEW_TOKENS,
    temperature=0.2,
    # top_p=0.95,
    repetition_penalty=1.15,
    generation_config=generation_config,
)

local_llm = HuggingFacePipeline(pipeline=pipe)

Hi @alice86,
It’s a matter of the degree of abstraction. There is no good or bad; it’s about ease of use :slight_smile:

Pipeline takes care of some of the details under the hood for you. It’s better to stick with the simpler option until you hit its limits.
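To make the contrast concrete, here is a minimal sketch of the same generation done both ways (the checkpoint name, prompt, and max_new_tokens value are placeholders, not taken from the posts above):

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder checkpoint

# High level: pipeline builds the tokenizer, model, and generation loop for you.
pipe = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
print(pipe("Hello, my name is", max_new_tokens=20)[0]["generated_text"])

# Low level: load the pieces yourself, which gives full control over
# tokenization, generation arguments, and decoding of the output.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Under the hood the pipeline does roughly what the second half shows, which is why the two usually behave the same; the manual route only pays off once you need custom tokenization or generation logic.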

I already downloaded the model, and I use the local path directly.

model_id = "./Llama3"

pipeline = transformers.pipeline(
    "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
)

It shows this error:

ValueError: You are trying to offload the whole model to the disk. Please use the `disk_offload` function instead.

Q1.
What does "You are trying to offload the whole model to the disk" mean?
If I use model_id="meta-llama/Meta-Llama-3-8B", it automatically downloads the folder models--meta-llama--Meta-Llama-3-8B into ./cache.

So is the downloaded model also a case of offloading the whole model to the disk?

Q2. Since the error talks about offloading the whole model,
what is the code to load the model without offloading (i.e., keeping it in memory)?
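For what it's worth, the cache folder (models--meta-llama--Meta-Llama-3-8B) only holds the downloaded weight files; disk offload is different. It happens at load time, when device_map="auto" cannot fit all the weights into GPU and CPU memory and has to park some (here, all) of them on disk. One common way to avoid that is to shrink the weights with quantization so they fit in memory. A minimal sketch, assuming a CUDA GPU with bitsandbytes installed (the 4-bit settings are illustrative, not from the thread):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "./Llama3"  # the local path used above

# 4-bit quantization shrinks the weights so device_map="auto" can usually
# place everything on the GPU instead of spilling layers to disk.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# hf_device_map shows where each block ended up; any "disk" entry means
# that part of the model is being offloaded.
print(model.hf_device_map)

If you would rather stay in bfloat16, the same check applies: as long as hf_device_map contains no "disk" entries, nothing is being offloaded, regardless of where the files were downloaded to.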