Dear all,
What is the difference between these two loading statements, st1 and st2, and which one is faster? Any help is highly appreciated.
st1:
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor
model_id = "meta-llama/Llama-3.2-90B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
st2:
from transformers import AutoProcessor, AutoModelForPreTraining
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-90B-Vision-Instruct")
model = AutoModelForPreTraining.from_pretrained("meta-llama/Llama-3.2-90B-Vision-Instruct")
Setting aside the detailed options and differences between models, the Auto classes select and return the concrete class appropriate for each model, so ideally the result is the same.
However, not everything can be determined automatically and appropriately, so if you already know exactly which model class you need, it is a good idea to specify it explicitly. That's what I do.
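A quick way to see which concrete class the Auto classes would resolve to for this checkpoint is to inspect its config (a minimal sketch; it assumes you have access to the gated repo and are logged in):
from transformers import AutoConfig
config = AutoConfig.from_pretrained("meta-llama/Llama-3.2-90B-Vision-Instruct")
# The architectures field should name the concrete class, e.g. MllamaForConditionalGeneration
print(config.architectures)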
I don't know whether torch_dtype=torch.bfloat16 means my GPU is used while downloading the model, and whether
model = AutoModelForPreTraining.from_pretrained("meta-llama/Llama-3.2-90B-Vision-Instruct")
loads on the CPU in the traditional way.
Any help please.
from transformers import AutoProcessor, AutoModelForPreTraining
import torch
hf_token = "hf_***********"
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-90B-Vision-Instruct", token=hf_token)
# device_map="auto" already places the weights across the available devices, so no extra .to("cuda") is needed
model = AutoModelForPreTraining.from_pretrained("meta-llama/Llama-3.2-90B-Vision-Instruct", device_map="auto", torch_dtype=torch.bfloat16, token=hf_token)
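If you want to check where the weights actually ended up and in which dtype (the download itself only writes files to the local cache; torch_dtype and device_map only matter when the weights are loaded into memory), a quick check after loading, assuming it succeeded:
print(model.dtype)           # should report torch.bfloat16 if torch_dtype was honored
print(model.hf_device_map)   # per-module placement chosen by device_map="auto"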
This should work, but is there really enough VRAM in the GPU…? My PC doesn’t even have enough RAM…
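As a rough back-of-the-envelope check (weights only, ignoring activations and the KV cache), bfloat16 takes 2 bytes per parameter:
num_params = 90e9            # 90B parameters
bytes_per_param = 2          # bfloat16
print(f"~{num_params * bytes_per_param / 1e9:.0f} GB for the weights alone")
That is roughly 180 GB, so the model will not fit on a single consumer GPU; with device_map="auto" whatever does not fit is offloaded to CPU RAM (and to disk if configured), which works but is very slow.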