Hello,
I have a PC with these specs:
Memory: 128 GB
GPU: RTX 4090 24 GB
CPU: Core™ i9-14900K
I downloaded the model files and reference them in my Python code:
from transformers import AutoModelForCausalLM, AutoTokenizer

def MistralInstruct(instruct, assistance, userprompt):
    device = "cuda"  # "cuda" or "cpu": the device to load the model onto
    modelPath = "/mnt/f/AI/AI Models/mistralai__Mistral-7B-Instruct-v0.2"
    model = AutoModelForCausalLM.from_pretrained(modelPath)
    tokenizer = AutoTokenizer.from_pretrained(modelPath)
    messages = [
        {"role": "user", "content": instruct},
        {"role": "assistant", "content": assistance},
        {"role": "user", "content": userprompt},
    ]
    # Build the prompt with the model's chat template and tokenize it
    encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
    model_inputs = encodeds
    # model_inputs = encodeds.to(device)
    # model.to(device)
    generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
    decoded = tokenizer.batch_decode(generated_ids)
    return decoded[0]
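For context, I call it roughly like this (the prompt strings below are just example placeholders, not my real prompts):

output = MistralInstruct(
    "You are a helpful assistant.",          # example instruction
    "Understood, I'll keep answers short.",  # example assistant turn
    "Explain what a tokenizer does.",        # example user prompt
)
print(output)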
But when I run this code, it shows the system message "Loading checkpoint shards" and then tries to download some files …
I don't have internet access, so it just gets stuck there …
How can I make this run fully offline?
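I'm guessing it involves something like local_files_only=True or the offline environment variables, but I'm not sure that covers everything; this is just a rough sketch of what I mean:

import os
os.environ["HF_HUB_OFFLINE"] = "1"        # my guess: keep huggingface_hub off the network
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # my guess: make transformers use only local files

from transformers import AutoModelForCausalLM, AutoTokenizer

modelPath = "/mnt/f/AI/AI Models/mistralai__Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(modelPath, local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained(modelPath, local_files_only=True)

Is this the right direction, or is there something else I need to download ahead of time?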