Hey guys,
I have the following code:
import time
import psutil
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
# Initialize the model and the tokenizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LlamaForCausalLM.from_pretrained(r'C:\Users\sebas.ollama\models\manifests\registry.ollama.ai\library\llama3.1\8b').to(device)
tokenizer = LlamaTokenizer.from_pretrained(r'C:\Users\sebas.ollama\models\manifests\registry.ollama.ai\library\llama3.1\8b')

# Function that measures the latency
def measure_latency():
    start_time = time.time()
    inputs = tokenizer("Explain the complex numbers", return_tensors="pt").to(device)
    outputs = model.generate(**inputs)
    print(outputs)
    end_time = time.time()
    latency = end_time - start_time
    return latency

# Function that measures the response time
def measure_response_time():
    start_time = time.time()
    inputs = tokenizer("Explain the complex numbers", return_tensors="pt").to(device)
    outputs = model.generate(**inputs)
    print(outputs)
    end_time = time.time()
    response_time = end_time - start_time
    return response_time

# Function that measures the CPU usage
def measure_cpu_usage():
    cpu_usage = psutil.cpu_percent()
    return cpu_usage

# Function that measures the RAM usage
def measure_ram_usage():
    ram_usage = psutil.virtual_memory().percent
    return ram_usage

# Function that measures the CUDA or HIP memory usage
def measure_cuda_hipp_usage():
    # memory_allocated() returns bytes, so convert to MB to match the print label below
    cuda_hipp_usage = torch.cuda.memory_allocated() / (1024 ** 2) if torch.cuda.is_available() else 0
    return cuda_hipp_usage

# Example calls of the functions
latency = measure_latency()
response_time = measure_response_time()
cpu_usage = measure_cpu_usage()
ram_usage = measure_ram_usage()
cuda_hipp_usage = measure_cuda_hipp_usage()

# Output of the measured values
print("Latency:", latency, 's')
print("Response time:", response_time, 's')
print("CPU usage:", cpu_usage, '%')
print("RAM usage:", ram_usage, '%')
print("CUDA/HIP usage:", cuda_hipp_usage, 'MB')
I think my code is OK, but I get the following error:
Traceback (most recent call last):
  File "c:\Users\sebas.ollama\models\manifests\registry.ollama.ai\library\llama3.1\8b\Testscript_RAG-System.py", line 9, in <module>
    model = LlamaForCausalLM.from_pretrained('C:\Users\sebas.ollama\models\manifests\registry.ollama.ai\library\llama3.1\8b').to(device)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sebas\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\transformers\modeling_utils.py", line 3735, in from_pretrained
    with safe_open(resolved_archive_file, framework="pt") as f:
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooSmall
I downloaded the Llama 3.1 8B model from the official homepage and put the safetensors files from HF into the same folder. Is this the problem?
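Here is a minimal diagnostic sketch (assuming the standard safetensors layout: an 8-byte little-endian length prefix followed by a JSON header) that checks whether the .safetensors files in that folder can be parsed at all; model_dir is the same path used in my script:

import json
import struct
from pathlib import Path

model_dir = Path(r'C:\Users\sebas.ollama\models\manifests\registry.ollama.ai\library\llama3.1\8b')

for path in model_dir.glob('*.safetensors'):
    file_size = path.stat().st_size
    with open(path, 'rb') as f:
        raw = f.read(8)
        if len(raw) < 8:
            print(path.name, '-> too small to be a safetensors file')
            continue
        # A safetensors file starts with an 8-byte little-endian integer
        # giving the size of the JSON header that follows it.
        header_len = struct.unpack('<Q', raw)[0]
        if header_len == 0 or header_len > file_size - 8:
            print(path.name, f'-> implausible header length {header_len}')
            continue
        try:
            header = json.loads(f.read(header_len))
            print(path.name, '-> header OK,', len(header), 'entries in header')
        except (UnicodeDecodeError, json.JSONDecodeError):
            print(path.name, '-> header is not valid JSON, probably not a real safetensors file')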
Please help me.