Hi everyone
I have an AMD Ryzen™ 7 PRO 4750U with Radeon™ Graphics × 16, so as far as I understand, the CPU can run 16 threads simultaneously.
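For reference, a quick way to double-check the available threads from Python (just a sketch; torch is only imported here to see its default thread-pool size):

import os
print(os.cpu_count())           # logical CPUs, should print 16 on this machine

import torch
print(torch.get_num_threads())  # size of PyTorch's intra-op CPU thread pool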
I used two scripts that I found. The first one starts like this:
===== STEP 1: Install Dependencies =====
pip install moondream # Install dependencies in your project directory
===== STEP 2: Download Model =====
Download model (1,733 MiB download size, 2,624 MiB memory usage)
Use: wget (Linux and Mac) or curl.exe -O (Windows)
wget https://huggingface.co/vikhyatk/moondream2/resolve/9dddae84d54db4ac56fe37817aeaeb502ed083e2/moondream-2b-int8.mf.gz
import moondream as md
from PIL import Image

model = md.vl(model="./moondream-2b-int8.mf.gz")  # Initialize model
image = Image.open("tmp.jpg")  # Load image
encoded_image = model.encode_image(image)  # Encode image (recommended for multiple operations)

# 1. Caption any image (length options: "short" or "normal" (default))
caption = model.caption(encoded_image)["caption"]
print("Caption:", caption)
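The moondream package docs also mention a length option for captions and a query method for question answering, so I assume (untested on my side) the same model object could be used like this:

short_caption = model.caption(encoded_image, length="short")["caption"]       # shorter caption
answer = model.query(encoded_image, "What is in this image?")["answer"]        # visual question answering
print("Answer:", answer)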
When I run this first script, I see a CPU usage of approximately 1,600%, so all 16 threads seem to be in use.
But when I run this second script, only one thread is used, and for whatever reason the CPU gets a lot hotter:
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

# Function to process the model
def process_model(model, image, frage):
    print("\nTask: " + frage)
    print(model.query(image, frage)["answer"])

# Initialize model and load image
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True
)
image = Image.open("tmp2.jpg")
image.show()
How can I make scripts like this second one use all CPU threads for inference?
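Is it just a matter of telling PyTorch how many threads to use? This is what I would try next, but it is only a sketch under the assumption that the transformers version of moondream runs on PyTorch's CPU thread pool (the values 16 and 8 are just guesses for my 8-core/16-thread CPU):

import os
# These environment variables are read when the math libraries are loaded,
# so set them before importing torch (or export them in the shell instead).
os.environ.setdefault("OMP_NUM_THREADS", "16")
os.environ.setdefault("MKL_NUM_THREADS", "16")

import torch
torch.set_num_threads(16)         # intra-op parallelism (e.g. matrix multiplications)
torch.set_num_interop_threads(8)  # inter-op parallelism between independent operations
print(torch.get_num_threads(), torch.get_num_interop_threads())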
Best regards
Martin