How to use multithreading on a CPU

Hi everyone

I have an AMD Ryzen™ 7 PRO 4750U with Radeon™ Graphics (8 cores / 16 threads), so as far as I understand, the CPU can run 16 threads simultaneously.

I used two scripts that I found. The first one starts like this:

===== STEP 1: Install Dependencies =====

pip install moondream # Install dependencies in your project directory

===== STEP 2: Download Model =====

Download model (1,733 MiB download size, 2,624 MiB memory usage)

Use: wget (Linux and Mac) or curl.exe -O (Windows)

wget https://huggingface.co/vikhyatk/moondream2/resolve/9dddae84d54db4ac56fe37817aeaeb502ed083e2/moondream-2b-int8.mf.gz

import moondream as md
from PIL import Image

model = md.vl(model="./moondream-2b-int8.mf.gz")  # Initialize model
image = Image.open("tmp.jpg")  # Load image
encoded_image = model.encode_image(image)  # Encode image (recommended for multiple operations)

# 1. Caption any image (length options: "short" or "normal" (default))

caption = model.caption(encoded_image)["caption"]
print("Caption:", caption)

When I run this, I see a CPU usage of approx. 1,600%, so all 16 threads seem to be in use.

But when I run this second script, only one thread is used, and for whatever reason the CPU gets a lot hotter:

from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

# Function to query the model
def process_model(model, image, frage):
    print("\nQuestion: " + frage)
    print(model.query(image, frage)["answer"])

# Initialize model and load image
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True
)
image = Image.open("tmp2.jpg")
image.show()

# Example query
process_model(model, image, "What is shown in this image?")

How can I make such scripts work with all threads of the CPU for inference?

Best regards
Martin


The first script uses the Moondream library, and the second script uses the Transformers library, so the libraries doing the actual execution are different. Transformers is generally slow on the CPU. :sweat_smile:
It is also not well suited to multithreading. If you are running mainly on the CPU, you will have to make various adjustments.
If you are only running Moondream on the CPU, I think it is best to use the Moondream library.
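
For reference, the Transformers script runs on PyTorch, whose CPU thread counts can be pinned explicitly. A minimal sketch, assuming 16 logical cores; the environment variables must be set before torch is imported, and set_num_interop_threads must be called before any parallel work starts:

import os

# Assumption: set these before importing torch, otherwise the thread
# pools may already be initialized with their defaults.
os.environ["OMP_NUM_THREADS"] = "16"
os.environ["MKL_NUM_THREADS"] = "16"

import torch

torch.set_num_threads(16)         # intra-op parallelism (e.g. matmuls)
torch.set_num_interop_threads(8)  # inter-op parallelism between operators

Note that autoregressive decoding is often memory-bandwidth-bound, so more threads do not necessarily mean proportionally faster generation.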

Dear John

Thanks. Is it generally possible to use HF models (especially with Transformers) for CPU inference and somehow configure them to use all threads of the CPU? I was just astonished by how fast the other script ran and would like to use other models in the same way.

Best regards
Martin


It is basically difficult to use Transformers as-is. It is not a library suited to mobile or so-called edge devices…
I think there are few cases where the author has prepared a dedicated script, so it would be easier to convert the model to ONNX or something similar first. ONNX Runtime appears to use all CPU cores by default.
You can convert to ONNX format locally, and there is also a conversion Space on HF.
If the model architecture is too new for the conversion to work, using the GitHub version of ONNX may work.
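
As a rough sketch of the local route: Optimum can export supported architectures from the command line, and ONNX Runtime then lets you control the thread count. Moondream's custom remote code may not be covered by the exporter, so treat this as an outline rather than a recipe; "model.onnx" is a placeholder filename:

# Export first, e.g. with Optimum (may fail for custom architectures):
#   optimum-cli export onnx --model vikhyatk/moondream2 moondream_onnx/

import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 0  # 0 = let ONNX Runtime pick (typically all cores); set a number to pin
session = ort.InferenceSession(
    "model.onnx",  # placeholder path to the exported graph
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)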

Thank you, I will look at ONNX. I have not read about that so far.
Best regards
Martin

I tried (with the help of Copilot) to convert to the ONNX format, but I got this error message:

python3 conv_moon.py
Traceback (most recent call last):
  File "/home/martin/esn_vqa/conv_moon.py", line 34, in <module>
    convert_to_onnx(model, tokenizer)
  File "/home/martin/esn_vqa/conv_moon.py", line 10, in convert_to_onnx
    torch.onnx.export(
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/__init__.py", line 375, in export
    export(
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 502, in export
    _export(
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 1564, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 997, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 904, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/jit/_trace.py", line 1500, in _get_trace_graph
    outs = ONNXTracedModule(
           ^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/jit/_trace.py", line 139, in forward
    graph, out = torch._C._create_graph_by_tracing(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/jit/_trace.py", line 130, in wrapper
    outs.append(self.inner(*trace_inputs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1726, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 394, in _forward_unimplemented
    raise NotImplementedError(
NotImplementedError: Module [HfMoondream] is missing the required "forward" function

Sadly, I found very little about this with Google. Is there anything I can do about it?


Hmm, I don’t know if it’s because the version of ONNX is old and it’s not compatible, or if there’s some kind of bug.
As for Moondream2, the ONNX conversion is available below. The directory structure is a little unusual, but if you download just the necessary files, it should probably be okay.
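
If the directory structure is awkward, huggingface_hub can fetch individual files instead of cloning the whole repo. A minimal sketch; the repo id and filename below are placeholders for the actual conversion repo:

from huggingface_hub import hf_hub_download

# Both arguments are placeholders; substitute the real ONNX repo and file.
local_path = hf_hub_download(
    repo_id="<onnx-conversion-repo>",
    filename="model.onnx",
)
print(local_path)  # path of the file in the local HF cache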

Thank you, I will try it. For now I have moved to OpenGVLab/InternVL2_5-8B-MPO for testing purposes, but I have to wait for the conversion process to finish.


Is it actually possible to search on HF explicitly for quantized models?
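
One partial route via the huggingface_hub API is to search for common quantization keywords; a minimal sketch, noting that strings like "gguf", "awq", or "int8" are naming conventions used by quantized uploads, not an explicit quantization filter:

from huggingface_hub import HfApi

api = HfApi()
# Heuristic: quantized uploads usually carry the format in their name.
for model in api.list_models(search="moondream gguf", limit=10):
    print(model.id)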