How to use multithreading on a CPU

Hi everyone

I have an AMD Ryzen™ 7 PRO 4750U with Radeon™ Graphics (8 cores / 16 threads), so as far as I understand, the CPU can run 16 threads simultaneously.

I used two scripts that I found. The first one starts like this:

===== STEP 1: Install Dependencies =====

pip install moondream # Install dependencies in your project directory

===== STEP 2: Download Model =====

Download model (1,733 MiB download size, 2,624 MiB memory usage)

Use: wget (Linux and Mac) or curl.exe -O (Windows)

wget https://huggingface.co/vikhyatk/moondream2/resolve/9dddae84d54db4ac56fe37817aeaeb502ed083e2/moondream-2b-int8.mf.gz

import moondream as md
from PIL import Image

model = md.vl(model="./moondream-2b-int8.mf.gz")  # Initialize model
image = Image.open("tmp.jpg")  # Load image
encoded_image = model.encode_image(image)  # Encode image (recommended for multiple operations)

# 1. Caption any image (length options: "short" or "normal" (default))

caption = model.caption(encoded_image)["caption"]
print("Caption:", caption)

When I run this, I see a CPU usage of approx. 1,600%, so all 16 threads seem to be in use.

But when I run this second script, only one thread is used, and for whatever reason the CPU gets a lot hotter:

from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

# Function to query the model
def process_model(model, image, frage):
    print("\nQuestion: " + frage)
    print(model.query(image, frage)["answer"])

# Initialize model and load image
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True
)
image = Image.open("tmp2.jpg")
image.show()

# Example query
process_model(model, image, "What is shown in this image?")

How can I make such scripts work with all threads of the CPU for inference?

Best regards
Martin


The first script uses the Moondream library, and the second script uses the Transformers library, so the libraries doing the actual execution are different. Transformers is generally slow on the CPU. :sweat_smile:
It is also not well suited to multithreading. If you are running mainly on the CPU, you will have to make various adjustments.
If you are only running Moondream on the CPU, I think it is best to use the Moondream library.
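
For reference, the Transformers script runs on PyTorch, whose CPU thread counts can be pinned explicitly. A minimal sketch, assuming 16 logical cores; the environment variables must be set before torch is imported, and set_num_interop_threads must be called before any parallel work starts:

import os

# Assumption: set these before importing torch, otherwise the thread
# pools may already be initialized with their defaults.
os.environ["OMP_NUM_THREADS"] = "16"
os.environ["MKL_NUM_THREADS"] = "16"

import torch

torch.set_num_threads(16)         # intra-op parallelism (e.g. matmuls)
torch.set_num_interop_threads(8)  # inter-op parallelism between operators

Note that autoregressive decoding is often memory-bandwidth-bound, so more threads do not necessarily mean proportionally faster generation.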

Dear John

Thanks. Is it generally possible to use HF models (especially with Transformers) for CPU inference and somehow configure them to use all threads of the CPU? I was just astonished by how fast the other script ran and would like to use other models in the same way.

Best regards
Martin


It is basically difficult to use Transformers as-is. It is not a library suited to mobile or so-called edge devices…
I think there are few cases where the author has prepared a dedicated script, so it would be easier to convert the model to ONNX or something similar first. ONNX Runtime appears to use all CPU cores by default.
You can convert to ONNX format locally, and there is also a conversion Space on HF.
If the model architecture is too new for the conversion to work, using the GitHub version of ONNX may work.
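
As a rough sketch of the local route: Optimum can export supported architectures from the command line, and ONNX Runtime then lets you control the thread count. Moondream's custom remote code may not be covered by the exporter, so treat this as an outline rather than a recipe; "model.onnx" is a placeholder filename:

# Export first, e.g. with Optimum (may fail for custom architectures):
#   optimum-cli export onnx --model vikhyatk/moondream2 moondream_onnx/

import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 0  # 0 = let ONNX Runtime pick (typically all cores); set a number to pin
session = ort.InferenceSession(
    "model.onnx",  # placeholder path to the exported graph
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)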

Thank you, I will look at ONNX. I have not read about that so far.
Best regards
Martin

I tried (with the help of Copilot) to convert to the ONNX format, but I got this error message:

python3 conv_moon.py
Traceback (most recent call last):
  File "/home/martin/esn_vqa/conv_moon.py", line 34, in <module>
    convert_to_onnx(model, tokenizer)
  File "/home/martin/esn_vqa/conv_moon.py", line 10, in convert_to_onnx
    torch.onnx.export(
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/__init__.py", line 375, in export
    export(
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 502, in export
    _export(
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 1564, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 997, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/onnx/utils.py", line 904, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/jit/_trace.py", line 1500, in _get_trace_graph
    outs = ONNXTracedModule(
           ^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/jit/_trace.py", line 139, in forward
    graph, out = torch._C._create_graph_by_tracing(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/jit/_trace.py", line 130, in wrapper
    outs.append(self.inner(*trace_inputs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1726, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/martin/esn_vqa/lib/python3.12/site-packages/torch/nn/modules/module.py", line 394, in _forward_unimplemented
    raise NotImplementedError(
NotImplementedError: Module [HfMoondream] is missing the required "forward" function

Sadly, I found very little about this with Google. Is there anything I can do about it?


Hmm, I don’t know if it’s because the version of ONNX is old and it’s not compatible, or if there’s some kind of bug.
As for Moondream2, the ONNX conversion is available below. The directory structure is a little unusual, but if you download just the necessary files, it should probably be okay.
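
If the directory structure is awkward, huggingface_hub can fetch individual files instead of cloning the whole repo. A minimal sketch; the repo id and filename below are placeholders for the actual conversion repo:

from huggingface_hub import hf_hub_download

# Both arguments are placeholders; substitute the real ONNX repo and file.
local_path = hf_hub_download(
    repo_id="<onnx-conversion-repo>",
    filename="model.onnx",
)
print(local_path)  # path of the file in the local HF cache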

Thank you, I will try it. For now I have moved to OpenGVLab/InternVL2_5-8B-MPO for testing purposes, but I have to wait for the conversion process to finish.


Is it actually possible to search on HF explicitly for quantized models?
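
One partial route via the huggingface_hub API is to search for common quantization keywords; a minimal sketch, noting that strings like "gguf", "awq", or "int8" are naming conventions used by quantized uploads, not an explicit quantization filter:

from huggingface_hub import HfApi

api = HfApi()
# Heuristic: quantized uploads usually carry the format in their name.
for model in api.list_models(search="moondream gguf", limit=10):
    print(model.id)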