About using llama-cpp-python

I would like to build a chat demo of a quantized model using llama-cpp-python.

When I run my Python code on the free cpu-basic hardware, chatting works fine.

However, when I switch to the cpu-upgrade hardware ($0.03/h), it stops working.

Specifically, when I try to load the model with llama-cpp-python, the call never returns: there is no response after the function call, and none of the subsequent code is executed.
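
For reference, here is a minimal sketch of the loading code I am using. The model path and parameter values below are placeholders, not my exact settings:

```python
from llama_cpp import Llama

# Placeholder path to a quantized GGUF model file.
llm = Llama(
    model_path="./model.Q4_K_M.gguf",
    n_ctx=2048,
    verbose=True,  # llama.cpp prints load progress, so I can see where it stalls
)

# On cpu-upgrade, execution never reaches this line; the constructor above hangs.
print("model loaded")
```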

My guess is that when the CPU cores or memory are increased, some setting on my side is insufficient, or there is a machine constraint I am not aware of; one thing I wonder about is the thread configuration, as in the sketch below.
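
For example, would it make a difference to pin the thread count explicitly instead of letting llama.cpp autodetect it? (`n_threads` is a llama-cpp-python constructor parameter; the value here is just an illustration.)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./model.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,
    # Fix the thread count to a small value rather than relying on
    # the autodetected core count of the upgraded hardware.
    n_threads=2,
)
```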

If you have any solutions or suggestions, could you please share them?