I was wondering how I can run inference on CPU. It doesn't seem possible with PyTorch, the pipelines API, or llama.cpp. From what I understand, the model uses a MixFormer architecture that isn't supported. Do you have any ideas?
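For reference, something along these lines is what I mean by trying to run it on CPU through the transformers pipelines API (the model id below is just a placeholder, and `trust_remote_code` is my assumption about what a custom architecture would need):

```python
import torch
from transformers import pipeline

# Minimal sketch of a CPU inference attempt via the pipelines API.
# "your-org/your-mixformer-model" is a placeholder, not a real model id.
generator = pipeline(
    "text-generation",
    model="your-org/your-mixformer-model",  # placeholder model id
    device=-1,                              # -1 forces CPU in the pipeline API
    torch_dtype=torch.float32,              # fp32 is the safe default on CPU
    trust_remote_code=True,                 # assumed necessary for a custom architecture
)

print(generator("Hello, world", max_new_tokens=32)[0]["generated_text"])
```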