How to run Phi-1_5 on CPU?

I was wondering how I can run inference on CPU. It doesn't seem possible with PyTorch pipelines or llama.cpp. From what I understand, it uses the MixFormer architecture, which isn't supported. Do you have any ideas?

Hello there! I just bumped into your question while Googling the Phi model. Right now I'm running Phi-2 on CPU with Rust and the Candle framework. It's slow, but it works! You can give it a try if you still can't manage to run it on CPU.
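For what it's worth, you may also be able to stay in Python: recent transformers versions can load Phi-1_5 on CPU if you pass trust_remote_code=True, which pulls in the model's custom MixFormer code from the Hub. A minimal sketch, assuming the `microsoft/phi-1_5` checkpoint and the transformers library are available (weights download on first run, so this needs network access and a few GB of disk):

```python
# Sketch: running Phi-1_5 on CPU via transformers with trust_remote_code.
# Assumption: microsoft/phi-1_5 is the Hub ID of the official checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,   # stick to float32 on CPU
    trust_remote_code=True,      # loads the custom MixFormer model code
)
model.to("cpu")

inputs = tok("def fibonacci(n):", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
text = tok.decode(out[0], skip_special_tokens=True)
print(text)
```

Expect generation to be slow on CPU, but it avoids needing a GGUF conversion or llama.cpp support for the architecture.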