Can we run a custom quantized Llama 3 8B on an NPU?

Is it possible to convert the Llama model to ONNX and then run it across different runtimes/libraries? Or is there any other workaround so that one fine-tuned Llama model can run across AMD, Intel, and Qualcomm NPUs?
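As context for the question, here is a rough, untested sketch of the workflow I have in mind, assuming the Optimum + ONNX Runtime route: export the fine-tuned checkpoint to ONNX once, then load the same graph with whichever ONNX Runtime execution provider targets the local NPU. The checkpoint path is a placeholder, and whether a quantized 8B model actually fits and runs on a given NPU (and which ONNX Runtime build exposes which provider) is exactly what I'm unsure about.

```python
# Sketch only: export a fine-tuned Llama checkpoint to ONNX with Optimum,
# then open it in ONNX Runtime with a vendor-specific execution provider.
from optimum.onnxruntime import ORTModelForCausalLM
import onnxruntime as ort

checkpoint = "path/to/your-finetuned-llama3-8b"  # hypothetical local path

# One-time export to ONNX (writes the graph plus config/tokenizer files).
model = ORTModelForCausalLM.from_pretrained(checkpoint, export=True)
model.save_pretrained("llama3-8b-onnx")

# In principle the same ONNX file could then be loaded with different
# execution providers, one per vendor runtime, if the matching ONNX Runtime
# build is installed:
#   AMD      -> "VitisAIExecutionProvider"
#   Intel    -> "OpenVINOExecutionProvider"
#   Qualcomm -> "QNNExecutionProvider"
print(ort.get_available_providers())  # check what this machine supports

session = ort.InferenceSession(
    "llama3-8b-onnx/model.onnx",  # exported file name may differ by Optimum version
    providers=["QNNExecutionProvider", "CPUExecutionProvider"],  # falls back to CPU
)
```

Is this the right direction, or does each vendor (Ryzen AI, OpenVINO, QNN/QAIRT) effectively require its own converted/quantized artifact?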
