Can we run a custom quantized Llama 3 8B on an NPU?

Is it possible to convert the Llama model to ONNX and then run it across different runtimes/libraries? Or is there any other workaround so that one fine-tuned Llama model can run across AMD, Intel, and Qualcomm NPUs?
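As context for the question, here is a rough, untested sketch of the workflow I have in mind, assuming the Optimum + ONNX Runtime route: export the fine-tuned checkpoint to ONNX once, then load the same graph with whichever ONNX Runtime execution provider targets the local NPU. The checkpoint path is a placeholder, and whether a quantized 8B model actually fits and runs on a given NPU (and which ONNX Runtime build exposes which provider) is exactly what I'm unsure about.

```python
# Sketch only: export a fine-tuned Llama checkpoint to ONNX with Optimum,
# then open it in ONNX Runtime with a vendor-specific execution provider.
from optimum.onnxruntime import ORTModelForCausalLM
import onnxruntime as ort

checkpoint = "path/to/your-finetuned-llama3-8b"  # hypothetical local path

# One-time export to ONNX (writes the graph plus config/tokenizer files).
model = ORTModelForCausalLM.from_pretrained(checkpoint, export=True)
model.save_pretrained("llama3-8b-onnx")

# In principle the same ONNX file could then be loaded with different
# execution providers, one per vendor runtime, if the matching ONNX Runtime
# build is installed:
#   AMD      -> "VitisAIExecutionProvider"
#   Intel    -> "OpenVINOExecutionProvider"
#   Qualcomm -> "QNNExecutionProvider"
print(ort.get_available_providers())  # check what this machine supports

session = ort.InferenceSession(
    "llama3-8b-onnx/model.onnx",  # exported file name may differ by Optimum version
    providers=["QNNExecutionProvider", "CPUExecutionProvider"],  # falls back to CPU
)
```

Is this the right direction, or does each vendor (Ryzen AI, OpenVINO, QNN/QAIRT) effectively require its own converted/quantized artifact?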
