Backend low-level kernel libraries used in Transformers

Hello @ArthurZ @RaushanTurganbay,
What backend libraries are used for low-level kernel operations (matmul, softmax, etc.) in the Transformers library?
Issue: if I run the same model (say Mamba or Llama) on an x86 machine and an aarch64 machine, I observe a difference in model timing. I suspect the kernels take different code paths on x86 and aarch64.
Please specify the backend libraries used for x86 and aarch64.


In Transformers, attention defaults to torch.nn.functional.scaled_dot_product_attention (SDPA), so the backend libraries would be handled by PyTorch.
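A minimal sketch of the dispatch in question: calling PyTorch's SDPA directly lets PyTorch pick the fused kernel for the current platform, which is why the low-level library differs between builds. The tensor shapes here are arbitrary illustration values, not anything Transformers-specific.

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) - arbitrary example shapes
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# PyTorch selects the SDPA backend (math, flash, mem-efficient, ...)
# appropriate for the device and build; this is where platform
# differences enter.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```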


@RaushanTurganbay,
Yes, PyTorch is evident. But we need to go one level further down, i.e. which libraries does torch call for these operations? Since we observe a timing difference between x86 and Arm, we can only find the bottleneck if we know the low-level kernel library.


I also faced this issue, but fortunately I found your thread. Thanks for the solution.
