falcon-40B inference on older version of torch

Reza1 · June 27, 2023, 7:24am

Hi,
Should there be any accuracy issue when running Falcon-40B on PyTorch versions older than 2.x (the release line in which torch.compile and flash attention were added to torch)? Even after changing the attention to use plain GEMMs and a softmax to compute the attention scores and the context, the accuracy becomes terrible and the model generates rubbish. Has anyone else had a similar experience, or is there a known reason why this happens?
Thanks,
Reza
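
For readers following the thread, here is a minimal sketch of the "plain GEMMs and softmax" attention described above. It is not the code from the post. It assumes a single head, a causal mask, and the usual 1/sqrt(head_dim) scaling, and it uses only the Python standard library so it runs regardless of the torch version. The function names and the tiny inputs are made up for illustration. It deliberately leaves out model-specific pieces such as rotary position embeddings and key/value heads shared across query heads, which a hand-written replacement for Falcon would also need to reproduce.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def naive_causal_attention(q, k, v):
    """q, k, v: lists of seq_len vectors of length head_dim (single head)."""
    seq_len, head_dim = len(q), len(q[0])
    scale = 1.0 / math.sqrt(head_dim)  # assumed scaling; omitting it changes the softmax
    out = []
    for i in range(seq_len):
        # First GEMM: scaled dot products of query i with keys 0..i only (causal mask).
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in range(i + 1)]
        probs = softmax(scores)
        # Second GEMM: probability-weighted sum of the visible value vectors.
        out.append([sum(p * v[j][d] for j, p in enumerate(probs))
                    for d in range(head_dim)])
    return out

# Hypothetical toy inputs: 3 tokens, head_dim = 2.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = naive_causal_attention(q, k, v)
print(out[0])  # the first token can only attend to itself, so this is v[0]
```

Comparing the output of a small reference like this against the model's own attention module on the same inputs, ideally in float32, is one way to check whether the scaling, the mask, or the head layout differs between the two paths.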
