Have you seen slower performance on Llama2 70B vs 13B, even when running on a much bigger inference type?
What is an “inference type”, and which “inference types” did you use for 70B and 13B respectively?