for i in range(50):
    model.generate()

I found that the inference time decreases over successive calls. I guess it is related to the CUDA cache or something similar. What factors influence this result?
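For reference, a minimal sketch of how the per-call timing can be collected (the lambda workload below is just a stand-in for `model.generate()`; on a GPU, a `torch.cuda.synchronize()` before each clock read would be needed, since CUDA kernels launch asynchronously):

```python
import time

def time_iterations(fn, n=50):
    """Time n successive calls to fn and return per-call durations in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()  # e.g. model.generate(); on CUDA, synchronize before reading the clock
        durations.append(time.perf_counter() - start)
    return durations

# Stand-in CPU workload for illustration; replace with lambda: model.generate()
timings = time_iterations(lambda: sum(range(10_000)), n=50)
print(f"first call: {timings[0]:.6f}s, mean of rest: {sum(timings[1:]) / 49:.6f}s")
```

With `model.generate()` plugged in, the first few entries of `timings` are typically the slow ones (CUDA context setup, kernel autotuning, allocator warm-up), which matches the decreasing trend described above.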