CUDA Memory issue for model.generate() in AutoModelForCausalLM

I don’t have a multi-GPU setup, so I can’t help you solve this directly, but isn’t it similar to this problem…?

If several people are reporting the same symptoms, it could be a bug in the library, but that hasn’t been confirmed…