DeepSpeed with GPT2-XL on Colab

Has anyone been able to fine-tune GPT2-XL (or a similarly sized model) on Colab with DeepSpeed enabled?

I tried it on a V100 instance with 25GB of system RAM, both with and without CPU offloading, with fp16 and a batch size of 1, but it still runs out of memory on both the GPU and the CPU side.
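For reference, this is roughly the DeepSpeed config I've been trying; it's a sketch of a ZeRO stage 3 setup with CPU offload for both optimizer states and parameters (the `"auto"` values assume the Hugging Face Trainer integration fills them in), not a verified working configuration:

```json
{
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": "auto"
}
```

Even with everything offloaded, the fp32 optimizer states for a 1.5B-parameter model take on the order of 20GB+ of CPU RAM with Adam, which may be why the 25GB instance is borderline.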


I’m encountering the same problem with GPT-Neo. Have you found a solution?