Fine-tuning GPT-J 6B on a custom dataset

How do I prepare a dataset to feed to GPT-J 6B for fine-tuning?
Any steps or a tutorial would be appreciated. Thanks!

Hi, @Syed313! Thanks for the question. :slight_smile:

@deniskamazur modified EleutherAI’s GPT-J 6B model so you can generate with it and fine-tune it in Colab or on an equivalent desktop GPU (e.g. a single 1080 Ti).

:notebook_with_decorative_cover: The proof-of-concept notebook is available here.
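
Since the original question was about dataset preparation, here is a minimal sketch of how you could tokenize a plain-text corpus for causal-LM fine-tuning with the `datasets` and `transformers` libraries. The file name `train.txt` and the `max_length` of 512 are placeholders, and this is not taken from the notebook, whose own preprocessing may differ:

```python
# Hypothetical dataset-preparation sketch (not the notebook's exact code).
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.pad_token = tokenizer.eos_token  # GPT-J ships without a pad token

raw = load_dataset("text", data_files={"train": "train.txt"})  # placeholder file

def tokenize(batch):
    # Shorter contexts keep activation memory manageable on a single GPU.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
```

From there, `DataCollatorForLanguageModeling(tokenizer, mlm=False)` will batch the examples and copy `input_ids` into `labels` for the causal-LM loss.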

As you are probably already aware, the original GPT-J takes 22+ GB of memory just for its float32 parameters, and even if you cast everything to 16-bit it still won’t fit onto most single-GPU setups short of an A6000 or A100. You can run inference on TPUs or CPUs, but fine-tuning is a bit more expensive. This implementation should be a bit more cost-effective.
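
For intuition, the 22+ GB figure is just the parameter count times the bytes per value. A quick back-of-the-envelope sketch (parameter count rounded to the advertised 6B, gradients and optimizer state not included):

```python
# Rough parameter-memory estimate for GPT-J 6B at different precisions.
n_params = 6e9  # roughly the advertised parameter count

for name, bytes_per_param in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{name}: {n_params * bytes_per_param / 2**30:.1f} GiB")
# float32: ~22.4 GiB, float16: ~11.2 GiB, int8: ~5.6 GiB
```

Fine-tuning additionally needs gradients and optimizer state on top of the weights, which is why it is more demanding than inference.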
