Fine-tuning GPT-Neo via PPO

I have a wild idea to improve smaller GPT-3-esque models by tuning their output with PPO, a reinforcement learning algorithm. This was originally done to align GPT-2's output with human preferences: https://arxiv.org/pdf/1909.08593.pdf
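As a rough sketch of the mechanics in that paper (toy numbers, not a real implementation): the preference reward is shaped with a KL penalty against the pretrained reference model, and the policy is updated with PPO's clipped surrogate objective. The function names and the beta/epsilon values here are illustrative, not taken from any library.

```python
import math

def shaped_reward(pref_reward, logp_policy, logp_ref, beta=0.1):
    # Ziegler et al. subtract a KL penalty from the preference reward
    # so the tuned policy stays close to the pretrained model.
    return pref_reward - beta * (logp_policy - logp_ref)

def ppo_clipped_objective(logp_new, logp_old, advantage, eps=0.2):
    # PPO's clipped surrogate: the ratio of new to old policy probability,
    # clipped to [1 - eps, 1 + eps], keeps each update conservative.
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)

# If the new policy doubles a sample's probability, the clip caps
# the objective at (1 + eps) * advantage.
print(ppo_clipped_objective(math.log(2), 0.0, 1.0))  # -> 1.2
```

In practice libraries wrap this per-token over whole sampled completions, but the per-sample arithmetic is just this.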

I propose to fine-tune GPT-Neo directly on "prompt-driven" data. Most obviously, higher-performing models could teach lower-performing models by providing examples for the smaller models to learn from.

However, I wonder if it is possible to fine-tune the model in a narrower domain, e.g. code completion like Copilot. Would proof writing not be the ideal test? With many proofs publicly available, it could provide easily accessible data with more definitive evaluation than conversational quality; i.e., we might compare a naive proof to a fine-tuned proof of the same problem. I am aware that human evaluation is still required.

Other prompt-driven data likely exists (essays, etc.). However, the technical dream is to compress model performance by fine-tuning with PPO on examples sourced from larger/higher-performance models. Perhaps we could then pull robust narrow capabilities from larger models into smaller models without distilling the entire teacher model's knowledge.
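One way this could work, sketched with toy numbers: score each student completion by its likelihood under the teacher and use that as the PPO reward, so the student is only pushed toward teacher-like behavior on the narrow prompts we care about. The unigram "teacher" below is a stand-in for real language-model log-probs; the vocabulary and function name are invented for illustration.

```python
import math

# Hypothetical teacher distribution over a tiny proof-flavored vocabulary;
# in the real setup these would be per-token log-probs from the large model.
teacher_logprobs = {"lemma": math.log(0.4), "hence": math.log(0.3),
                    "qed": math.log(0.2), "banana": math.log(0.1)}

def teacher_reward(tokens):
    # Average per-token log-probability under the teacher: higher means
    # the completion looks more like something the teacher would write.
    return sum(teacher_logprobs[t] for t in tokens) / len(tokens)

good = teacher_reward(["lemma", "hence", "qed"])
bad = teacher_reward(["banana", "banana", "qed"])
assert good > bad  # PPO would push the student toward the first completion
```

The appeal is that the teacher only needs to be run for scoring, not imitated wholesale, which is what distinguishes this from full distillation.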

Is this a good idea to try? And is the model simply too big for this to be practical, i.e., would it raise DeepSpeed questions?

Best,
Aidan