Closest model available to OpenAI's Codex / GitHub Copilot for code completion

AntonioLopardo · March 1, 2022, 11:17am

Hi everybody,
I’m working on a replication study of a paper that used OpenAI’s Codex model, and I’m looking for the closest model to it available on the model hub. NovelAI/genji-python-6B · Hugging Face is the best one I have found, but after some testing it often seems to fixate on code formatting rather than on the semantics of the program. Is the model too small, was it trained on too little data, or is there something better on the model hub?
I also checked out CodeParrot (lvwerra/codeparrot · Hugging Face), but it appears to struggle with longer prompts.
Should I try different hyperparameters?
Thank you for any suggestion!

lewtun · March 1, 2022, 3:57pm

Pinging @lvwerra who may have some ideas here based on his experience with CodeParrot

lvwerra · March 1, 2022, 4:13pm

You could also try EleutherAI/gpt-j-6B · Hugging Face which was already trained on code and performs pretty well.

To improve the quality of the generations, it makes sense to tune the sampling strategy. If the model's first suggestion needs to be good, go for a low temperature; if you can take several tries, you can raise the temperature to get more variety in the generations.
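To make the temperature trade-off concrete, here is a minimal sketch of what temperature does to a next-token distribution. It is not taken from any model in this thread: the `logits` values and the helper name `softmax_with_temperature` are made up for illustration, and only the standard library is used.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # almost all mass on the top token
high = softmax_with_temperature(logits, 1.5)  # mass spread over all candidates

print("T=0.2:", [round(p, 3) for p in low])
print("T=1.5:", [round(p, 3) for p in high])
```

With the low temperature the top-scoring token dominates, which suits the single-suggestion case. With the higher temperature the alternatives keep meaningful probability, which helps when several samples are drawn. In 🤗 Transformers the corresponding knobs on `generate` are `do_sample=True`, `temperature`, and `num_return_sequences`.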

AntonioLopardo · March 1, 2022, 5:45pm

Thanks for the quick reply!
I think I hadn’t quite understood the value of including completed examples/solutions in the prompt; adding those improves performance significantly. Do you have any suggestions for literature to better understand sampling strategies and hyperparameters?
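For anyone reading along, this is roughly what including completed examples in the prompt can look like. It is only a sketch: the helper `build_few_shot_prompt`, the docstring-style layout, and the toy tasks are assumptions for illustration, not the format used in the paper or by any particular model.

```python
def build_few_shot_prompt(solved_examples, task):
    """Concatenate solved (description, solution) pairs, then the unsolved task."""
    parts = []
    for description, solution in solved_examples:
        # Each solved example is a description followed by its full solution.
        parts.append(f'"""{description}"""\n{solution}\n')
    # The final task has a description only; the model is left to complete it.
    parts.append(f'"""{task}"""\n')
    return "\n".join(parts)


# Hypothetical solved examples placed ahead of the real task.
solved_examples = [
    ("Return the sum of two numbers.", "def add(a, b):\n    return a + b"),
    ("Return the larger of two numbers.", "def larger(a, b):\n    return a if a > b else b"),
]

prompt = build_few_shot_prompt(solved_examples, "Return the product of two numbers.")
print(prompt)
```

The resulting string is what gets passed to the model as the prompt; the number of solved examples is limited by the model's context length.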

Btw, I’m getting a build error on the CodeParrot code generation demo (CodeParrot Generation - a Hugging Face Space by lvwerra), though it worked fine earlier today.
(works now)

Surajit · January 30, 2023, 5:34pm

You can try out SantaCoder.

ONISSUM · August 7, 2023, 3:45pm

Thank you very much for the link, it’s very helpful.

ONISSUM · August 7, 2023, 3:48pm

Hi Antonio,
sorry for taking the liberty of contacting you, but given my limited experience here, and since you are Italian too, I was wondering whether you are already part of a group or could recommend one. I am a self-taught programmer and would like to learn from someone who knows more than I do. So for now, good luck with your work and best regards.
luca
