Closest model available to OpenAI's Codex / GitHub Copilot for code completion

Hi everybody,
I’m trying to work on a replication study of a paper that used OpenAI’s Codex model, and I’m looking for the closest model to it available on the Hugging Face Hub. NovelAI/genji-python-6B is the best one I’ve found so far, but in my testing it often fixates on code formatting rather than the semantics of the program. Is the model too small, was it trained on too little data, or is there something better on the Hub?
I also checked out CodeParrot (lvwerra/codeparrot), but it appears to struggle with longer prompts.
Should I try with different hyperparameters?
Thank you for any suggestion!


Pinging @lvwerra, who may have some ideas here based on his experience with CodeParrot 🦜 🙂


You could also try EleutherAI/gpt-j-6B, which was already trained on code and performs pretty well.

For generation quality, it makes sense to tune the sampling strategy. If the model's first suggestion needs to be good, go for low temperatures; if you have several tries, you can increase the temperature to get more variety in the generations.
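The effect of temperature can be sketched with a toy softmax over made-up next-token scores (the logit values here are arbitrary, just for illustration): dividing the logits by a small temperature sharpens the distribution toward the top token, while a large temperature flattens it and lets alternative tokens through.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature before the softmax: low temperature
    # sharpens the distribution, high temperature flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy next-token scores
sharp = softmax_with_temperature(logits, 0.2)  # near-greedy sampling
flat = softmax_with_temperature(logits, 2.0)   # more variety

# At low temperature the top token keeps almost all the probability mass.
print(sharp[0] > flat[0])
```

In a `transformers` generation call, this scaling is what the `temperature` argument to `generate` controls, typically together with `do_sample=True`.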


Thanks for the quick reply!
I think I hadn’t quite understood the value of including completed examples/solutions in the prompt; adding those improves performance significantly. Do you have any suggestions on literature to better understand sampling strategies/hyperparameters?
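Prompting with completed examples can be sketched as simple string concatenation: finished solutions go before the unfinished target so the model sees the expected style and semantics before completing. The function names here are made up for illustration.

```python
# Hypothetical few-shot prompt for code completion: one completed
# example solution is prepended before the unfinished target function.
examples = [
    'def add(a, b):\n'
    '    """Return the sum of a and b."""\n'
    '    return a + b\n',
]
target = (
    'def multiply(a, b):\n'
    '    """Return the product of a and b."""\n'
)
# The model would be asked to continue the text from the end of `prompt`.
prompt = "\n".join(examples) + "\n" + target
print(prompt.count("def "))
```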

By the way, I’m getting a build error on the CodeParrot code-generation demo (CodeParrot Generation, a Hugging Face Space by lvwerra); it worked fine earlier today, though.
(works now)


You can try out SantaCoder.


Thank you very much for the link, it’s very helpful.

Hi Antonio,
sorry to take the liberty of contacting you, but given my limited experience here, and since you're Italian too, I was wondering whether you're already part of a group, or whether you could recommend one to me. I'm self-taught in programming and would love to learn from someone who knows more than I do. For now, good luck with your work and best regards.
Luca