Pretrain GPT-Neo for Open Source GitHub Copilot Model

Hey, I’d love to be a part of this project!

2 Likes

Would love to take part in this project!

2 Likes

I’m also in (if it’s not too late to join!) Would be happy to contribute!

2 Likes

Hey y’all, we are using the Discord server posted in the Slack channel to communicate. We have our own channel called “copilot-codecompletion.” So head on over there for easier discussion if you are part of the project :nerd_face:

This is super-interesting. I would be happy to participate! Hope there would be enough room for another contributor in this project :slightly_smiling_face:

2 Likes

Could someone share the Discord link? I am not able to find it.

2 Likes

I would love to contribute to this project.

2 Likes

I would love to be a part of this team/project.

2 Likes

I built something similar, but only for Python and JS: mrm8488/codeBERTaJS · Hugging Face

1 Like

It is in the Slack channel. I’m not sure if I’m allowed to share the link here since it is private :frowning:

Actually, the Discord link is public, so I can go ahead and paste it here. Everyone, please head over to this Discord channel so that we can discuss the project more easily. See y’all there :nerd_face: (the channel is called copilot-codecomplete)

It looks like the team of 10 people is already full (judging by the projects spreadsheet). But there are a couple of interested people who didn’t fit in (including me :upside_down_face:). I’m guessing Copilot should be language-specific. Would it make sense to extend the project to include multiple teams working on models for different languages? I would like to work on Copilot for Julia. We could select the N most popular languages and form N subteams.
Tagging @patrickvonplaten @ncoop57 for your opinions

1 Like

I think having multiple teams is a great idea, but I don’t think we should have a single model per language since there will be a lot of overlap. Some interesting ideas I’ve seen people asking for are a copilot specifically for documentation and one specifically for tests. Maybe we can split the teams like that: one for general code generation (the current Copilot), one for documentation, and one for tests. We could even train a general model and then make each team responsible only for fine-tuning it for its specific task. What do y’all think?

That’s what I was thinking with language-specific sub-teams.

Really curious to see if Copilot would be able to suggest reasonable tests for your code!

1 Like

Hi! We at CMU are interested in helping out with this as well. We have some experience in code processing and NL-to-code generation that could perhaps contribute to some of the design decisions, like dataset design/filtering and subword tokenization, as well as manpower to train things if that’s still necessary. What would be the best way to contribute? The Discord link above seemed broken for me, but maybe I’m doing something wrong.

8 Likes

I can’t join the Discord via the link above either…

1 Like

Hello Graham. Please use this link: Flax-HuggingFace-Community-Week. We are discussing under #copilot-code-synthesis. Yes, we see dataset curation as potentially important, and there is a lot to decide on tokeniser and beam search design as well. Please feel free to make suggestions.

1 Like

Hello, please use the following link to join: Flax-HuggingFace-Community-Week. Thanks.

1 Like

Wow this is becoming a really cool project! Really exciting!

1 Like