Pretrain GPT-Neo for Open Source GitHub Copilot Model

hd10 · June 30, 2021, 8:45pm

Hey, I’d love to be a part of this project!

birgermoell · June 30, 2021, 9:05pm

Would love to take part in this project!

taisazero · June 30, 2021, 9:42pm

I’m also in (if it’s not too late to join)! I’d be happy to contribute!

ncoop57 · June 30, 2021, 9:51pm

Hey y’all, we are using the Discord server posted in the Slack channel to communicate. We have our own channel called “copilot-codecompletion.” So head on over there for easier discussion if you are part of the project.

arampacha · June 30, 2021, 10:08pm

This is super interesting. I would be happy to participate! I hope there is enough room for another contributor in this project.

naruto7 · July 1, 2021, 2:52am

Could someone share the Discord link? I am not able to find it.

vishal · July 1, 2021, 3:40am

I would love to contribute to this project.

ainoob101 · July 1, 2021, 4:12am

I would love to be a part of this team/project.

mrm8488 · July 1, 2021, 9:46am

I made the same thing, but only for Python and JS: mrm8488/codeBERTaJS · Hugging Face

ncoop57 · July 1, 2021, 12:51pm

It is in the Slack channel. I’m not sure if I’m allowed to share the link here since it is private.

ncoop57 · July 1, 2021, 1:12pm

Actually, the Discord link is public, so I can go ahead and paste it here. Everyone, please go over to this Discord channel so that we can easily discuss the project. See y’all there (the channel is called copilot-codecomplete).

arampacha · July 1, 2021, 3:02pm

It looks like the team of 10 people is already full (judging by the projects spreadsheet). But there are a couple of interested people who didn’t fit in (including me). I’m guessing Copilot should be language-specific. Would it make sense to extend the project to include multiple teams working on models for different languages? I would like to work on Copilot for Julia. We could select the N most popular languages and form N subteams.
Tagging @patrickvonplaten @ncoop57 for your opinions.

ncoop57 · July 1, 2021, 6:35pm

I think having multiple teams is a great idea, but I don’t think we should have a single model per language since there will be a lot of overlap. Some interesting ideas I’ve seen people asking for are a copilot specifically for documentation and one specifically for tests. Maybe we can have teams split like that: one for general code generation (current copilot), one for documentation, and one for tests. We could even have a general model trained, and then each team is just responsible for fine-tuning it for its specific task. What do y’all think?

arampacha · July 1, 2021, 6:58pm

That’s what I was thinking about language specific sub-teams.

Really curious to see if Copilot would be able to suggest reasonable tests for your code!

gneubig · July 1, 2021, 7:20pm

Hi! We at CMU are interested in helping out with this as well. We have some experience in code processing and NL-to-code generation that could perhaps contribute to some of the design decisions, like dataset design/filtering and subword tokenization, as well as manpower to train things if that’s still necessary. Not sure what the best way to contribute would be? The Discord link above seemed broken for me, but maybe I’m doing something wrong.

arampacha · July 1, 2021, 7:59pm

I cannot join the Discord via the link above either…

reshinthadith · July 1, 2021, 8:04pm

Hello Graham. Please use this link: Flax-HuggingFace-Community-Week. We are discussing the project under #copilot-code-synthesis. Yes, we can see dataset curation being important, and tokeniser and beam search design choices mattering a lot. Please feel free to make suggestions.

reshinthadith · July 1, 2021, 8:12pm

Yes, we can see dataset curation being important, and tokeniser and beam search design choices mattering a lot. Please feel free to make suggestions.

reshinthadith · July 1, 2021, 8:14pm

Hello, please use the following link to join: Flax-HuggingFace-Community-Week. Thanks.

patrickvonplaten · July 1, 2021, 10:42pm

Wow, this is becoming a really cool project! Really exciting!
