Unsupervised Code-Code Translation based on TransCoder

The goal of this project is to train an unsupervised code-to-code translation model. Most existing approaches model this as supervised translation over parallel corpora. This particular paper ([2006.03511] Unsupervised Translation of Programming Languages) instead builds code-to-code translation with no parallel corpora between the languages. This would be a great addition to the Hub too. Let me know if you’re interested.
Code: GitHub - facebookresearch/TransCoder (public release of the TransCoder research project). Paper: https://arxiv.org/pdf/2006.03511.pdf
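For anyone skimming, the paper’s unsupervised recipe combines cross-lingual masked-LM pretraining, denoising auto-encoding, and back-translation over monolingual code only. Here is a rough sketch of the back-translation step, with `py2cpp`/`cpp2py` as hypothetical model interfaces standing in for whatever we end up training (not anything from the actual repo):

```python
# Rough sketch of the back-translation objective; `py2cpp` and `cpp2py`
# are HYPOTHETICAL seq2seq model objects used only to illustrate the idea.

def back_translation_step(py2cpp, cpp2py, python_batch):
    # Translate monolingual Python into (imperfect) C++ with the current
    # model; no gradient flows through this generation step.
    synthetic_cpp = py2cpp.generate(python_batch)
    # Train the reverse direction to reconstruct the original Python from
    # the synthetic C++, treating the pair as pseudo-parallel supervision.
    return cpp2py.loss(inputs=synthetic_cpp, targets=python_batch)
```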

I’m interested – would the pretraining script on Facebook’s GitHub be enough of a reference for us to implement this? Is it doable in 7 days?

EDIT: Although I am interested, I may not be able to join or contribute much; I don’t want to over-commit.

Cool idea! @reshinthadith which model architecture would you like to use? Maybe simply FlaxRoBERTa?

Hello, Patrick. This is Reshinth. The original paper trained on three languages (Java, C++, and Python) and their monolingual corpora, so they’ll have 3 encoders and 3 decoders. To make it feasible and reduce the complexity, shall we reduce the entire problem to one-way translation between just two languages, say Python to C++? This would require us to have 1 encoder and 1 decoder. Let me know what you think.
So, the end objective is to have a seq2seq model that can translate py2cpp, trained with no parallel corpora.
And yes, we can use FlaxRoBERTa for both the encoder and the decoder.
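A minimal sketch of that wiring, assuming a transformers version that ships FlaxEncoderDecoderModel; roberta-base is just a placeholder checkpoint until we have one pretrained on code:

```python
from transformers import FlaxEncoderDecoderModel, RobertaTokenizerFast

# Tie two RoBERTa checkpoints into one seq2seq model; the decoder's
# cross-attention weights are newly initialized and need training.
model = FlaxEncoderDecoderModel.from_encoder_decoder_pretrained(
    "roberta-base", "roberta-base"
)
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# Generation needs these set explicitly for a RoBERTa-based decoder.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```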

I’m interested! Particularly in doing Python to C++ or vice versa, as I have experience in both languages! Could I join if spots are still available? I also have some knowledge of the GitHub APIs, which may help for fetching some of the code corpora (see the sketch below), and let’s hope we can leverage Facebook’s pretraining scripts or slightly tweak them for our needs.
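For the corpus side, here is a small sketch against the public GitHub REST API (a real endpoint, but unauthenticated calls are heavily rate-limited, so a personal access token is advisable):

```python
import requests

# Find popular Python repositories whose files we could crawl for the
# monolingual Python corpus; swap the language qualifier for C++.
resp = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "language:python", "sort": "stars", "per_page": 5},
    headers={"Accept": "application/vnd.github.v3+json"},
)
resp.raise_for_status()
for repo in resp.json()["items"]:
    print(repo["full_name"], repo["stargazers_count"])
```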

I’m super interested in this too!!! I’d be down to help out. I think we could even add some natural language to it to do unsupervised code documentation :nerd_face:

Hello, Bharat. The feasibility of reusing the existing repo is somewhat negligible; it’s written in pure PyTorch. We’ll see how far we can use it.

We sure can try this. This is interesting.

Awesome - finalized the project and added you all!

@reshinthadith - I see; if you need an encoder-decoder architecture, I think FlaxT5 would be a good choice :slight_smile:
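For reference, a quick sketch of loading FlaxT5; t5-small is only an example checkpoint, not one pretrained on code:

```python
from transformers import FlaxT5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = FlaxT5ForConditionalGeneration.from_pretrained("t5-small")

# Toy round-trip: encode a Python snippet and greedily decode.
inputs = tokenizer("def add(a, b): return a + b", return_tensors="np")
outputs = model.generate(inputs.input_ids, max_length=64)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```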

Hello, people. Reach out for discussion at #minimal-unsupervised-transcoder. The code is available at GitHub - reshinthadithyan/hf_jax_transcoder_mini (a minimal version of TransCoder built with JAX). Feel free to contribute. Thanks.
- Reshinth

Hello everyone, could I get the model card name of the TransCoder model on Hugging Face, just for fine-tuning?

Thank you