PreTrain RoBERTa/T5 from scratch for Programming Languages

The meeting has been shifted to today (2 July) at 8 PM SGT.

Thanks @patrickvonplaten

Guys, please check the Discord channel.

@sbmaruf

Hi! How did this project turn out? Are your datasets potentially shareable? I'm particularly interested in your FP (Scala, Haskell) datasets. Did they come from a GitHub scrape?

Thanks,
Andrew

Please take a look at this paper [2303.03004] xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

Thank you. This looks very helpful for my current research (on code gen model interpretability).

I would love to know about your research. Feel free to link it here.

Hey, sorry about the late reply. I'm working on some model interpretability for code LLMs. I'll link it when it's done :slight_smile: