PreTrain RoBERTa/T5 from scratch for Programming Languages

The meeting has been shifted to today (2 July) at 8 PM SGT.

Thanks @patrickvonplaten

Everyone, please check the Discord channel.

@sbmaruf

Hi! How did this project turn out? Are your datasets potentially shareable? I'm particularly interested in your FP (Scala, Haskell) datasets: did they come from a GitHub scrape?

Thanks,
Andrew

Please take a look at this paper: [2303.03004] xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

Thank you. This looks very helpful for my current research on code generation model interpretability.

I would love to know about your research. Feel free to link it here.

Hey, sorry about the late reply. I'm working on model interpretability for code LLMs. I'll link it here when it's done :slight_smile: