Funcom Dataset for summarization

airnicco8 · November 14, 2020, 3:26pm

Hi everybody!
I’ve just started with NLP and I’m working on my degree thesis, which involves experimenting with some datasets. I found the funcom dataset, which is made up of pieces of Java code and their Javadocs. My question is: has anybody ever tested SOTA summarization models on this dataset? Would they give good results? Or does the pretraining of such models not give them any knowledge of source code?

Thanks in advance

rgwatwormhill · November 17, 2020, 3:26pm

Hi airnicco8,

I’m not an expert, but that looks a bit tricky. What would you intend to do with the funcom data? Would you be trying to build a seq-2-seq model that could translate from Java code to a comment string?
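To make the question concrete, here is a minimal sketch of what one code-to-comment training pair might look like for such a seq-2-seq setup. The helper names (`first_sentence`, `make_pair`) and the `source`/`target` keys are hypothetical, chosen only for illustration; this is not the funcom file format or its official preprocessing. It assumes the summary target is the first sentence of the Javadoc description.

```python
import re


def first_sentence(javadoc: str) -> str:
    """Strip Javadoc comment markers and return the first sentence of the description."""
    body = javadoc.strip()
    body = re.sub(r"^/\*\*", "", body)   # drop the opening /** marker
    body = re.sub(r"\*/$", "", body)     # drop the closing */ marker
    description = []
    for line in body.splitlines():
        line = re.sub(r"^\s*\*\s?", "", line).strip()  # drop the leading '*'
        if line.startswith("@"):         # stop at the first block tag (@param, @return, ...)
            break
        if line:
            description.append(line)
    text = " ".join(description)
    match = re.search(r"(.+?\.)(\s|$)", text)
    return match.group(1) if match else text


def make_pair(code: str, javadoc: str) -> dict:
    """Build one hypothetical (source, target) example: Java code in, short summary out."""
    return {
        "source": " ".join(code.split()),   # collapse whitespace and newlines
        "target": first_sentence(javadoc),
    }


# Illustrative input only, not taken from the dataset.
java_code = "public int add(int a, int b) {\n    return a + b;\n}"
java_doc = (
    "/**\n"
    " * Returns the sum of two integers. Overflow is not checked.\n"
    " * @param a first operand\n"
    " */"
)
pair = make_pair(java_code, java_doc)
print(pair)
```

A model trained on pairs like this would read `source` and be asked to generate `target`.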

If you are supposed to be doing NLP, then Java code might not be appropriate, as Java is not a natural language.

A big advantage of the Hugging Face library is that it includes many pre-trained models that you can fine-tune on your own data. I don’t think there are any models pre-trained on Java code. See this page for the list of models available in Hugging Face: https://huggingface.co/transformers/pretrained_models.html

I suggest you start with something simpler.

airnicco8 · November 17, 2020, 6:20pm

Thanks for the reply. That’s what I thought too, but I wanted to ask for the sake of double-checking!
