Create a docstring generator

:wave: Please read the topic category description to understand what this is all about

Description

Applications like GitHub Copilot can automatically generate docstrings from a class or function name. The goal of this project is to fine-tune a Transformer such as CodeT5 to do this ourselves!

Model(s)

Generating docstrings from source code can be modelled as a sequence-to-sequence task, so T5 models are a good starting point here:
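To make the sequence-to-sequence framing concrete, here is a minimal sketch for Python sources, assuming Python 3.9+ (for `ast.unparse`). The hypothetical helper `extract_pairs` turns every documented function or class into a (source, target) pair: the code with its docstring removed is the input sequence, and the docstring is the output sequence. Removing the docstring from the input matters; otherwise the model can learn to copy the target.

```python
import ast
import textwrap


def extract_pairs(module_source: str) -> list:
    """Return (code_without_docstring, docstring) pairs for every documented
    function or class in a module (hypothetical helper, Python 3.9+)."""
    pairs = []
    tree = ast.parse(textwrap.dedent(module_source))
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        docstring = ast.get_docstring(node)  # already de-indented
        if not docstring:
            continue
        # Drop the docstring statement so the input does not contain the target;
        # keep a `pass` if the docstring was the only statement.
        node.body = node.body[1:] or [ast.Pass()]
        pairs.append((ast.unparse(node), docstring))
    return pairs


src = '''
def add(a, b):
    """Add two numbers."""
    return a + b

def undocumented(x):
    return x
'''
for source, target in extract_pairs(src):
    print("INPUT: ", source)
    print("TARGET:", target)
```

Undocumented functions are skipped because they have no target to learn from.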

Datasets

A good dataset for this task is code_search_net, but feel free to find alternative datasets if you can’t find your favourite programming language there.
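A possible preprocessing step for code_search_net records is sketched below. It assumes the records expose `func_code_string` and `func_documentation_string` fields (check the dataset card for the exact names), and it uses a common but assumed convention of keeping only the first paragraph of the documentation as the summary target. It is also worth checking whether the code field still contains the docstring; if it does, strip it before training (for Python, the `ast` module can do this).

```python
import re
from typing import Optional


def to_training_example(record: dict, max_target_words: int = 64) -> Optional[dict]:
    """Turn one code_search_net-style record into a source/target pair.

    Field names are assumed from the dataset card. Keeps only the first
    paragraph of the documentation and skips empty or overly long summaries.
    """
    doc = record.get("func_documentation_string", "") or ""
    # The first paragraph ends at the first blank line.
    first_paragraph = re.split(r"\n\s*\n", doc.strip(), maxsplit=1)[0]
    summary = " ".join(first_paragraph.split())  # collapse newlines and indentation
    if not summary or len(summary.split()) > max_target_words:
        return None
    return {"source": record["func_code_string"], "target": summary}


record = {
    "func_code_string": "def f(x):\n    return x",
    "func_documentation_string": "Return x unchanged.\n\n    Args:\n        x: anything",
}
print(to_training_example(record))
```

Capping the target length (the `max_target_words` parameter is hypothetical) keeps the decoder's sequence length, and therefore memory use, under control.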

Challenges

Models like CodeT5 are rather large, and you’ll need to think about which metrics to use for this type of task.
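As far as I know, code-summarization results for CodeT5 on CodeSearchNet are usually reported with smoothed sentence-level BLEU-4. The sketch below implements BLEU-4 with simple add-one smoothing for 2- to 4-gram precisions on whitespace tokens. It is an illustration of the metric, not a drop-in replacement for any official evaluation script, whose tokenization and smoothing may differ numerically.

```python
import math
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu4(reference: str, hypothesis: str) -> float:
    """Sentence-level BLEU-4 with add-one smoothing for n > 1 (a sketch)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if n == 1:
            if overlap == 0:
                return 0.0
            precision = overlap / total
        else:
            precision = (overlap + 1) / (total + 1)  # add-one smoothing
        log_precision_sum += math.log(precision)
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / 4)


print(smoothed_bleu4("Add two numbers .", "Add two integers ."))
```

BLEU only rewards surface overlap, so a correct but differently worded docstring scores low; it is worth reading a sample of generations alongside any automatic score.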

Desired project outcomes

  • Create a Streamlit or Gradio app on :hugs: Spaces that can automatically generate a docstring from a class or function name in your favourite programming language!
  • Don’t forget to push all your models and datasets to the Hub so others can build on them!
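For the app, the model's plain-text output still has to be placed back into the user's code. A minimal post-processing sketch for Python input follows; the helper name `insert_docstring` is hypothetical, the model call itself is not shown, and it assumes the body of the first function or class starts on its own line.

```python
import ast


def insert_docstring(source: str, summary: str) -> str:
    """Insert `summary` as a docstring into the first function or class in `source`.

    Assumes the body starts on its own line (not `def f(): return 1`).
    """
    tree = ast.parse(source)
    target = next(
        (node for node in ast.walk(tree)
         if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))),
        None,
    )
    if target is None:
        raise ValueError("no function or class found")
    first_stmt = target.body[0]
    indent = " " * first_stmt.col_offset  # match the body's indentation
    # Escape backslashes and quotes so the generated text cannot end the string early.
    safe = summary.strip().replace("\\", "\\\\").replace('"', '\\"')
    lines = source.splitlines()
    lines.insert(first_stmt.lineno - 1, f'{indent}"""{safe}"""')
    return "\n".join(lines)


print(insert_docstring("def add(a, b):\n    return a + b", "Add two numbers."))
```

Escaping the model output keeps the result valid Python even if the generated summary contains quotes.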

Additional resources

Discord channel

To chat and organise with other people interested in this project, head over to our Discord and:

  • Follow the instructions on the #join-course channel

  • Join the #docstring-generator channel

Just make sure you comment here to indicate that you’ll be contributing to this project :slight_smile:


I find this task interesting and would like to find out more about how to contribute. For example, would fine-tuning for PL-to-NL (programming language to natural language, i.e. code comment/docstring) generation for Python be a suitable case for this project?

I actually did this by fine-tuning CodeT5 for generating docstrings for Ruby code.

The model is on the hub: nielsr/codet5-small-code-summarization-ruby · Hugging Face

The notebook used to fine-tune T5 can be found here.

Hey @stmnk this is a super cool idea and definitely worth exploring! And thanks to @nielsr you have a nice template to work from :wink:

I’ve also created a Discord channel (see topic description) for you and others in case you need it :slight_smile:

Thank you for the links; they are really helpful!

Thank you!