Please read the topic category description to understand what this is all about.
Many open-source projects on GitHub use Issues to triage feature requests, bugs, and so on. For example, check out the Issues tab of Transformers and Datasets to get an idea. The goal of this project is to pick your favourite open-source project and create a bot that automatically assigns labels (e.g. bug, enhancement, etc.) to new GitHub issues.
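Once the model has predicted labels, the bot still has to write them back to GitHub. A minimal sketch, assuming GitHub's REST endpoint for adding labels to an issue (`POST /repos/{owner}/{repo}/issues/{issue_number}/labels`) and a personal access token with permission to edit issues. `build_add_labels_request` is a hypothetical helper name, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_add_labels_request(owner: str, repo: str, issue_number: int,
                             labels: list, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that adds labels to an issue."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/labels"
    body = json.dumps({"labels": labels}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # assumption: token can edit issues
            "Content-Type": "application/json",
        },
    )


# Usage (OWNER, REPO and the issue number are placeholders):
# req = build_add_labels_request("OWNER", "REPO", 1, ["bug"], token)
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Keeping request construction separate from sending makes the bot's logic easy to test without touching a real repository.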
Any of the pretrained BERT-like models on the Hub should serve as a good basis for this project. Since issues often contain source code and technical jargon, you may find that first fine-tuning the language model on your issues dataset (before training the classifier) boosts performance.
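Fine-tuning the language model on the issues usually means continuing its masked-language-modeling (MLM) objective on your issue text. In `transformers`, `DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)` handles the masking; the plain-Python sketch below only illustrates the BERT-style scheme it follows: select about 15% of tokens, replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged. `mask_tokens` is a hypothetical helper, `MASK_ID = 103` and `VOCAB_SIZE = 30522` are assumptions matching the `bert-base-uncased` vocabulary, and unlike the real collator this sketch does not skip special or padding tokens.

```python
import random

MASK_ID = 103        # assumption: [MASK] id in the bert-base-uncased vocabulary
VOCAB_SIZE = 30522   # assumption: bert-base-uncased vocabulary size
IGNORE_INDEX = -100  # label value that the loss ignores


def mask_tokens(input_ids, mlm_probability=0.15, rng=None):
    """Return (masked_inputs, labels) for one sequence of token ids."""
    rng = rng or random.Random()
    inputs = list(input_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, token_id in enumerate(input_ids):
        if rng.random() >= mlm_probability:
            continue  # not selected: the model is not asked to predict it
        labels[i] = token_id  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID                    # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Only the selected positions carry a label; `-100` is the index PyTorch's cross-entropy loss ignores by default, which is also the convention `transformers` uses.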
For this project you’ll have to create your own dataset by downloading and processing the GitHub issues associated with an open-source project. You can do this with GitHub’s REST or GraphQL APIs. You can find an example dataset on the Hub here:
This is a multilabel classification task (a single issue can carry several labels at once), so you'll need to do some data exploration to figure out which labels occur often enough to be detected reliably.
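The dataset-building and exploration steps above can be sketched with the REST API's `GET /repos/{owner}/{repo}/issues` endpoint. A minimal sketch under these assumptions: `issues_page_url`, `parse_issues`, `frequent_labels`, `multi_hot` and the `min_count` cut-off are hypothetical names chosen for illustration, and fetching is shown only as a comment. Note that this endpoint also returns pull requests (they carry a `pull_request` key), so they are filtered out.

```python
import json
from collections import Counter
from urllib.parse import urlencode


def issues_page_url(owner, repo, page=1, per_page=100):
    """URL for one page of a repository's issues, open and closed."""
    # GitHub caps per_page at 100, so large repositories need many pages
    query = urlencode({"state": "all", "per_page": per_page, "page": page})
    return f"https://api.github.com/repos/{owner}/{repo}/issues?{query}"


def parse_issues(payload):
    """Keep only real issues from one JSON page and extract the needed fields."""
    records = []
    for item in json.loads(payload):
        if "pull_request" in item:
            continue  # this endpoint also returns pull requests; skip them
        records.append({
            "title": item["title"],
            "body": item.get("body") or "",  # body can be null
            "labels": [label["name"] for label in item.get("labels", [])],
        })
    return records


def frequent_labels(issues, min_count):
    """Label names attached to at least `min_count` issues, sorted alphabetically."""
    counts = Counter(name for issue in issues for name in set(issue["labels"]))
    return sorted(name for name, n in counts.items() if n >= min_count)


def multi_hot(issue_labels, label_names):
    """Encode one issue's labels as a 0/1 float vector aligned with label_names."""
    present = set(issue_labels)
    return [1.0 if name in present else 0.0 for name in label_names]


# Fetching one page (add an "Authorization: Bearer <token>" header to ease rate limits):
# import urllib.request
# with urllib.request.urlopen(issues_page_url("OWNER", "REPO")) as resp:
#     issues = parse_issues(resp.read())
```

Dropping rare labels before training keeps the classifier from chasing classes with almost no examples; the float targets match what a binary cross-entropy loss expects.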
- Create a Streamlit or Gradio app on Spaces that ingests new GitHub issues from an open-source project and predicts the labels of each one.
- Don’t forget to push all your models and datasets to the Hub so others can build on them!
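In the app, each new issue is run through the fine-tuned classifier. For multilabel classification, every label gets its own sigmoid probability rather than a softmax over all labels (in `transformers`, setting `problem_type="multi_label_classification"` on a sequence-classification model trains it with `BCEWithLogitsLoss`). The sketch below assumes you already have one issue's logits, e.g. from `model(**inputs).logits`; `predict_labels` and the 0.5 threshold are illustrative assumptions, and tuning the threshold on a validation split may work better.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)


def predict_labels(logits, label_names, threshold=0.5):
    """Turn one issue's raw logits into the list of predicted label names."""
    return [
        name
        for name, logit in zip(label_names, logits)
        if sigmoid(logit) >= threshold  # each label is decided independently
    ]
```

Because each label is thresholded on its own, an issue can receive zero, one, or several labels, which is exactly what the bot needs to pass on to GitHub.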
To chat and organise with other people interested in this project, head over to our Discord and:
Follow the instructions on the
Just make sure you comment here to indicate that you'll be contributing to this project.