Description
A common data science task for many businesses is condensing the news about their products or services into short summaries. The goal of this task is to fine-tune a model to automatically summarise news articles, ideally in a domain that is of interest to you!
Model(s)
There are various summarisation models on the Hub that have been fine-tuned on the famous CNN/Dailymail dataset. These provide a good starting point for performing domain adaptation:
There are also other summarization models that are worth investigating:
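Before fine-tuning, it can help to see what an off-the-shelf checkpoint produces on a few articles from your domain. The snippet below is a minimal sketch of that baseline using the `transformers` summarization pipeline. The checkpoint name `sshleifer/distilbart-cnn-12-6` (a distilled BART fine-tuned on CNN/Dailymail) is only an assumed example and is not prescribed by this project; the helper names `load_summarizer` and `summarize` and the length limits are likewise illustrative.

```python
def load_summarizer(model_name="sshleifer/distilbart-cnn-12-6"):
    """Build a summarization pipeline for a Hub checkpoint.

    The default checkpoint is an assumption for illustration; swap in any
    summarization model from the Hub. The weights are downloaded on first use.
    """
    # Imported lazily so the helper below stays usable without transformers.
    from transformers import pipeline

    return pipeline("summarization", model=model_name)


def summarize(text, summarizer, max_length=130, min_length=30):
    """Return the summary string for a single article.

    `summarizer` is any callable that behaves like the summarization
    pipeline, i.e. returns a list of dicts with a "summary_text" key.
    """
    outputs = summarizer(
        text,
        max_length=max_length,  # upper bound on generated tokens (assumed value)
        min_length=min_length,  # lower bound on generated tokens (assumed value)
        truncation=True,        # clip articles longer than the model's input limit
    )
    return outputs[0]["summary_text"].strip()
```

With `transformers` installed, `summarize(article_text, load_summarizer())` returns a summary string for `article_text`. Comparing these baseline outputs against summaries from your fine-tuned model is one simple way to judge whether domain adaptation is paying off.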
Datasets
Using the summarization filter on the Hugging Face Hub gives a good list of datasets to start from.
Challenges
[Explain whether the task is feasible with a single T4 GPU (what we get from AWS), does the data need a lot of preprocessing, are there ethical considerations etc]
Desired project outcomes
- Create a Streamlit or Gradio app on Spaces that can summarize news articles, either from their text or from a given URL.
- Don’t forget to push all your models and datasets to the Hub so others can build on them!
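As a rough starting point for the app, here is a sketch of the "text or URL" input handling wired into a Gradio interface. Everything in it is an assumption for illustration: the helper names are hypothetical, the paragraph extraction is a crude stand-in built on the standard library (real news pages often need a dedicated article extractor), and `summarize` is any function you supply that maps article text to a summary, for example a wrapper around your fine-tuned model.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> tags (a crude article extractor)."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buffer = []
        self.paragraphs = []

    def flush(self):
        # Collapse whitespace and store the paragraph if it is non-empty.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush()  # also handles <p> tags that were never closed
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def extract_text(html):
    """Return the page's paragraph text, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n".join(parser.paragraphs)


def fetch_article(url, timeout=10):
    """Download a page and return its paragraph text."""
    request = Request(url, headers={"User-Agent": "news-summarizer-demo"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_text(response.read().decode(charset, errors="replace"))


def summarize_input(text_or_url, summarize, fetch=fetch_article):
    """Summarize pasted text, or fetch the article first if given a URL."""
    value = text_or_url.strip()
    if not value:
        return "Please paste an article or a URL."
    is_url = value.startswith(("http://", "https://")) and len(value.split()) == 1
    if is_url:
        value = fetch(value)
    return summarize(value)


def build_demo(summarize):
    """Wrap summarize_input in a simple Gradio interface."""
    import gradio as gr  # imported lazily; only needed when building the app

    return gr.Interface(
        fn=lambda value: summarize_input(value, summarize),
        inputs=gr.Textbox(lines=10, label="Article text or URL"),
        outputs=gr.Textbox(label="Summary"),
        title="News summarizer",
    )
```

In a Space's `app.py`, calling `build_demo(my_summarize).launch()` would start the app, where `my_summarize` is your own function. Keeping the model behind a plain callable makes it easy to swap checkpoints without touching the interface code.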
Additional resources
Here are some existing Spaces for inspiration:
- https://huggingface.co/spaces/chinhon/News_Summarizer
- https://huggingface.co/spaces/benthecoder/news-summarizer
Discord channel
To chat and organise with other people interested in this project, head over to our Discord and:
- Follow the instructions on the #join-course channel
- Join the #new-summarizer channel
Just make sure you comment here to indicate that you’ll be contributing to this project.