For this project, I propose to use a pretrained German GPT2 model and fine-tune it on German poetry.
A GPT2 model pretrained on German text is available on the Hugging Face Hub: dbmdz/german-gpt2 · Hugging Face
The model can be fine-tuned on the publicly available “Faust” dataset: mobverdb/faust.txt at master · martinth/mobverdb · GitHub
A training script to fine-tune a GPT2 model in Flax is available here.
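Assuming the script referred to is the causal language modeling example from the transformers repository (`run_clm_flax.py`), a fine-tuning run might look like the following sketch; the file name `faust_clean.txt` and all hyperparameters are illustrative, not prescribed:

```shell
python run_clm_flax.py \
    --model_name_or_path dbmdz/german-gpt2 \
    --train_file faust_clean.txt \
    --do_train \
    --block_size 512 \
    --per_device_train_batch_size 8 \
    --num_train_epochs 10 \
    --output_dir ./german-poetry-gpt2
```

Batch size and block size would need to be tuned to the available accelerator memory.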
The desired project output is a GPT2 model that can generate sensible German poetry. This can be showcased directly on the Hub or with a Streamlit app.
The data is written as a dialogue, with speaker names such as
"MEPHISTOPHELES:" before every paragraph. These labels may need to be removed for better model quality.
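A minimal preprocessing sketch for stripping such speaker labels, assuming they appear as all-caps words at the start of a line followed by a colon (the exact formatting in the file may differ and the pattern would need adjusting):

```python
import re

def strip_speaker_labels(text: str) -> str:
    """Remove all-caps speaker labels like 'MEPHISTOPHELES:' that
    precede each paragraph of the dialogue."""
    # Matches a line-initial run of uppercase letters (including German
    # umlauts) and whitespace, ending in a colon, plus trailing whitespace.
    return re.sub(r"^[A-ZÄÖÜ][A-ZÄÖÜ\s]*:\s*", "", text, flags=re.MULTILINE)

sample = "MEPHISTOPHELES:\nIch bin der Geist, der stets verneint!"
print(strip_speaker_labels(sample))  # -> Ich bin der Geist, der stets verneint!
```

Because the pattern requires at least two consecutive uppercase characters before the colon, ordinary verse lines that merely begin with a capital letter are left untouched.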
For reference, there are many English GPT2 models for poetry: