GPT2 for German poetry generation
For this project, I propose to use a GPT2 model pretrained on German text and fine-tune it on German poetry.
Model
A GPT2 model pretrained on German text can be found here: dbmdz/german-gpt2 · Hugging Face
Datasets
The model can be fine-tuned on the publicly available “Faust” dataset: mobverdb/faust.txt at master · martinth/mobverdb · GitHub
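Before fine-tuning, the raw Faust text has to be cut into fixed-size training blocks for the causal language model. A minimal sketch (chunking by words for illustration; in practice the blocks would be measured in tokenizer tokens, and the `chunk_text` helper is a hypothetical name):

```python
# Sketch: split raw text into fixed-size blocks for causal-LM
# fine-tuning. Block size is counted in whitespace-separated words
# here purely for illustration; a real pipeline would count
# tokenizer tokens instead.

def chunk_text(text: str, block_size: int) -> list[str]:
    """Split whitespace-tokenized text into blocks of block_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + block_size])
        for i in range(0, len(words), block_size)
    ]

sample = "Habe nun ach Philosophie Juristerei und Medizin"
print(chunk_text(sample, block_size=4))
# ['Habe nun ach Philosophie', 'Juristerei und Medizin']
```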
Available training scripts
A training script for fine-tuning a GPT2 model in Flax is available here
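Assuming the script in question is the `run_clm_flax.py` causal language modeling example from the Hugging Face `transformers` repository, a fine-tuning run might look like the following; file paths and all hyperparameter values are placeholders, not tested settings:

```shell
# Sketch of a fine-tuning command, assuming run_clm_flax.py from
# transformers/examples/flax/language-modeling; paths and
# hyperparameters are illustrative placeholders.
python run_clm_flax.py \
  --model_name_or_path dbmdz/german-gpt2 \
  --train_file faust.txt \
  --output_dir ./german-poetry-gpt2 \
  --do_train \
  --block_size 512 \
  --per_device_train_batch_size 4 \
  --num_train_epochs 10 \
  --learning_rate 5e-5
```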
(Optional) Desired project outcome
The desired project outcome is a GPT2 model that can generate sensible German poetry. This can be showcased directly on the Hugging Face Hub or with a Streamlit app.
(Optional) Challenges
The data is written as a dialogue, with speaker names such as "MEPHISTOPHELES:"
before every paragraph. One might need to strip these markers for better model quality.
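The speaker-name cleanup described above could be sketched with a regular expression; this assumes speaker names appear as an all-caps word (optionally with spaces or periods) followed by a colon at the start of a line, which may not cover every case in the actual file:

```python
import re

# Sketch: strip speaker names such as "MEPHISTOPHELES:" from the
# dialogue. Assumes a speaker marker is an all-caps word (plus
# optional spaces/periods) followed by a colon at the line start.
SPEAKER_RE = re.compile(r"^[A-ZÄÖÜ][A-ZÄÖÜ .]*:\s*", flags=re.MULTILINE)

def strip_speakers(text: str) -> str:
    """Remove leading speaker markers from each line of dialogue."""
    return SPEAKER_RE.sub("", text)

sample = "MEPHISTOPHELES:\nIch bin der Geist, der stets verneint!"
print(strip_speakers(sample))
# Ich bin der Geist, der stets verneint!
```

Lowercase words at line starts are left untouched, since the pattern requires a colon immediately after the all-caps run.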
(Optional) Links to read upon
There are lots of English GPT2 models for poetry: