GPT2 for German poetry generation

For this project, I propose taking a GPT2 model pretrained on German text and fine-tuning it on German poetry.

Model

A GPT2 model pretrained on German text is available on the Hugging Face Hub: dbmdz/german-gpt2

Datasets

The model can be fine-tuned on the publicly available “Faust” dataset: mobverdb/faust.txt (master branch of martinth/mobverdb on GitHub)
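Fine-tuning also needs a held-out portion of the text for evaluation. The sketch below assumes faust.txt has already been downloaded and read into a Python string, and that blank lines separate paragraphs; `split_paragraphs` and `validation_fraction` are hypothetical names chosen for illustration.

```python
import re
from typing import List, Tuple


def split_paragraphs(text: str, validation_fraction: float = 0.1) -> Tuple[List[str], List[str]]:
    """Split blank-line-separated paragraphs into training and validation parts.

    Hypothetical helper: the last `validation_fraction` of the paragraphs
    (at least one) is held out as a contiguous tail.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    # Assumption: one or more blank lines separate paragraphs/stanzas.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    n_val = max(1, int(len(paragraphs) * validation_fraction))
    return paragraphs[:-n_val], paragraphs[-n_val:]
```

Holding out a contiguous tail rather than randomly sampled paragraphs keeps neighbouring verses, which often share rhymes and phrasing, from leaking between the two splits.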

Available training scripts

A training script for fine-tuning a GPT2 model in Flax is available here.
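Causal language-model fine-tuning commonly concatenates the tokenized text and cuts it into fixed-length blocks. The pure-Python sketch below shows that step on lists of token ids; it illustrates the general technique and is not code from the linked script. `group_into_blocks` and `block_size` are hypothetical names, and the token ids would come from the tokenizer that ships with dbmdz/german-gpt2.

```python
from itertools import chain
from typing import List


def group_into_blocks(tokenized_lines: List[List[int]], block_size: int) -> List[List[int]]:
    """Concatenate token-id sequences and cut them into equal-length blocks.

    Hypothetical helper: the tail that does not fill a whole block is dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Flatten all sequences into one long stream of token ids.
    ids = list(chain.from_iterable(tokenized_lines))
    # Keep only as many ids as fit into complete blocks.
    usable = (len(ids) // block_size) * block_size
    return [ids[i:i + block_size] for i in range(0, usable, block_size)]
```

For example, `group_into_blocks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], 4)` returns `[[1, 2, 3, 4], [5, 6, 7, 8]]`; the leftover id 9 is dropped.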

(Optional) Desired project outcome

The desired outcome is a GPT2 model that generates coherent German poetry. It can be showcased directly on the Hugging Face Hub or in a Streamlit app.

(Optional) Challenges

The text is written as dialogue, with the speaker's name (e.g. "MEPHISTOPHELES:") before every paragraph. These speaker labels may need to be removed to improve model quality.
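One way to remove the speaker labels is a regular expression applied line by line. This is a minimal sketch that assumes a label is an all-caps name (letters, spaces, dots) followed by a colon at the start of a line; the exact layout of faust.txt has not been checked here, so the pattern may need adjusting. `strip_speaker_labels` is a hypothetical helper name.

```python
import re

# Assumption: a speaker label is an all-caps name (A-Z, Ä, Ö, Ü, spaces, dots)
# followed by a colon at the start of a line, e.g. "MEPHISTOPHELES:".
SPEAKER_LABEL = re.compile(r"^[A-ZÄÖÜ][A-ZÄÖÜ .]*:[ \t]*")


def strip_speaker_labels(text: str) -> str:
    """Remove leading speaker labels while keeping blank stanza breaks."""
    cleaned = []
    for line in text.splitlines():
        stripped = SPEAKER_LABEL.sub("", line)
        if not stripped.strip() and line.strip():
            # The line held only a speaker label: drop it entirely.
            continue
        # Keep verse lines (and original blank lines) without trailing spaces.
        cleaned.append(stripped.rstrip())
    return "\n".join(cleaned)
```

Blank lines that were already in the text are kept, so stanza boundaries survive the cleaning; only lines that consisted of nothing but a label disappear.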

(Optional) Links to read up on

There are lots of English GPT2 models for poetry:
