Should I Include Poet Information as a Feature in LLM Training with 3,356 Unique Poets?

Hello, I am working on a project where I am training a large language model (LLM) to generate Arabic poetry. My dataset includes poems from 3,356 unique poets, and I am considering whether to include the poet as a feature (e.g., adding a special token for each poet).
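To make the setup concrete, here is a minimal sketch of what "a special token for each poet" could look like in the training data. It assumes a hypothetical token format `<poet_N>`, assigned by sorting the unique poet names, and simply prefixes each poem with its poet's token. In Hugging Face `transformers`, strings like these would typically be registered with `tokenizer.add_special_tokens({"additional_special_tokens": [...]})` followed by `model.resize_token_embeddings(len(tokenizer))`. That step is left out here so the sketch stays self-contained.

```python
def build_poet_tokens(poets):
    """Map each unique poet name to a hypothetical special token such as <poet_0>."""
    unique = sorted(set(poets))  # deterministic ordering, so IDs are stable across runs
    return {name: f"<poet_{i}>" for i, name in enumerate(unique)}


def format_example(poet, poem, poet_tokens):
    """Prefix the poem with its poet token so the model can condition on the poet."""
    return f"{poet_tokens[poet]} {poem}"


if __name__ == "__main__":
    # Transliterated names used purely as illustrative placeholders.
    poets = ["al-Mutanabbi", "Abu Tammam", "al-Mutanabbi"]
    tokens = build_poet_tokens(poets)
    print(len(tokens))
    print(format_example("al-Mutanabbi", "<poem text>", tokens))
```

With 3,356 poets, this produces 3,356 new token strings. Each one needs its own embedding row.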

My main concern is whether this will make the model more complex and hinder its ability to learn other important patterns, such as rhyme scheme, meter, and thematic elements. Given the large number of poets, would adding a unique token for each one slow convergence or confuse the model during training? Or is it generally fine to include poet-specific tokens without hurting the model's learning of these other patterns?
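One way to quantify the "more complex" part is to count the new trainable weights the poet tokens add. Each new token adds one row of size `hidden_size` to the input embedding matrix. If the output (LM head) weights are not tied to the input embeddings, it adds another row there as well. The sketch below assumes a hidden size of 4096 purely for illustration; the actual value depends on the base model.

```python
def extra_embedding_params(num_new_tokens, hidden_size, tied_embeddings=True):
    """Count parameters added by new vocabulary entries.

    One embedding row per token; doubled if the LM head is a separate (untied) matrix.
    """
    rows = num_new_tokens * hidden_size
    return rows if tied_embeddings else 2 * rows


if __name__ == "__main__":
    num_poets = 3356
    hidden_size = 4096  # assumed for illustration; depends on the base model
    print(extra_embedding_params(num_poets, hidden_size))                         # 13746176
    print(extra_embedding_params(num_poets, hidden_size, tied_embeddings=False))  # 27492352
```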
