I have a dataset with 2 columns (a toy example of the format is below):
- around 100k rows
- one column holds a large amount of free text (around 10k words per cell)
- the other column holds the list of labels this free text belongs to, usually between 5 and 30 labels per row
- the full set of unique labels is around 12k
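To make the format concrete, here is a toy example (a minimal sketch assuming a pandas DataFrame; the column names `text` and `labels` and the label strings are just placeholders):

```python
import pandas as pd

# Toy rows illustrating the two columns: long free text plus a list of labels.
df = pd.DataFrame({
    "text": [
        "First document, around 10k words of free text in practice ...",
        "Second document, also very long ...",
    ],
    "labels": [
        ["label_a", "label_b", "label_c"],  # 5 to 30 labels per row in the real data
        ["label_b", "label_d"],
    ],
})
print(df.head())
```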
I was wondering what would be the best way to train a model on this dataset, since the majority of transformer models accept a maximum of only 512 or 1024 input tokens.
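Naive truncation to 512 tokens would throw away most of each document, so the main option I can think of is some kind of overlapping sliding-window chunking, roughly like this (a minimal sketch assuming the Hugging Face `transformers` tokenizer; the model name, window size, and stride are just placeholders):

```python
from transformers import AutoTokenizer

# Placeholder model; any 512-token encoder has the same input limit.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def chunk_text(text, max_len=510, stride=128):
    """Split one long document into overlapping windows of up to 510 tokens
    (leaving room for the [CLS] and [SEP] special tokens)."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = []
    for start in range(0, len(ids), max_len - stride):
        window = ids[start:start + max_len]
        chunks.append(tokenizer.decode(window))
        if start + max_len >= len(ids):
            break
    return chunks

# A 10k-word cell easily tokenizes to well over 10k tokens,
# so each row would turn into dozens of chunks.
```

But even then I am not sure how the per-chunk predictions should be combined back into one label set per row, or whether a long-context model would be a better fit here.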