[Reading the forum help, it seemed appropriate to post here rather than in beginners, even though I am a beginner, because the question is specific to Transformers…]
I have a dataset that looks like this (approx. 10k records):
| 2406 4713 8521 .
| 1309 3417 7211 8313 9403 .
| 1102 5403 .
That is, each row begins with a start character ("|") and ends with a stop character ("."), and rows vary in length (usually between 1 and 15 values).
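To make the format concrete, here is a rough sketch of the naive approach I can imagine: treating every whitespace-separated value, plus the start and stop characters, as one token. The names build_vocab, encode and decode and the "<pad>" entry are my own placeholders, not from any library, and I don't know whether this is the right approach for Transformers.

```python
# Sample rows copied from above; "|" is the start marker, "." the stop marker.
rows = [
    "| 2406 4713 8521 .",
    "| 1309 3417 7211 8313 9403 .",
    "| 1102 5403 .",
]


def build_vocab(rows):
    """Assign an integer id to every whitespace-separated symbol.

    Id 0 is reserved for a hypothetical "<pad>" token (an assumption,
    useful only if rows are later padded to equal length).
    """
    vocab = {"<pad>": 0}
    for row in rows:
        for symbol in row.split():
            if symbol not in vocab:
                vocab[symbol] = len(vocab)
    return vocab


def encode(row, vocab):
    """Turn one row of text into a list of integer ids."""
    return [vocab[symbol] for symbol in row.split()]


def decode(ids, vocab):
    """Turn a list of integer ids back into a row of text."""
    inverse = {index: symbol for symbol, index in vocab.items()}
    return " ".join(inverse[index] for index in ids)


vocab = build_vocab(rows)
ids = encode(rows[2], vocab)
print(ids)
print(decode(ids, vocab))
```

With this scheme each four-digit value is a single vocabulary entry, so a model could only ever generate values that already occur in the dataset.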
The data are a musical encoding of melodies I’m working on. I’d like to generate new complete sequences that have a strong relationship to the dataset.
Can anyone recommend an appropriate tokenizer for this task?
Many thanks, Michael