Hi folks,
I’m trying to train a couple of models on a sequence classification task.
I’ve been following the tutorial, but I still have some questions because my data is not in text format.
First, my data are already numeric values — for instance, sequences of varying lengths in the following format: `{inputs: [0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0], labels: 0}`.
What/how should I use the tokenizer in this case? Do I just pad the sequences?
Any MWE would be appreciated.
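For context, here is roughly what I'm doing now instead of a tokenizer — manually padding each batch to its longest sequence and building an attention mask by hand (`pad_batch` is just a helper name I made up; I'm not sure reusing 0 as the pad value is right, since 0 is also a real token in my inputs):

```python
def pad_batch(sequences, pad_value=0):
    """Pad variable-length integer sequences and build an attention mask.

    Returns HF-style "input_ids" / "attention_mask" lists, where the mask
    is 1 for real tokens and 0 for padding positions.
    """
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for s in sequences:
        n_pad = max_len - len(s)
        input_ids.append(list(s) + [pad_value] * n_pad)
        attention_mask.append([1] * len(s) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0], [1, 0, 1]])
```

Is something like this enough, or does the model still need a tokenizer for special tokens etc.?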
I’ve read in the docs that BERT-style models are better suited for this kind of task — could anyone list a couple of models to try that would work here?
I want to train from scratch, not fine-tune a pretrained model.
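In case it clarifies what I mean by "from scratch": I was planning something like the config sketch below (assuming `transformers`' `BertConfig`/`BertForSequenceClassification`; the vocab size and dimensions are just guesses for my binary-token data, not values I've validated):

```python
from transformers import BertConfig, BertForSequenceClassification

# Tiny vocabulary: tokens 0 and 1, plus one id reserved for padding (guess).
config = BertConfig(
    vocab_size=3,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=2,
)
# Randomly initialized — no pretrained weights are loaded here.
model = BertForSequenceClassification(config)
```

Does that look like a sane starting point, or is there a model class better suited to non-text sequences?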