Training a model from scratch on my own dataset

Hi folks,
I’m trying to train a couple of models on a sequence classification task.
I’ve been following the tutorial, but I still have some questions because my data is not in text format.

First, my data is already numeric: sequences of varying lengths in the following format: `{inputs: [0,1,1,0,1,1,1,0,1,1,0,1,1,0], labels: 0}`.

How should I use the tokenizer in this case? Do I just pad the sequences?
Any MWE would be appreciated :pray:
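For context, here is roughly what I mean by "just padding" (a pure-Python sketch, no tokenizer; `PAD_ID = 2` is my assumption, since with a binary vocabulary of 0/1 a third id would need to be reserved for padding):

```python
# Pad each sequence of 0/1 token ids to the length of the longest
# sequence in the batch, and build an attention mask so the model
# can ignore the padding positions.
PAD_ID = 2  # assumed padding id, not part of the 0/1 vocabulary

def pad_batch(sequences):
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for s in sequences:
        pad = max_len - len(s)
        input_ids.append(list(s) + [PAD_ID] * pad)
        attention_mask.append([1] * len(s) + [0] * pad)
    return input_ids, attention_mask

batch = [[0, 1, 1, 0, 1], [1, 0, 1]]
ids, mask = pad_batch(batch)
# ids  -> [[0, 1, 1, 0, 1], [1, 0, 1, 2, 2]]
# mask -> [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

Is this the right idea, or should I still go through a tokenizer/data collator?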

I’ve read in the docs that BERT-style models are better suited to this kind of task. Could anyone list a couple of models to try here?

Note that I want to train from scratch, not fine-tune a pretrained model.
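To make the from-scratch part concrete, this is the kind of thing I have in mind: instantiating a model from a fresh config instead of `from_pretrained` (a sketch; all the sizes below are guesses I'd need to tune, and `vocab_size=3` assumes token ids 0, 1 plus a padding id 2):

```python
from transformers import BertConfig, BertForSequenceClassification

# Build a small BERT config from scratch; every size here is an
# assumption for illustration, not a recommended setting.
config = BertConfig(
    vocab_size=3,                # ids 0 and 1, plus an assumed pad id 2
    hidden_size=128,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=256,
    max_position_embeddings=64,  # longest sequence I expect
    pad_token_id=2,
    num_labels=2,                # binary sequence classification
)

# Randomly initialized weights -- no pretrained checkpoint involved.
model = BertForSequenceClassification(config)
```

Does this make sense, or is there a model family better suited to short binary sequences?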