I'm thinking of using Transformer models to classify other kinds of sequential data, namely time series data.
My idea is to feed fixed-sized sequences of time series values as input into a BERT-like model with a classification head. Since using pre-trained models probably makes no sense, I would train it from scratch. Since time series values are already numerical, am I right to think that tokenization isn't needed? How can I ensure that a BERT-like model even understands the input without using the corresponding tokenizer? Is there anything else to know when wanting to control the classification head layers, apart from passing num_values?
What steps would I need to go through for this task of time series classification? Any tips? I'm grateful for any ideas. Perhaps someone already knows a repository/model? That would be extremely helpful.
@MJimitater I am currently trying to perform a similar experiment. Here were my thoughts:
Given that the data already comes in a fixed size, I did not perform any tokenization. This is mainly because the data I have comes with labels, so I opted not to perform the self-supervised pre-training protocol that BERT used. This seems to be different from your situation.
To inform the model of the sequential nature of the time series, I began implementing Time2Vec. My thinking was this would take the place of the positional embeddings referenced in the original BERT paper.
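In case it helps, here is roughly what that module looks like. This is a minimal sketch of Time2Vec (one linear component plus sine components, following Kazemi et al., 2019); the names and dimensions are just placeholders, not my exact code:

```python
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    """Time2Vec embedding: one linear term plus (embed_dim - 1) periodic (sine) terms.
    Used here in place of BERT's learned positional embeddings."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.w0 = nn.Parameter(torch.randn(1, 1))              # linear frequency
        self.b0 = nn.Parameter(torch.randn(1, 1))              # linear phase
        self.w = nn.Parameter(torch.randn(1, embed_dim - 1))   # periodic frequencies
        self.b = nn.Parameter(torch.randn(1, embed_dim - 1))   # periodic phases

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, seq_len, 1) -- timestamps, or simply positions 0..L-1
        linear = t * self.w0 + self.b0                 # (batch, seq_len, 1)
        periodic = torch.sin(t * self.w + self.b)      # (batch, seq_len, embed_dim - 1)
        return torch.cat([linear, periodic], dim=-1)   # (batch, seq_len, embed_dim)
```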
Effectively what I did was copy the architecture of the BERT base model, replace the learning objective with supervised learning (since I have labels), and put a linear classification head on top. I did not introduce a CLS or SEP token.
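The overall shape of the model is something like the sketch below: raw numerical values are projected into the model dimension with a linear layer (so no tokenizer is needed), Time2Vec is added as the position signal, and a linear head sits on top of a mean-pooled encoder output (since there is no CLS token). The sizes and names here are illustrative, not my exact hyperparameters:

```python
import torch
import torch.nn as nn

class TimeSeriesTransformerClassifier(nn.Module):
    """BERT-style encoder for time series classification (illustrative sketch).
    Time2Vec is the module sketched above."""
    def __init__(self, num_features: int, num_classes: int,
                 d_model: int = 128, nhead: int = 8, num_layers: int = 4):
        super().__init__()
        # project raw numerical values into the model dimension
        # (this replaces token embeddings, so no tokenizer is needed)
        self.input_proj = nn.Linear(num_features, d_model)
        self.time2vec = Time2Vec(d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=4 * d_model,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, num_classes)  # classification head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, num_features)
        batch, seq_len, _ = x.shape
        positions = torch.arange(seq_len, device=x.device, dtype=x.dtype)
        positions = positions.view(1, seq_len, 1).expand(batch, -1, -1)
        h = self.input_proj(x) + self.time2vec(positions)  # (batch, seq_len, d_model)
        h = self.encoder(h)
        pooled = h.mean(dim=1)            # mean-pool: no CLS token in this setup
        return self.classifier(pooled)    # (batch, num_classes) logits
```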
I wrote this architecture in PyTorch using its built-in modules, as I wasn't sure about hacking on the HF models. That said, it wasn't that complicated, and PyTorch has some really good examples to pull from. Since the model is written in PyTorch and follows a PyTorch-style training loop, one can use the HF Accelerate library (which I did) to perform distributed training.
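The training loop itself is a standard PyTorch loop with Accelerate wrapped around it, roughly like this (the dataset and model are placeholders; the `Accelerator`, `prepare`, and `backward` calls are the library's standard API):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from accelerate import Accelerator

def train(model, train_dataset, num_epochs: int = 10, lr: float = 1e-4):
    accelerator = Accelerator()
    loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    # Accelerate wraps model/optimizer/dataloader for (multi-)GPU or distributed runs
    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

    for epoch in range(num_epochs):
        model.train()
        total_loss = 0.0
        for x, y in loader:  # x: (batch, seq_len, num_features), y: class indices
            optimizer.zero_grad()
            logits = model(x)
            loss = criterion(logits, y)
            accelerator.backward(loss)  # replaces loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch}: mean loss {total_loss / len(loader):.4f}")
```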
Currently my model is training, and I suspect it will take a while to reach any acceptable level of accuracy, so I don't have any particular numbers to report at the moment. However, the loss does seem to be decreasing with each epoch!
I realize this is an older post, but if you had any insight into your problem and wouldn’t mind sharing I’d be happy to hear it!