I am trying to implement a Reformer classification model (I have implemented the Reformer classification head) on the IMDB dataset and would like to understand which of the two available pre-trained models, google/reformer-enwik8 and google/reformer-crime-and-punishment, is suited for the fine-tuning task, or whether neither of them will work.
google/reformer-enwik8 model: from the documentation, "The model is a language model that operates on characters. Therefore, this model does not need a tokenizer." Since the model is character-level, is it suitable for classification tasks?
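To illustrate the "no tokenizer" point: a character/byte-level model like reformer-enwik8 can consume raw UTF-8 bytes directly. Below is a minimal encode/decode sketch modeled on the style of the Hugging Face docs; the offset of 2 (reserving low ids for padding/special tokens) is an assumption here, not a guaranteed constant of the checkpoint.

```python
import torch

def encode(texts, pad_token_id=0):
    # Shift byte values by 2 so that id 0 stays free for padding
    # (the offset is an illustrative assumption).
    encoded = [torch.tensor([b + 2 for b in t.encode("utf-8")]) for t in texts]
    max_len = max(len(e) for e in encoded)
    batch = torch.full((len(encoded), max_len), pad_token_id, dtype=torch.long)
    for i, e in enumerate(encoded):
        batch[i, : len(e)] = e
    return batch

def decode(batch):
    # Drop padding/special ids (<= 1) and undo the byte offset.
    return [
        bytes(t - 2 for t in row.tolist() if t > 1).decode("utf-8")
        for row in batch
    ]

ids = encode(["IMDB review text", "short"])
```

So for classification you would feed these byte ids to the encoder and attach your classification head on top, exactly as with token-level models.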
Also, in general, what are the considerations for choosing a pre-trained model to fine-tune for classification tasks?
To answer your second question: as recent developments show, pre-trained masked language models are better suited for classification tasks, or more generally for tasks that require bidirectional understanding. T5 has also shown that it's possible to achieve comparable results on classification tasks using a text-to-text approach.
As for those two Reformer models, they are causal LMs trained for next-token prediction, so they might not perform well enough for classification. But I haven't tried this myself, so feel free to experiment.
@valhalla Thanks for providing the information. This helps.
I am looking to work with sequences longer than 4096 tokens (Longformer handles sequence lengths up to 4096).
The Reformer model looked promising, but since we currently don't have a suitable trained model, I am looking at other options. I have heard that breaking the text into smaller chunks and averaging the classification results is one approach, but I am not sure about it. Any references or thoughts would help.
One approach would be to chunk the document, get the pooler output for each chunk, and average the pooled outputs before passing the result to the classifier.
Thank you, let me try that approach. Since it would help with the work, do you know of any related notebook or reference material online?