How to clone a ForSequenceClassification model


I want to use MT5ForSequenceClassification locally (on a cluster with no internet), but on the Hub there only seems to be the ForConditionalGeneration architecture, so cloning doesn’t work.

How can I clone the ForSequenceClassification architecture?



One doesn’t clone an architecture, but rather weights (which, for PyTorch, are stored in a file typically called pytorch_model.bin). Hence you can filter the Hub on the mt5 model type and the “text classification” task: Models - Hugging Face. This will give you all checkpoints compatible with the MT5ForSequenceClassification class.

As can be seen, there aren’t that many, primarily because T5 is typically used for tasks like machine translation and summarization (where it’s important to generate output text given input text). One typically uses encoder-only Transformer models like BERT for text classification.

To use the model offline, you can take the weights (pytorch_model.bin) and configuration (config.json) and transfer them to a directory on the offline machine. You can then use the from_pretrained method to load the model from that local directory:

from transformers import MT5ForSequenceClassification

model = MT5ForSequenceClassification.from_pretrained("path_to_local_directory")

Thanks @nielsr. I had actually tried this but was confused because I didn’t see a pytorch_model.bin file in the directory I saved the pretrained model into. I now realise the .safetensors file is equivalent.

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.