Is it possible to disassemble a zero-shot model?

I am content with its current results on unseen labels, but I am now at a stage where I want to train it on a specific dataset (so I’ll only use a handful of labels) and turn it into a supervised classifier. I am not sure whether that’s possible, given the zero-shot model’s unique architecture.