Using Hugging Face seems very hostile and unforgiving for this particular endeavor.
Are there any other workarounds?
Zero-shot classification requires you to specify the candidate labels on every call, so why not use ordinary text classification instead?
With ordinary text classification, the set of labels is fixed for each model (unless you fine-tune it yourself). Classification is limited to those labels, but within that set it is fully automatic.
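
To make the difference concrete, here is a rough sketch using the `transformers` pipelines. The two checkpoints (`facebook/bart-large-mnli` and `distilbert-base-uncased-finetuned-sst-2-english`) are only my picks for illustration, and the function names are made up for this example. Substitute whichever model has the label set you need.

```python
def top_zero_shot(result):
    # A zero-shot pipeline result for one text is a dict whose "labels"
    # and "scores" lists are sorted from most to least likely.
    return result["labels"][0], result["scores"][0]


def top_fixed(result):
    # A text-classification pipeline result for one text is a list
    # holding a single {"label": ..., "score": ...} dict.
    return result[0]["label"], result[0]["score"]


def classify_zero_shot(text, candidate_labels, model="facebook/bart-large-mnli"):
    # Zero-shot: the caller must pass the candidate labels on every call.
    from transformers import pipeline  # imported lazily; downloads the model on first use

    clf = pipeline("zero-shot-classification", model=model)
    return top_zero_shot(clf(text, candidate_labels=candidate_labels))


def classify_fixed(text, model="distilbert-base-uncased-finetuned-sst-2-english"):
    # Ordinary text classification: the labels come from the model itself,
    # so nothing besides the text has to be supplied.
    from transformers import pipeline

    clf = pipeline("text-classification", model=model)
    return top_fixed(clf(text))
```

Calling `classify_fixed("I loved this movie")` needs no label list at all, whereas `classify_zero_shot("I loved this movie", ["positive", "negative"])` only works if you supply one. The trade-off is that the fixed model can only ever answer with the labels it was trained on.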