Extremely slow init of fine-tuned model

I’m seeing very strange behavior with fine-tuned models: their init time is orders of magnitude slower than the base model’s (it doesn’t matter whether I load from the HF Hub or from the local drive, so it doesn’t seem to be a hardware issue).
At the same time, inference speed is identical for the fine-tuned and base models.
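For context, I timed init and inference separately with a small helper (a generic sketch, nothing model-specific; `timed` is just my own wrapper):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage with the repro below:
#   clf, init_s = timed(pipeline, task="zero-shot-classification", model='./model_dist_1000_2', ...)
#   _, infer_s  = timed(clf, "some text", candidate_labels=["a", "b"])
```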

Narrowing the problem down, here is a minimal repro:

from transformers import AutoModelForSequenceClassification, pipeline

model = AutoModelForSequenceClassification.from_pretrained('distilbert/distilbert-base-multilingual-cased')
model.save_pretrained('./testmodel')

# The base model loaded from the HF Hub: init takes ~1 second
clf = pipeline(task="zero-shot-classification",
  model='distilbert/distilbert-base-multilingual-cased',
  tokenizer='distilbert/distilbert-base-multilingual-cased')

# The same model, loaded from local storage: ~1 second as well
clf = pipeline(task="zero-shot-classification",
  model='./testmodel',
  tokenizer='distilbert/distilbert-base-multilingual-cased')

# A model fine-tuned from the same base model: init takes ~1 hour (!)
clf = pipeline(task="zero-shot-classification",
  model='./model_dist_1000_2',
  tokenizer='distilbert/distilbert-base-multilingual-cased')
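In case it’s relevant, here is a sketch I’d use to inspect what’s actually in the fine-tuned model directory — an unexpectedly large file next to the weights (e.g. a full training checkpoint or optimizer state) could be part of the picture. The helper name is mine; the path is the one from the repro:

```python
import os

def list_model_files(model_dir):
    """Return (relative_path, size_in_MB) for every file under model_dir,
    largest first."""
    files = []
    for root, _dirs, names in os.walk(model_dir):
        for name in names:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, model_dir)
            files.append((rel, os.path.getsize(path) / 1e6))
    return sorted(files, key=lambda f: -f[1])

# e.g. list_model_files('./model_dist_1000_2')
```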

What could be the reason for this, and how can it be fixed?
Any clue would be greatly appreciated.

My environment is:
python 3.11.7
transformers 4.37.2
torch 2.2.0
No GPU (CPU only)