Load a tokenizer from a vocab file that's already been read into Python

Hi there,

I’m trying to instantiate a tokenizer from a vocab file after it’s been read into Python. I want to decouple reading objects from disk from model loading: I load the files into Python separately, then use those Python objects to instantiate the Hugging Face objects. I can do this with the model itself like this:

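import io
import json
import torch
from transformers import DistilBertConfig, DistilBertModel

# Read the weights and config from disk into plain Python objects
with open('pytorch_model.bin', 'rb') as f:
    buffer = io.BytesIO(f.read())
with open('config.json', 'r') as f:
    config = DistilBertConfig.from_dict(json.load(f))

# Build the model from the in-memory objects, with no path passed
torch_model = torch.load(buffer, map_location=torch.device('cpu'))
model_test = DistilBertModel.from_pretrained(pretrained_model_name_or_path=None, state_dict=torch_model, config=config)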

But I can’t find a way to do it with the tokenizer. Does anyone have an idea how to do this?
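
For context, here is a minimal sketch of the kind of object I'd have in memory on the tokenizer side. It assumes a BERT-style vocab.txt (one token per line, with the line index as the token id). The helper name `load_vocab_from_text` and the sample vocab contents are made up for illustration:

```python
import io


def load_vocab_from_text(vocab_text):
    """Map each token to its line index, as in a BERT-style vocab.txt."""
    vocab = {}
    for index, line in enumerate(io.StringIO(vocab_text)):
        # Strip only the line ending so tokens keep any other characters
        token = line.rstrip("\r\n")
        vocab[token] = index
    return vocab


# Hypothetical in-memory contents standing in for a real vocab.txt
vocab_text = "[PAD]\n[UNK]\n[CLS]\n[SEP]\nhello\nworld\n"
vocab = load_vocab_from_text(vocab_text)
```

What I'm missing is the step that turns an object like `vocab` into a tokenizer without going back through a file path.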

Cheers!