Is it possible to use a from_pretrained() method to respawn model and tokenizer from my own s3 bucket?

I saved a DistilBertModel and a tokenizer with the help of the save_pretrained() method. Now, when I load them locally using from_pretrained('/path_to_distilbert_model'), everything works fine and as intended.
I need these models to be loaded from my own s3 bucket.
But when I try to load them using TFDistilBertModel.from_pretrained('s3://my_bucket/path_to_distilbert_model'), there is an error stating that the model configuration file cannot be found, or that 'vocab.txt' cannot be found if I use DistilBertTokenizer.from_pretrained().
Is it possible to load a model from my own s3 bucket and what is the suggested way to do that? Thanks

Looking at the source code, I do not think it is possible to load models from external URLs other than the HuggingFace bucket.

That being said, why don’t you upload your model to the Hugging Face model hub?

Have you found any solution to this problem?

I’m having the same problem: I need to load a pretrained model via from_pretrained("s3://bucket_name/model_folder"), which throws the same errors.
I think the reason is that, based on the source code, os.path.isdir("s3://bucket_name/model_folder") returns False.
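For what it’s worth, this is easy to verify in a couple of lines (the bucket path below is just a placeholder):

```python
import os

# The loading logic checks whether the given string is a local directory.
# An s3:// URI never passes this check, so the library falls back to
# treating it as a model identifier and fails to resolve the files.
s3_path = "s3://bucket_name/model_folder"
print(os.path.isdir(s3_path))  # False
```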

Any solution so far?

On S3 there is no real concept of a “folder”; object keys are flat. That could be the reason that providing a folder path does not work.
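Given that, one workaround sketch (not an official API; it assumes boto3 is available and the saved artifacts all live under a single key prefix, with placeholder bucket/prefix names) is to mirror the objects to a local directory first and then call from_pretrained() on that local path:

```python
import os
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


def download_model_dir(s3_uri, local_dir):
    """Mirror every object under the S3 prefix into local_dir."""
    import boto3  # imported lazily so parse_s3_uri works without it

    bucket, prefix = parse_s3_uri(s3_uri)
    s3_bucket = boto3.resource("s3").Bucket(bucket)
    for obj in s3_bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith("/"):  # skip "folder" marker objects
            continue
        target = os.path.join(local_dir, os.path.relpath(obj.key, prefix))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        s3_bucket.download_file(obj.key, target)
    return local_dir


# Usage (hypothetical paths, matching the ones from the question):
# local_path = download_model_dir("s3://my_bucket/path_to_distilbert_model",
#                                 "/tmp/distilbert")
# model = TFDistilBertModel.from_pretrained(local_path)
# tokenizer = DistilBertTokenizer.from_pretrained(local_path)
```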

I am not sure if this is still an issue, but I came across this on Stack Overflow when looking for somewhere to store my own fine-tuned BERT model artifacts to use during inference.

It seems helpful, and I am assuming that adding
AutoTokenizer.from_pretrained(tokenizer.name, config=tokenizer_config.json)
would take care of the tokenizer artifacts, but I feel like the other artifacts (vocab.txt, etc.) might be problematic to add with the code provided.

So, the simple solution turned out to be just creating a model repo on the Hugging Face Hub (which might be obvious to the experienced eye, but was new to me) and loading your own pre-trained or fine-tuned models from there, like
tokenizer = AutoTokenizer.from_pretrained("username/repo_name")
model = AutoModel.from_pretrained("username/repo_name").

I think one can just use pt_model.push_to_hub("my-awesome-model"), as documented here, or use git commit/push as usual to do that.

Of course, this assumes that storing the artifacts in S3 is not a must.