Is it possible to use a from_pretrained() method to respawn model and tokenizer from my own s3 bucket?

I saved a DistilBertModel and a tokenizer with the help of the save_pretrained() method. Now, when I load them locally using from_pretrained('/path_to_distilbert_model'), everything works fine and as intended.
I need these models to be loaded from my own s3 bucket.
But when I try to load them using TFDistilBertModel.from_pretrained('s3://my_bucket/path_to_distilbert_model'), there is an error stating that the model configuration file cannot be found, or that 'vocab.txt' cannot be found if I use DistilBertTokenizer.from_pretrained().
Is it possible to load a model from my own s3 bucket and what is the suggested way to do that? Thanks

Looking at the source code, I do not think it is possible to load models from external URLs other than the HuggingFace bucket.

That being said, why don’t you upload your model to the Hugging Face model hub?

Have you found any solution to this problem?

I’m having the same problem: I need to load a pretrained model via from_pretrained("s3://bucket_name/model_folder"), which throws the same errors.
I think the reason is that, based on the source code, os.path.isdir("s3://bucket_name/model_folder") returns False.
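For what it’s worth, this is easy to verify in a couple of lines (the bucket path below is just a placeholder):

```python
import os

# The loading logic checks whether the given string is a local directory.
# An s3:// URI never passes this check, so the library falls back to
# treating it as a model identifier and fails to resolve the files.
s3_path = "s3://bucket_name/model_folder"
print(os.path.isdir(s3_path))  # False
```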

Any solution so far?

On S3 there is no real concept of a “folder”; object keys are flat. That could be the reason that providing a folder path does not work.
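Given that, one workaround sketch (not an official API; it assumes boto3 is available and the saved artifacts all live under a single key prefix, with placeholder bucket/prefix names) is to mirror the objects to a local directory first and then call from_pretrained() on that local path:

```python
import os
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


def download_model_dir(s3_uri, local_dir):
    """Mirror every object under the S3 prefix into local_dir."""
    import boto3  # imported lazily so parse_s3_uri works without it

    bucket, prefix = parse_s3_uri(s3_uri)
    s3_bucket = boto3.resource("s3").Bucket(bucket)
    for obj in s3_bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith("/"):  # skip "folder" marker objects
            continue
        target = os.path.join(local_dir, os.path.relpath(obj.key, prefix))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        s3_bucket.download_file(obj.key, target)
    return local_dir


# Usage (hypothetical paths, matching the ones from the question):
# local_path = download_model_dir("s3://my_bucket/path_to_distilbert_model",
#                                 "/tmp/distilbert")
# model = TFDistilBertModel.from_pretrained(local_path)
# tokenizer = DistilBertTokenizer.from_pretrained(local_path)
```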

I am not sure if this is still an issue, but I came across this on Stack Overflow when looking for somewhere to store my own fine-tuned BERT model artifacts to use during inference.

It seems helpful, and I am assuming that adding
AutoTokenizer.from_pretrained(tokenizer.name, config=tokenizer_config.json)
would take care of the tokenizer artifacts, but I feel like the other artifacts (vocab.txt, etc.) might be problematic to add with the code provided.

So, the simple solution turned out to be just creating a model repo on the Hugging Face Hub (which might be obvious to the experienced eye, but was new to me) and loading your own pre-trained or fine-tuned models from there, like
tokenizer = AutoTokenizer.from_pretrained("username/repo_name")
model = AutoModel.from_pretrained("username/repo_name").

I think one can just use pt_model.push_to_hub("my-awesome-model"), as documented here, or use git commit/push as usual to do that.

Of course, this assumes that storing the artifacts in S3 is not a must.