I am not sure if this is still an issue, but I came across this on Stack Overflow while looking for somewhere to store my own fine-tuned BERT model artifacts for use at inference time.
It seems helpful, and I assume that adding something like

```python
AutoTokenizer.from_pretrained(tokenizer.name, config="tokenizer_config.json")
```

would cover the tokenizer config, but I feel like the other tokenizer artifacts (vocab.txt, etc.) might be problematic to add with the code provided.
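For what it's worth, my understanding is that `save_pretrained` writes all of those tokenizer artifacts into a single directory, so they can be reloaded together. A minimal sketch (the local path is just a placeholder):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# ... fine-tune the model ...

# writes tokenizer_config.json, vocab.txt, special_tokens_map.json, etc.
# next to the model weights/config in one directory
tokenizer.save_pretrained("./my-finetuned-bert")
model.save_pretrained("./my-finetuned-bert")

# later, everything loads back from that same directory
tokenizer = AutoTokenizer.from_pretrained("./my-finetuned-bert")
model = AutoModel.from_pretrained("./my-finetuned-bert")
```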
So the simpler solution, to me, was to just create a model repo on the Hugging Face Hub (might be obvious to the experienced eye, but it was new to me) and load your own pre-trained or fine-tuned models from there:
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("username/repo_name")
model = AutoModel.from_pretrained("username/repo_name")
```
I think one can just use `pt_model.push_to_hub("my-awesome-model")` as documented here, or use git commit/push as usual to do that.
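A minimal sketch of how I believe that would look (assuming you are already logged in, e.g. via `huggingface-cli login`; the repo name and local path are placeholders):

```python
from transformers import AutoModel, AutoTokenizer

# load the fine-tuned checkpoint from a local directory (placeholder path)
model = AutoModel.from_pretrained("./my-finetuned-bert")
tokenizer = AutoTokenizer.from_pretrained("./my-finetuned-bert")

# pushes the model weights/config and the tokenizer files
# (vocab.txt, tokenizer_config.json, ...) to the Hub repo
model.push_to_hub("my-awesome-model")
tokenizer.push_to_hub("my-awesome-model")
```

After that, loading with `from_pretrained("username/my-awesome-model")` should work as shown above.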
Of course, this assumes that storing the artifacts in S3 is not a hard requirement.