S3 uploading advice

sshleifer · August 27, 2020, 11:59pm

@stas was about to write this to you but decided to send it here!

I have done everything with awscli (pip install awscli), but that may require more privileges. (You probably still want awscli so you can run ls.)
Each model should be its own subdir under stas/ with the full complement of files:

something like

aws s3 ls s3://models.huggingface.co/bert/Helsinki-NLP/opus-mt-ru-en/

2020-08-21 10:42:49       1148 README.md
2020-08-18 20:43:55       1133 config.json
2020-04-29 09:40:09  306991893 pytorch_model.bin
2020-05-27 11:23:35  563074700 rust_model.ot
2020-04-29 09:40:21    1080169 source.spm
2020-04-29 09:40:21     802781 target.spm
2020-08-18 20:43:55         42 tokenizer_config.json
2020-08-18 20:43:55    2601758 vocab.json
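
To check a local model directory against that listing before uploading, something like the following might help. It is only a rough sketch: `missing_files` and `EXPECTED_FILES` are made-up names, the file list is just copied from the opus-mt-ru-en listing above (rust_model.ot left out as optional), and other model types will likely need a different set.

```python
from pathlib import Path

# Assumption: names copied from the opus-mt-ru-en listing above.
# Other architectures may need a different set of files.
EXPECTED_FILES = [
    "README.md",
    "config.json",
    "pytorch_model.bin",
    "source.spm",
    "target.spm",
    "tokenizer_config.json",
    "vocab.json",
]


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]
```

Calling `missing_files('en_de')` on a local directory returns the names that still need to be produced; an empty list means the directory matches the assumed set.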

If you are missing files, you can sometimes get them with

working_model.save_pretrained('en_de') # will mkdir en_de
tokenizer.save_pretrained('en_de')

That’s all that comes to mind at the moment, but lmk if you need anything!

sshleifer · August 28, 2020, 12:05am

Another thing that bothered me earlier, which transformers-cli handles:

You have to duplicate all tokenizer files across every model subdir. S3 can't do symlinks.
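
If you are preparing the directories locally, the duplication can be scripted. A minimal sketch, assuming the shared tokenizer files live in one directory and every model has its own subdir under a common root; `copy_tokenizer_files` and the default file names are hypothetical, so adjust them to whatever your tokenizer actually saves.

```python
import shutil
from pathlib import Path

# Assumption: these are the tokenizer files shared between models.
TOKENIZER_FILES = ["source.spm", "target.spm", "tokenizer_config.json", "vocab.json"]


def copy_tokenizer_files(shared_dir, models_root, names=TOKENIZER_FILES):
    """Copy tokenizer files from shared_dir into every subdir of models_root.

    Existing files are overwritten. Returns the list of paths written.
    """
    shared_dir = Path(shared_dir)
    written = []
    for model_dir in sorted(p for p in Path(models_root).iterdir() if p.is_dir()):
        # Skip the source directory itself if it sits under models_root.
        if model_dir.resolve() == shared_dir.resolve():
            continue
        for name in names:
            src = shared_dir / name
            if src.is_file():
                dst = model_dir / name
                shutil.copy2(src, dst)  # real copy, since S3 has no symlinks
                written.append(dst)
    return written
```

After this, each model subdir holds its own copy of the tokenizer files and can be uploaded as-is.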

stas · August 28, 2020, 2:13am

Thank you very much, @sshleifer - that’s super useful - I will use transformers-cli to start with and then we will update with symlinks.

julien-c · August 31, 2020, 5:02pm

Don’t forget to also add model cards!
