S3 uploading advice

@stas, I was about to write this to you but decided to post it here!

I have done everything with awscli (pip install awscli), but that may require more privileges. (You probably still want awscli so you can run ls.)
Each model should be its own subdir under stas/ with the full complement of files, something like:

aws s3 ls s3://models.huggingface.co/bert/Helsinki-NLP/opus-mt-ru-en/

2020-08-21 10:42:49       1148 README.md
2020-08-18 20:43:55       1133 config.json
2020-04-29 09:40:09  306991893 pytorch_model.bin
2020-05-27 11:23:35  563074700 rust_model.ot
2020-04-29 09:40:21    1080169 source.spm
2020-04-29 09:40:21     802781 target.spm
2020-08-18 20:43:55         42 tokenizer_config.json
2020-08-18 20:43:55    2601758 vocab.json

If any files are missing, you can sometimes generate them with:

working_model.save_pretrained('en_de')  # will mkdir en_de and write the config and weights
tokenizer.save_pretrained('en_de')  # writes the tokenizer files into en_de
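
Before uploading, it can help to check a local model directory against the listing above. Here is a minimal sketch, assuming the file set from the opus-mt-ru-en listing is the one you need (other architectures will have different files). `EXPECTED_FILES` and `missing_files` are hypothetical names for illustration, not part of transformers:

```python
from pathlib import Path

# Assumed file set, copied from the opus-mt-ru-en listing above.
# rust_model.ot is left out on the assumption that not every model ships it.
EXPECTED_FILES = [
    "README.md",
    "config.json",
    "pytorch_model.bin",
    "source.spm",
    "target.spm",
    "tokenizer_config.json",
    "vocab.json",
]


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]
```

Calling missing_files('en_de') after the two save_pretrained calls should show what still has to be added by hand, for example README.md.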

That’s all that comes to mind at the moment, but lmk if you need anything!

Another thing that bothered me earlier, and that transformers-cli handles:

You have to duplicate all tokenizer files across every model subdir. S3 can’t do symlinks.
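
Since every subdir needs its own copy, the duplication can be scripted locally before uploading. Here is a minimal sketch, assuming the shared tokenizer files sit in one local directory and each model has its own local subdir. `copy_tokenizer_files` and the default file names are hypothetical, so adjust them to your models:

```python
import shutil
from pathlib import Path

# Assumed tokenizer file names, taken from the listing earlier in the thread.
TOKENIZER_FILES = ["source.spm", "target.spm", "tokenizer_config.json", "vocab.json"]


def copy_tokenizer_files(src_dir, model_dirs, names=TOKENIZER_FILES):
    """Copy each tokenizer file found in src_dir into every model dir.

    Returns the list of destination paths that were written.
    """
    src_dir = Path(src_dir)
    copied = []
    for model_dir in model_dirs:
        model_dir = Path(model_dir)
        model_dir.mkdir(parents=True, exist_ok=True)
        for name in names:
            src = src_dir / name
            if src.is_file():  # skip files this tokenizer does not have
                shutil.copy2(src, model_dir / name)
                copied.append(model_dir / name)
    return copied
```

This only prepares the local directories. The upload itself is still done with awscli or transformers-cli as discussed above.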

Thank you very much, @sshleifer, that’s super useful. I will use transformers-cli to start with, and then we will update with symlinks.

Don’t forget to also add model cards!
