Why are locally downloaded model files different from those on Hugging Face?

I downloaded a model to my local PC and saved it using the following code.


from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "beomi/Llama-3-Open-Ko-8B"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)



Downloading shards: 100%|██████████| 6/6 [02:50<00:00, 28.42s/it]
Loading checkpoint shards: 100%|██████████| 6/6 [00:06<00:00,  1.01s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

The model files uploaded to Hugging Face (beomi/Llama-3-Open-Ko-8B at main) consist of 6 shards, and each model.safetensors file is under 3 GB,

while the files on my local PC match neither the file sizes nor the number of files.

Can anyone explain this situation, or suggest a way to solve the problem?

Thank you in advance


One can specify a max_shard_size when using the save_pretrained or push_to_hub methods, which defaults to 5 GB: Models
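As a minimal sketch of what that looks like: the snippet below re-saves a downloaded model with smaller shards so the local files resemble the ~3 GB shards on the Hub. The output directory name and the 3 GB shard size are illustrative choices, not values from this thread, and the shard-count helper is just back-of-the-envelope arithmetic.

```python
import math


def download_and_reshard(repo_id: str, out_dir: str, shard_size: str = "3GB"):
    """Download a model and re-save it locally with smaller shard files."""
    # Deferred import so the pure helper below works without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(repo_id)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # max_shard_size caps how large each .safetensors file may grow.
    model.save_pretrained(out_dir, max_shard_size=shard_size)
    tokenizer.save_pretrained(out_dir)


def estimated_shards(n_params: int, bytes_per_param: int, shard_gb: float) -> int:
    """Rough number of shard files for a given precision and shard size."""
    total_gb = n_params * bytes_per_param / 1e9
    return math.ceil(total_gb / shard_gb)
```

For example, an 8B-parameter model saved in float32 (4 bytes per parameter) with 3 GB shards would come out to roughly `estimated_shards(8_000_000_000, 4, 3)` = 11 files.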

Thank you.

But the thing I am curious about is that the model files on Hugging Face sum to less than 17 GB, as below,

while the model files I downloaded locally sum to more than 30 GB.

How can the size of the model change?

By default, a precision of float32 (32 bits or 4 bytes per parameter) is used. Hence, as beomi/Llama-3-Open-Ko-8B · Hugging Face has 8 billion parameters, that’s 8*4 = 32 GB.

If you load in half-precision (bfloat16 or 2 bytes per parameter) then you’ll get 8*2 = 16 GB.
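To make that arithmetic concrete, here is the same calculation in plain Python (no downloads involved):

```python
# Back-of-the-envelope size check for an 8B-parameter model.
n_params = 8_000_000_000  # beomi/Llama-3-Open-Ko-8B has ~8B parameters

fp32_gb = n_params * 4 / 1e9  # float32: 4 bytes per parameter
bf16_gb = n_params * 2 / 1e9  # bfloat16: 2 bytes per parameter

print(fp32_gb)  # 32.0 -> roughly the >30 GB seen locally
print(bf16_gb)  # 16.0 -> roughly the <17 GB stored on the Hub
```

So the Hub checkpoint is stored in half precision, while loading it with the default settings materializes it in float32, doubling the on-disk size when re-saved.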


100% understood, much appreciated!!
