Saving model in safetensors format through Trainer fails for Gemma 2 due to shared tensors

Hello,
I am fine-tuning google/gemma-2-2b, and these are my training arguments and Trainer call:


from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b", token=token, attn_implementation='eager')

training_args = TrainingArguments(
    output_dir=args.log_dir,
    num_train_epochs=args.epochs,
    per_device_train_batch_size=args.train_batch_size,
    per_device_eval_batch_size=args.eval_batch_size,
    warmup_steps=args.warmup_steps,
    learning_rate=args.learning_rate,
    evaluation_strategy="no",
    logging_dir=args.log_dir,
    logging_steps=50,
    save_strategy="steps",
    save_steps=2000,
    report_to="mlflow",
    run_name=args.run_name,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

I am getting the following error when trainer tries to save the model:

RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'text_model.model.embed_tokens.weight', 'text_model.lm_head.weight'}].
            A potential way to correctly save your model is to use `save_model`.
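For anyone wondering what "share memory" means in this message: two state-dict names point at the same underlying weight, which is what happens when the output head is tied to the input embedding. Here is a minimal sketch of how such groups can be spotted. It uses plain Python lists as stand-ins for tensors, and the helper name `find_shared_entries` is hypothetical, not something from transformers or safetensors. For real PyTorch tensors, I assume you could pass `key=lambda t: t.data_ptr()` instead of the default `id`.

```python
from collections import defaultdict


def find_shared_entries(state_dict, key=id):
    """Group state-dict names whose values map to the same storage key.

    Hypothetical helper for illustration only. `key` decides what counts
    as "the same memory"; the default `id` works for the stand-in objects
    below, not for real tensors.
    """
    groups = defaultdict(list)
    for name, value in state_dict.items():
        groups[key(value)].append(name)
    # Keep only the groups where more than one name points at the same storage.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Stand-in objects: a plain list plays the role of a weight tensor.
embedding = [0.1, 0.2, 0.3]
state_dict = {
    "model.embed_tokens.weight": embedding,
    "lm_head.weight": embedding,       # tied: the very same object
    "model.norm.weight": [1.0, 1.0],   # independent parameter
}

shared = find_shared_entries(state_dict)
print(shared)
```

Any group this reports is a candidate for the kind of pair listed in the error above.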

I have currently disabled saving as safetensors through the training arguments:
save_safetensors=False,
I would be happy to get your take on this and how to handle this issue.

Thanks!

It turned out to be a possible unresolved bug. The workaround seems to be to not save in safetensors format like you did, but that's not a solution…
So this function has been buggy since 2023…

Indeed a bug. Should I post this on GitHub or wait for a response here first?

I don't think the HF library developers follow this forum and its posts closely. A post on GitHub, or in the HF repo's Discussions if there is one, would probably be preferable. I don't have a GitHub account at the moment, so if possible, please do.
If we knew the exact maintainer, we could send a mention with @+username, but in this case the person in charge is unknown.

I opened a bug in the transformers repo:

Thank you.