I would like to finetune CodeLlama-13b in a memory-efficient way.
I was able to do it with CodeLlama-7b, but I'm failing with 13b.
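For reference, this is roughly the flow that worked for me with 7b (QLoRA through unsloth; the hyperparameters are just the ones I happened to use):
```python
import unsloth

# Roughly the setup that worked for CodeLlama-7b: load the base model
# in 4-bit, then attach LoRA adapters for memory-efficient finetuning.
model, tokenizer = unsloth.FastLanguageModel.from_pretrained(
    'codellama/CodeLlama-7b-hf',
    max_seq_length=2048,
    load_in_4bit=True,
)
model = unsloth.FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank; 16 was enough for my 7b runs
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj',
                    'gate_proj', 'up_proj', 'down_proj'],
    lora_alpha=16,
)
```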
With 13b, I can't load the model in 4-bit (with `load_in_4bit=True` it apparently resolves to the pre-quantized `unsloth/codellama-13b-bnb-4bit` checkpoint):
```python
import unsloth

model, tokenizer = unsloth.FastLanguageModel.from_pretrained(
    'codellama/CodeLlama-13b-hf',
    load_in_4bit=True,
)
```
> ValueError: Supplied state dict for model.layers.28.mlp.gate_proj.weight does not contain `bitsandbytes__*` and possibly other `quantized_stats` components.
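As a possible fallback, would loading the base checkpoint through plain transformers with an explicit `BitsAndBytesConfig` be viable, or does unsloth's fast path require its own loader? A rough, untested sketch of what I mean:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize to 4-bit on the fly instead of pulling a pre-quantized checkpoint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    'codellama/CodeLlama-13b-hf',
    quantization_config=bnb_config,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained('codellama/CodeLlama-13b-hf')
```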
I also tried loading the model in full precision and quantizing it myself first, but that failed as well:
```python
import unsloth

# Load in full precision, then try to save a quantized copy locally.
model, tokenizer = unsloth.FastLanguageModel.from_pretrained(
    'codellama/CodeLlama-13b-hf',
    load_in_4bit=False,
)
model.save_pretrained_gguf('./codellama-13b-bnb-4bit', tokenizer=tokenizer)
```
> RuntimeError: The weights trying to be saved contained shared tensors [{'model.layers.26.self_attn.q_proj.weight', 'model.layers.31.self_attn.v_proj.weight', …}, {'model.layers.37.mlp.gate_proj.weight', 'model.layers.31.mlp.down_proj.weight', …}, {'model.layers.37.input_layernorm.weight', 'model.norm.weight', …}] that are mismatching the transformers base configuration. Try saving using `safe_serialization=False` or remove this tensor sharing.

(Tensor list truncated for readability; it covers every attention, MLP, and layernorm weight in layers 26–39.)
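Is the traceback's `safe_serialization=False` suggestion the right escape hatch here? Something like the sketch below (the output path is just a placeholder, and as far as I can tell this saves full-precision weights rather than a 4-bit checkpoint, so I'm not sure it gets me closer to the goal):
```python
# Traceback suggestion: fall back to torch pickle serialization,
# which tolerates shared tensors (output dir is a placeholder).
model.save_pretrained('./codellama-13b-out', safe_serialization=False)
tokenizer.save_pretrained('./codellama-13b-out')
```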
Is CodeLlama-13b not supported? Should I be using a different model?