KeyError: 'classifier.dense.weight' when loading LoRA adapter with quantized Roberta classification model

Hi all,

I fine-tuned a quantized roberta-base classification model using PEFT + LoRA. Training runs fine, and I save the adapter like this:

from datasets import load_dataset
import evaluate
from peft import (
    LoraConfig,
    TaskType,
    get_peft_model,
    prepare_model_for_kbit_training
)
import torch
from transformers import (
    AutoTokenizer,
    DataCollatorWithPadding,
    AutoModelForSequenceClassification,
    BitsAndBytesConfig,
    Trainer,
    TrainingArguments
)
checkpoint = "dstefa/roberta-base_topic_classification_nyt_news"

# create quantization object
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["classifier"] 
)

base_model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=num_labels,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
    quantization_config=quantization_config
    )

# preprocess the quantized model for training
model = prepare_model_for_kbit_training(base_model)

# create LoRA config object
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False, # set to False for training
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    bias='none',
    modules_to_save=["classifier.dense", "classifier.out_proj"],
    )

# create a trainable PeftModel
final_model = get_peft_model(model, lora_config)

final_training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Projects/new-topic-classifier/checkpoint/",
    num_train_epochs=2,
    # eval_strategy="epoch",
    # save_strategy="epoch",
    eval_strategy="steps",          
    eval_steps=10000,                
    save_strategy="steps",          
    save_steps=10000,                 
    save_total_limit=3,  
    load_best_model_at_end=False, 
    logging_strategy="steps",
    logging_steps=50,
    logging_first_step=True,
    fp16=True,
    run_name="final_topic_classifier_run",
    report_to="wandb", # W&B is active
    push_to_hub=True,
    hub_model_id="####/New-topic-classifier-training-model-storage",
    hub_strategy="checkpoint",
)

final_trainer = Trainer(
    model=final_model,
    args=final_training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

final_trainer.train()

# Save the adapter model after training
adapter_output_dir = "/content/drive/MyDrive/Projects/new-topic-classifier/final_adapter"
final_trainer.model.save_pretrained(adapter_output_dir)

# Push the adapter model to Hugging Face Hub
adapter_repo_name = "XXXX/agnews_classifier_naive_model_adapters"
final_trainer.model.push_to_hub(adapter_repo_name)

But when I try to use it for inference like this:

## inference
checkpoint = "dstefa/roberta-base_topic_classification_nyt_news"
adapter_repo_name = "XXXX/agnews_classifier_naive_model_adapters"

# create quantization object
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["classifier"] 
)

base_model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=num_labels,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
    quantization_config=quantization_config
    )

base_model.load_adapter(adapter_repo_name)

I got an error:

KeyError: 'classifier.dense.weight'

I tried another way to load the model with the adapter, but it returned the same error:

PeftModel.from_pretrained(base_model, adapter_repo_name)

How should I properly load an adapter for inference with a quantized sequence classification model? Is the issue related to a config setting or a training argument?

Thank you for your help in advance.


Root cause: you saved submodules of the head. At load time PEFT expects the whole classification head to be listed in modules_to_save, not its internal layers. With 4-bit quantization this mismatch often surfaces as KeyError: 'classifier.dense.weight'. Set modules_to_save=["classifier"] during training, then load the adapter into the quantized base via PeftModel.from_pretrained. (Hugging Face)

Fix your training config

# Training change — save the entire head, not its sublayers
# Docs: https://huggingface.co/docs/peft/en/developer_guides/troubleshooting
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8, lora_alpha=16, lora_dropout=0.1, bias="none",
    modules_to_save=["classifier"],  # <= change
    # Optionally specify target modules; RoBERTa attention/FFN names vary by model
    # target_modules=["query","key","value","dense","intermediate.dense","output.dense"]
)
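After this change, a quick sanity check is to rebuild the PEFT model and confirm the head is trainable. A minimal sketch, reusing the model and lora_config names from the training script above:

# Rebuild the PEFT model with the corrected config
final_model = get_peft_model(model, lora_config)
# Should list the LoRA matrices plus the full classifier head as trainable
final_model.print_trainable_parameters()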

Key points:

  • Save the head by its top-level module name ("classifier").
  • Do not list leaf names like "classifier.dense" or "classifier.out_proj". (Hugging Face)
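To verify an adapter that is already saved or pushed, you can inspect its weight file directly. A sketch, assuming the adapter was saved as adapter_model.safetensors (the default in recent PEFT versions; older versions used adapter_model.bin) and using the adapter repo name from the question:

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

adapter_file = hf_hub_download(
    "XXXX/agnews_classifier_naive_model_adapters", "adapter_model.safetensors"
)
keys = load_file(adapter_file).keys()
# With modules_to_save=["classifier"], the head's dense and out_proj weights
# should appear here alongside the lora_A / lora_B matrices.
print([k for k in keys if "classifier" in k])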

Correct inference pattern for quantized seq-cls

# Inference — load quantized base, then attach adapter
# BitsAndBytes: https://huggingface.co/docs/transformers/en/quantization/bitsandbytes
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

checkpoint = "dstefa/roberta-base_topic_classification_nyt_news"
adapter_repo = "XXXX/agnews_classifier_naive_model_adapters"

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=num_labels, id2label=id2label, label2id=label2id,
    quantization_config=bnb, device_map="auto",
)

# Keep the head in float to avoid 4-bit dtype conflicts
base.classifier.float()

# Load adapter properly (do NOT call load_adapter on the raw base model)
# Correct API: https://huggingface.co/docs/peft/en/developer_guides/troubleshooting
model = PeftModel.from_pretrained(base, adapter_repo)
model.eval()
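A quick usage check after loading, assuming the tokenizer and id2label from the training script are still in scope (the sample sentence is arbitrary):

inputs = tokenizer(
    "Stocks rallied after the central bank held interest rates steady.",
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits
pred_id = logits.argmax(dim=-1).item()
print(id2label[pred_id])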

Key points:

  • Use PeftModel.from_pretrained(base, adapter_id) to attach the adapter.
  • Do not call base_model.load_adapter(...) unless base_model is already a PeftModel. (Hugging Face)
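load_adapter is still useful once a PeftModel exists, for example to hold a second adapter side by side. A sketch continuing from the model loaded above (the second repo name is hypothetical):

# Attach a second adapter under its own name, then switch between them
model.load_adapter("XXXX/another_adapter_repo", adapter_name="other")
model.set_adapter("other")    # activate the second adapter
model.set_adapter("default")  # switch back to the first one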

Also check these gotchas

  • Remove ignore_mismatched_sizes=True at inference. It can silently re-init a head with the wrong shape.
  • Match package versions. If the adapter was saved with a newer PEFT, upgrade locally: pip install -U peft. (Hugging Face)
  • You don’t need prepare_model_for_kbit_training at inference. Use it only during training.
  • If your architecture uses a pooler (e.g., some DeBERTa configs), add it too: modules_to_save=["classifier","pooler"]. (Hugging Face)
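The version and modules_to_save checks can be scripted. A minimal sketch using the adapter repo name from the question:

import peft
from peft import PeftConfig

print(peft.__version__)  # should be >= the version that saved the adapter
cfg = PeftConfig.from_pretrained("XXXX/agnews_classifier_naive_model_adapters")
print(cfg.modules_to_save)  # expect ["classifier"], not leaf names like "classifier.dense"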

Why the error happened

  • PEFT wraps the named modules you list in modules_to_save. If you pass leaf names, the wrapper mapping won’t match after quantization replaces nn.Linear with bnb.nn.Linear4bit, so PEFT can’t find classifier.dense.weight on load. Saving the whole classifier avoids that mismatch. (GitHub)
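You can see this wrapping directly on the trained PeftModel. A sketch reusing final_model from the training script (exact prefixes vary by PEFT version):

# The module listed in modules_to_save is replaced by a ModulesToSaveWrapper,
# so its parameters live under a "modules_to_save.<adapter_name>" prefix,
# e.g. base_model.model.classifier.modules_to_save.default.dense.weight
print([n for n, _ in final_model.named_parameters() if "modules_to_save" in n])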

Minimal checklist

  1. Retrain or resave with modules_to_save=["classifier"].
  2. Load base in 4-bit. Cast base.classifier.float().
  3. PeftModel.from_pretrained(base, adapter_repo).
  4. model.eval() and run inference.

References

  • PEFT troubleshooting: correct loading and modules_to_save guidance. (Hugging Face)
  • Transformers bitsandbytes quantization guide. (Hugging Face)
  • PEFT issue notes on saving the head by top-level name. (GitHub)

Thanks for the detailed explanation—it helped a lot!

Just a small clarification from my side: I had to keep ignore_mismatched_sizes=True, otherwise I encountered the following error during model loading:

RuntimeError: Error(s) in loading state_dict for Linear:
	size mismatch for weight: copying a param with shape torch.Size([8, 768]) from checkpoint, the shape in current model is torch.Size([14, 768]).

So in my case, setting ignore_mismatched_sizes=True was necessary to avoid shape mismatch issues when loading the state dict.
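Concretely, the combination that loads cleanly in my case looks like this (a sketch reusing the names from the inference snippet above): the head is re-created at my label count, and the trained classifier stored in the adapter via modules_to_save is what gets used at inference.

base = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=num_labels,            # my label count differs from the checkpoint's 8-label head
    id2label=id2label, label2id=label2id,
    ignore_mismatched_sizes=True,     # re-initialize the head at the new size
    quantization_config=bnb, device_map="auto",
)
base.classifier.float()  # keep the head in float, as above

# The freshly initialized head is superseded by the classifier weights
# stored in the adapter (saved with modules_to_save=["classifier"]).
model = PeftModel.from_pretrained(base, adapter_repo)
model.eval()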
