Creating a language-model-only LoRA config

I am currently looking into doing SFT fine-tuning of a VLM using this guide: Fine-tuning a Multimodal Model Using SFT (Single or Multi-Image Dataset), and I realised that the LoRA config it shows just uses the “all-linear” target_modules, like so:

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.05,
    r=16,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
    modules_to_save=[
        "lm_head",
        "embed_tokens",
    ],
)

Just curious, is there a way to restrict target_modules to these:

["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"]

and have them applied only to the language model layers?

So far, I have tried this:

model = "google/gemma-3-4b-it"
layer_pattern_name = "model.language_model.layers"
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.05,
    r=16,
    bias="none",
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    layers_to_transform = [i for i in range(len(eval(layer_pattern_name)))],
    layer_pattern_name=layer_pattern_name,
    task_type="CAUSAL_LM"
)

But I got this error message when passing this config to SFTTrainer:

ValueError: Target modules {'q_proj', 'up_proj', 'down_proj', 'gate_proj', 'o_proj', 'v_proj', 'k_proj'} not found in the base model. Please check the target modules and try again. Note: You specified 'layers_to_transform': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. You also specified 'layers_pattern': model.language_model.layers.

Wondering if anyone has had the same issue. Thank you!


The module names to be specified seem to differ slightly from those in other models.


You can apply LoRA to ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"] and restrict it to the text stack only. Your error comes from two things: using the wrong key (layers_pattern is the correct field) and relying on a brittle module path. On Gemma-3 VLMs the vision tower also contains q_proj/k_proj/v_proj/o_proj names, so you must scope LoRA to the text layers explicitly. (Hugging Face)

Background in one page

  • How PEFT picks modules. target_modules matches by module name (suffix or regex). If those names appear in multiple submodules, PEFT will hit all of them unless you constrain by layer indices and a layer pattern. That’s what layers_to_transform and layers_pattern do. (Hugging Face)
  • Why “LM-only” matters for VLMs. Gemma-3 VLM uses a SigLIP vision encoder. That encoder defines q_proj/k_proj/v_proj/out_proj too, so a naive target_modules=[...] would also touch vision unless you filter for the text stack; a quick check of this overlap is sketched right after this list. (Google Developers Blog)
  • Text stack shape and naming drift. Gemma-3 configs expose the text stack via config.text_config.num_hidden_layers. Internal module paths changed across releases, so hard-coding model.language_model.layers is fragile. Use a pattern that matches both model.layers and language_model.layers. The refactor is visible in user reports about adapter key paths moving. (Hugging Face)
  • vLLM and TRL context. TRL’s SFT VLM guide is the right procedure for Gemma-3 multimodal SFT. vLLM docs also show the familiar q/k/v and MLP projection names. (Hugging Face)
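
To see that overlap concretely, here is a small sketch that counts where the q/k/v projection names live. AutoModelForImageTextToText and the "vision_tower" path are assumptions based on current Transformers naming for Gemma-3; the empty-weights trick just avoids downloading full weights for a module-name check.

from accelerate import init_empty_weights
from collections import Counter
from transformers import AutoConfig, AutoModelForImageTextToText

cfg = AutoConfig.from_pretrained("google/gemma-3-4b-it")
with init_empty_weights():  # architecture only, no weights loaded
    vlm = AutoModelForImageTextToText.from_config(cfg)

counts = Counter()
for name, _ in vlm.named_modules():
    if name.endswith(("q_proj", "k_proj", "v_proj")):
        # "vision_tower" is the current path of the SigLIP encoder; everything else is the text stack
        counts["vision" if "vision_tower" in name else "text"] += 1
print(counts)  # both stacks contain q/k/v projections, so suffix matching alone is not LM-only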

Working configurations

A) “LM-only” LoRA with explicit layer filtering

This confines LoRA to the text stack even on multimodal checkpoints and across Transformers versions.

# deps: peft>=0.11, transformers>=4.52  (Gemma 3 supported)
# URLs:
# - PEFT Lora fields: https://huggingface.co/docs/peft/en/conceptual_guides/lora
# - TRL VLM SFT:      https://huggingface.co/docs/trl/main/en/training_vlm_sft
from peft import LoraConfig
from transformers import AutoModelForCausalLM

model_id = "google/gemma-3-4b-it"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Read the number of text layers from config to avoid guessing paths
n_text_layers = getattr(getattr(model.config, "text_config", None), "num_hidden_layers", None)
if n_text_layers is None:
    # Fallback: common single-stack configs still expose num_hidden_layers at top-level
    n_text_layers = getattr(model.config, "num_hidden_layers")

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    # Constrain to the text transformer blocks only
    layers_to_transform=list(range(n_text_layers)),
    # Accept both pre/post-refactor names:
    layers_pattern=r"(?:^|\.)(model|language_model)\.layers$",
    # Optional: keep embeddings/head in full precision
    modules_to_save=["lm_head", "embed_tokens"],
)

Why this works:

  • layers_to_transform tells PEFT which block indices to touch.
  • layers_pattern tells PEFT which stack owns those indices. The regex above matches either ...model.layers or ...language_model.layers, covering recent naming changes. (Hugging Face)
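
To double-check the scoping before training, you can wrap the model and count where LoRA layers actually landed. This is a sketch reusing model and peft_config from Option A; get_peft_model raises immediately if nothing matched, which is exactly what you want to catch here rather than inside the trainer.

from peft import get_peft_model

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

# Every injected adapter shows up as *.lora_A / *.lora_B under the wrapped module's name
lora_sites = {n.split(".lora_")[0] for n, _ in peft_model.named_modules() if ".lora_A" in n or ".lora_B" in n}
vision_sites = [n for n in lora_sites if "vision" in n]
print(f"{len(lora_sites)} LoRA sites, {len(vision_sites)} in the vision tower (expected: 0)")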

B) Minimal change if you fine-tune a text-only Gemma-3

If your checkpoint is not multimodal, you can omit the layer filtering and just list the projections. Many community LoRAs use exactly these targets. For VLMs, prefer Option A. (Kaggle)

peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    modules_to_save=["lm_head", "embed_tokens"],
)

Quick self-check and debugging

Run these once before training:

# 1) Confirm the text stack path and layer count
cands = [n for n,_ in model.named_modules() if n.endswith(".layers")]
print("layer containers:", cands)  # should include ...model.layers or ...language_model.layers

# 2) List modules PEFT would touch with your targets
targets = ("q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj")
hits = [n for n,_ in model.named_modules() if n.endswith(targets)]
print("sample hits:", hits[:10])

If you still see “Target modules not found,” it means your names or pattern do not match what the model exposes. This is a common PEFT error when paths differ, or when layers_pattern is wrong. Adjust the pattern to your actual container path. (GitHub)

Notes that avoid future breakage

  • Use layers_pattern (correct key). layer_pattern_name is not a valid field. PEFT documents layers_pattern and layers_to_transform as the supported scoping knobs. (Hugging Face)
  • Derive the layer count from config.text_config.num_hidden_layers when present. Gemma-3 VLM configs expose this under text_config. (Hugging Face)
  • Expect model-internal renames over time. Users report LoRA adapter paths moving across refactors. Let the regex handle both model.layers and language_model.layers. (GitHub)
  • Multimodal models: the vision encoder also uses q_proj/k_proj/v_proj/out_proj. Do not rely on target_modules alone for LM-only tuning. Constrain with layers_pattern + layers_to_transform. (Hugging Face)
  • TRL VLM SFT is the recommended recipe for Gemma-3 multimodal SFT; pass the peft_config directly to SFTTrainer. (Hugging Face)
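
For reference, the hand-off to SFTTrainer is just the peft_config argument. This is a minimal sketch: dataset and collator preparation are omitted and follow the TRL VLM SFT guide, and output_dir / train_dataset below are placeholders.

from trl import SFTConfig, SFTTrainer

training_args = SFTConfig(
    output_dir="gemma3-lm-only-lora",   # placeholder output path
    per_device_train_batch_size=1,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=model,                        # the Gemma-3 VLM loaded earlier
    args=training_args,
    train_dataset=train_dataset,        # your formatted multimodal dataset (see the TRL guide)
    peft_config=peft_config,            # SFTTrainer applies the LoRA config internally
)
trainer.train()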

Why these particular targets

The list covers the attention projections and the MLP projections (gate/up/down). This is the standard set for Llama-family models and works well for Gemma-3 text blocks. You can also start with attention-only (q/k/v/o) if you need smaller adapters, as sketched below. vLLM’s Gemma mapping shows the same projections and groups like qkv_proj and gate_up_proj. (VLLM Docs)
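
A minimal attention-only variant, assuming you keep everything else (including the layers_to_transform / layers_pattern scoping) as in Option A:

from peft import LoraConfig

peft_config_attn_only = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only, no MLP
)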

Short, vetted extras

Guides

  • TRL: Fine-tuning a Multimodal Model using SFT. Clear end-to-end VLM SFT walkthrough. (Hugging Face)
  • Blog: Fine-tune VLMs with TRL (conceptual + code). (Phil Schmid)

APIs and docs

  • PEFT LoRA reference and scoping fields. Concise definitions and behavior. (Hugging Face)
  • Transformers Gemma-3 docs. Config fields and text stack metadata. (Hugging Face)

Examples

  • Community Gemma-3 LoRA using the same targets. Quick sanity check. (Kaggle)

Thank you so much for the solution @John6666. This is super helpful. I had a go at passing the exact same peft_config to the SFTTrainer, and it seems like I am still running into an issue:

ValueError: No modules were targeted for adaptation. This might be caused by a combination of mismatched target modules and excluded modules. Please check your `target_modules` and `exclude_modules` configuration. You may also have only targeted modules that are marked to be saved (`modules_to_save`). Note: You specified 'layers_to_transform': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. You also specified 'layers_pattern': (?:^|\.)(model|language_model)\.layers$.

For context, I am using these versions:

peft>=0.16.0
transformers>=4.56.2

Update: just got the solution by using this pattern:

r"(?:language_model|model)\.layers"

With this, only the language_model layers are targeted.
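
For completeness, this is the Option A config with that pattern swapped in (n_text_layers computed from config.text_config.num_hidden_layers as before; the comment about anchors is my read of why the earlier regex never matched):

from peft import LoraConfig

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    layers_to_transform=list(range(n_text_layers)),
    # no ^/$ anchors: PEFT seems to embed layers_pattern inside a larger regex that continues
    # with the layer index, so the anchored pattern from the earlier reply did not match here
    layers_pattern=r"(?:language_model|model)\.layers",
    modules_to_save=["lm_head", "embed_tokens"],
)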
