CRMA: Drop-in adapter for fine-tuning + continual learning — zero catastrophic forgetting at 7B scale

I built CRMA (Constrained Residual Mixing Adapter) — a small adapter that attaches to every layer of a language model
during fine-tuning. It applies a mathematical constraint that keeps training stable: the model learns new information
but can’t overwrite what it already knows.
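The post doesn't spell out the constraint itself, so the following is purely an illustrative sketch of the general idea, not CRMA: one way to keep an adapter from overwriting the frozen model is to mix its update into the residual stream through a bounded gate with a norm cap. Every name and choice below (the sigmoid gate, the norm cap, the function signature) is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def constrained_residual_mix(h_base, delta, gate_logit, max_delta_norm=1.0):
    """Illustrative only -- NOT the actual CRMA constraint.

    Blends an adapter update `delta` into the frozen hidden state
    `h_base` through a bounded gate g in (0, 1):

        h_out = h_base + g * clip(delta)

    so the pretrained activation is shifted, never replaced.
    """
    # Cap the update magnitude so a single step can't blow up the stream.
    norm = np.linalg.norm(delta)
    if norm > max_delta_norm:
        delta = delta * (max_delta_norm / norm)
    g = sigmoid(gate_logit)  # learnable scalar gate, always in (0, 1)
    return h_base + g * delta

h = np.array([1.0, -2.0, 0.5])
out = constrained_residual_mix(h, np.array([10.0, 0.0, 0.0]), gate_logit=0.0)
# delta is norm-capped to 1.0 and gated by g = 0.5, so the output
# can move at most 0.5 away from the base activation.
assert np.linalg.norm(out - h) <= 0.5 + 1e-9
```

The point of a structure like this is that stability is enforced by construction (the gate and cap bound every update) rather than by a regularizer the optimizer can trade away.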

Fine-tuning results (Mistral-7B):

  • CRMA holdout loss: 0.1426 vs. standard LoRA: 0.1519 (6.1% lower)
  • Peak gradient norm reduced by 39-84% across 3 independent runs
  • Tested on TinyLlama-1.1B, Mistral-7B-v0.3, and Gemma-2-2b-it

Continual learning results (4 domains sequentially: Medical, Legal, Code, Finance):

  • CRMA modular drift: -0.1% (model actually slightly improves on earlier domains)
  • Standard sequential fine-tuning forgetting: +351.4%
  • Comparing the magnitudes of those two figures, that’s roughly a 3,500x reduction in catastrophic forgetting
  • No replay buffers, no knowledge distillation, no frozen teacher copy, no extra compute
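For readers wondering how figures like +351.4% or -0.1% are computed: a common convention (assumed here; the full report may define it differently) is the relative change in held-out loss on an earlier domain after training on the later ones. The numbers below are toy values, not the actual benchmark losses.

```python
def forgetting_pct(loss_before, loss_after):
    """Relative change in held-out loss on an earlier domain after
    training on later domains. Positive = catastrophic forgetting;
    negative = the earlier domain improved (backward transfer)."""
    return 100.0 * (loss_after - loss_before) / loss_before

# Toy numbers: a held-out loss that rises from 0.14 to 0.632 after
# later-domain training corresponds to +351.4% forgetting...
print(round(forgetting_pct(0.14, 0.632), 1))      # 351.4
# ...while a tiny drop reads as slightly negative drift.
print(round(forgetting_pct(0.1426, 0.1425), 1))   # -0.1
```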

How it compares:

┌────────────┬──────────────┬─────────────────────┐
│ Method     │ Forgetting   │ Extra requirement   │
├────────────┼──────────────┼─────────────────────┤
│ EWC        │ +58%         │ Replay buffer       │
│ SDFT       │ -0.1 pt      │ 2x inference        │
│ O-LoRA     │ Reduced      │ Task tracking       │
│ Adaption   │ N/A          │ $50M                │
│ CRMA       │ -0.1%        │ None                │
└────────────┴──────────────┴─────────────────────┘

API is live and testable right now. Free tier available (3 runs/day, TinyLlama). Usage-based pricing for larger
models.

API: CRMA Fine-Tuner & Continual Learning API - Swagger UI

Full technical report (with methodology and ablation history) available on request. Happy to answer questions.

— Kiran
