Looking for full-precision (non-GGUF) 128k models for LoRA fine-tuning!

Hi everyone!

I’m looking for full-precision (non-GGUF) models for LoRA fine-tuning. Preference for uncensored over abliterated.

Specifically:

24-36B creative writing focused, Llama or Mistral architecture
12B Mistral Nemo variant fine-tuned for creative writing with 128k context
40B creative writing focused
70B creative writing focused (have Qwen 72B abliterated already)

Models already LoRA fine-tuned, merged, and quantized:
"8B": "NousResearch/Hermes-3-Llama-3.1-8B", # Llama 3.1 | 128K
"12B": "unsloth/Mistral-Nemo-Base-2407", # Mistral Nemo | 128K
"22B": "anthracite-org/magnum-v4-22b", # Mistral Small | 32K | prose-focused

On the list, but with issues:
"36B": "ThijsL202/Pantheon-RP-Pure-X-Cydonia-UB-v1.3-36B", # Mistral 3.1 | 128K (no original weights to be found; also it's mostly an RP model while I'm looking for creative writing :frowning: )
"70B": "huihui-ai/Qwen2.5-72B-Instruct-abliterated" # Qwen 2.5 | 128K (unsure about quality damage)

At first I was hoping to assemble a varied family, but I ended up sticking mostly with Llama, Mistral, and Qwen. If anyone has any ‘dark horses’, I’d be interested too.

I need original weights, not quants. Any suggestions? Thanks! :slight_smile:

Hardware:
GPU: Nvidia GeForce RTX 5090 32GB - ASUS TUF Gaming + Nvidia GeForce RTX 5060 ti 16GB
CPU: AMD Ryzen 9 9900X3D - 12 cores - 5.5 GHz
RAM: 64GB DDR5 - Kingston Fury Beast RGB
Storage: 2TB - 7,250 MB/s - PCIe 4.0 - Samsung 990 EVO Plus (+ 2TB external SSD + 1TB external SSD + 4TB HDD)
Motherboard: ASUS TUF Gaming X870-PLUS WIFI
PSU: 1200W ASUS TUF Gaming Gold ATX 3.1


If the original full weights for a model have been deleted, you’d have to use a version re-uploaded by a third party, or find a full-weight set from a closely related fine-tune, right…? (Derivatives usually list a source or base model…)
Back-converting from GGUF to Transformers format isn’t impossible, but it’s definitely not ideal.

Below is general information:


What you’re actually optimizing for (and how to verify it fast)

You want Transformers-format original weights (typically .safetensors, dtype bf16/fp16/fp32) and a true long context (ideally 128k / 131072). Do these checks before you invest time:

1) “Original weights” checklist (Hugging Face model page)

  • Tags: Transformers + Safetensors
  • Tensor type: BF16 / FP16 / FP32 shown on the model page (many good writing models expose this explicitly). Example: Anubis shows Safetensors + BF16. (Hugging Face)
  • Files tab: should contain shards like model-00001-of-000xx.safetensors (not only *.gguf, *.awq, *.gptq, *.exl2/exl3).
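The Files-tab check above can be automated once you have a repo's file list (e.g. from `huggingface_hub`'s `HfApi().list_repo_files`). A minimal sketch, with `has_original_weights` and `quant_only` as hypothetical helper names:

```python
# Sketch of the "original weights" file check, assuming you already fetched
# the repo's file list (e.g. via HfApi().list_repo_files from huggingface_hub).
import re

QUANT_SUFFIXES = (".gguf", ".awq", ".gptq", ".exl2", ".exl3")

def has_original_weights(files):
    """True if the repo ships single-file or sharded safetensors weights."""
    return any(re.fullmatch(r"model(-\d{5}-of-\d{5})?\.safetensors", f)
               for f in files)

def quant_only(files):
    """True if every weight-like file is a quantized artifact."""
    weights = [f for f in files
               if f.endswith((".safetensors",) + QUANT_SUFFIXES)]
    return bool(weights) and all(f.endswith(QUANT_SUFFIXES) for f in weights)

print(has_original_weights(
    ["config.json", "model-00001-of-00004.safetensors"]))  # True
print(quant_only(["model-q4_k_m.gguf"]))                   # True
```

If `quant_only` comes back True, treat the repo as a breadcrumb and look for the “Original model: …” link in its card (see the mirrors section below).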

2) “Real 128k” checklist (don’t trust the headline)

Look for at least one of:

  • max_position_embeddings: 131072 in config.json (common for Mistral Small 3.x). (Hugging Face)
  • or model-specific long-context implementation (Cohere Command-R is a classic gotcha: the model card says 128k (Hugging Face), Transformers docs say 128k (Hugging Face), but configs can show max_position_embeddings: 8192 while model_max_length: 131072 elsewhere (Hugging Face)).
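The two fields above can be cross-checked with a small helper. A sketch, where the dicts would normally be `json.load()`-ed from the repo's config.json and tokenizer_config.json, and `effective_context` is a hypothetical helper name:

```python
# Sketch of the "real 128k" check on config.json / tokenizer_config.json.
def effective_context(config, tokenizer_config=None):
    """Return the context window implied by the configs, warning on mismatch."""
    mpe = config.get("max_position_embeddings", 0)
    mml = (tokenizer_config or {}).get("model_max_length", 0)
    if mml and mpe and mml != mpe:
        # The Command-R gotcha: headline says 128k, but the two fields disagree.
        print(f"warning: max_position_embeddings={mpe} "
              f"vs model_max_length={mml} -- check the model card")
    return max(mpe, mml)

# Mistral Small 3.x style: 131072 straight from config.json.
print(effective_context({"max_position_embeddings": 131072}))  # 131072
# Command-R style mismatch: 8192 in config.json, 131072 elsewhere.
print(effective_context({"max_position_embeddings": 8192},
                        {"model_max_length": 131072}))
```

A warning here doesn't mean the model can't do 128k — it means you need to read the model card and the attention/rope implementation before trusting the headline.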

3) “Writing-focused” checklist

  • Model card mentions literary / prose / storytelling / creativity as primary goal (Drummer’s cards do this explicitly). (Hugging Face)
  • Prefer models trained on fiction / narrative corpora or tuned with writing-judged evals (see Gutenberg Encore below). (Hugging Face)

Good full-precision long-context writing weights (Transformers, non-GGUF)

Below are high-signal starting points that (a) have original weights in Safetensors and (b) are commonly used for creative writing / prose with large contexts.

A) 24–36B, Mistral architecture, writing-focused (best “128k class” options)

1) mistralai/Mistral-Small-3.2-24B-Instruct-2506 (base to LoRA from)

  • Why: strong modern Mistral Small, and the config shows 131072 positions (good “native” long context indicator). (Hugging Face)
  • Use-case: best if you want to build your own writing style via LoRA rather than inherit someone else’s “voice”.

2) TheDrummer/Cydonia-24B-v4.3 (prose / bard-style leaning)

  • Why: explicitly positioned around creativity/writing, with user quotes calling it “a prose expert” / “bard”. (Hugging Face)
  • Alignment note: this release notes it “might refuse/be more positive” (so, not the most permissive variant). (Hugging Face)

3) TheDrummer/Cydonia-24B-v4.3.2-heretic (more “uncensored” flavor)

  • Why: “heretic” variants are typically aimed at reducing refusals without a crude refusal-ablation pass; this one exposes long-context-friendly config values (max_position_embeddings: 131072) and bf16. (Hugging Face)
  • Practical note: if you prefer “uncensored over abliterated,” this is usually closer to that preference than ablation-style releases.

4) Doctor-Shotgun/Magnum-Diamond-24B (prose-first, Mistral Small 3.2 base)

  • Why: explicitly described as a “prose expert” direction on a Mistral Small 3.2 base. (Hugging Face)
  • Use-case: if your LoRA is meant to add your style, starting from a writing-tuned 24B often reduces the amount of LoRA work.

5) Gryphe/Codex-24B-Small-3.2 (diversity + narrative-format training mix)

  • Why: the model card describes heavy emphasis on diverse storytelling patterns and includes narrative-format data; also shows Safetensors + BF16 + clear base tree. (Hugging Face)
  • Caveat: it’s framed as RP-oriented experimentation, so it may skew more “interactive fiction” than pure literary prose. (Hugging Face)

About “36B, 128k, Mistral/Llama” specifically: in practice, good 128k writing-focused 36B dense models are rarer than 24B and 70B classes. If you must sit near 36B, you’ll often compromise on context length (32k-ish) or move to a different family.


B) 12B Mistral NeMo (128k) tuned for writing

1) nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B

  • Why: explicitly tuned on Gutenberg-style fiction preference datasets and shows a clear writing-judge evaluation section. (Hugging Face)
  • Important realism: the ORPO config uses max_length=4096 during training (Hugging Face). That does not prevent you from running longer contexts (NeMo supports 128k), but it does mean very long-context quality is not guaranteed to scale linearly.

2) TheDrummer/UnslopNemo-12B-v4.1

  • Why: positioned around reducing “slop” and improving writing feel; shows Safetensors + BF16 on the page. (Hugging Face)
  • Use-case: if you want a more “writerly” NeMo base before you apply your own LoRA.

3) p-e-w/Mistral-Nemo-Instruct-2407-heretic-noslop

  • Why: a “heretic + noslop” combination is often exactly the niche you described (less refusal, less corporate voice). (Hugging Face)

4) Dark-horse (not purely “writing,” but high-quality long context NeMo weights)

  • shisa-ai/shisa-v2-mistral-nemo-12b: shows Safetensors + BF16, and explicitly states 128k context in its family table. (Hugging Face)

    • It’s bilingual JA/EN; not a pure writing model, but often surprisingly strong for narrative because it’s well-trained post-instruct.

C) “40B class” creative writing (realistic options)

True 40B + 128k + writing-first is uncommon. Two pragmatic routes:

Route 1: Use a 35B long-context generalist and LoRA it into a writer

  • CohereLabs/c4ai-command-r-v01 (35B)

    • Model card + Transformers docs both state 128k context. (Hugging Face)
    • Caveat: configs/implementations can be confusing (you’ll see max_position_embeddings: 8192 while model_max_length: 131072 appears elsewhere). (Hugging Face)
    • Use-case: strong if your LoRA dataset is high-quality writing and you want a “dark horse” family outside Llama/Mistral/Qwen.

Route 2: Use a literal 40B model that exists, but accept it may not be “writing-first”

  • BSC-LT/ALIA-40b-instruct-2601 (40B) exists as Safetensors and mentions improved long-context capability vs earlier versions. (Hugging Face)

    • This is more “model-family exploration” than a known creative-writing staple.

D) 70B creative writing focused (you already have Qwen 72B abliterated)

These are strong “writer” directions in Transformers weights:

1) Doctor-Shotgun/L3.3-70B-Magnum-v4-SE (prose-first)

  • Why: explicitly framed around writing style / prose and shipped as full weights. (Hugging Face)
  • Practical: this is a very common “start here” if you want narrative strength without relying on ablation.

2) TheDrummer/Anubis-70B-v1.2 (creative + (dis)alignment leaning)

  • Why: explicitly optimized for creativity/writing/dynamism/imagination, and shows Safetensors + BF16. (Hugging Face)

3) Sao10K/70B-L3.3-Cirrus-x1

  • Why: shows Safetensors + BF16, Llama 3.3 base lineage, and is often used as a strong general “70B open” anchor. (Hugging Face)

“Mirrors” and how to recover original weights when you only see quants

1) Use quant repos as breadcrumbs (they often point to the original)

Many GGUF/quant uploaders explicitly include “Original model: …” in the model card. Example: bartowski’s Cydonia GGUF points back to the original weights repo. (Hugging Face)

2) Prefer models that store via HF “xet” + safetensors, then mirror locally

If the original weights exist, you can make your own “mirror” by snapshotting:

  • huggingface_hub snapshot_download(...) (filter *.safetensors, *.json, tokenizer files)
  • or git lfs clone / huggingface-cli download
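For the snapshot route, `allow_patterns` is what keeps quants out of your mirror. A sketch of the pattern filter (pure stdlib, so you can sanity-check it offline); the actual `snapshot_download` call is shown commented out and uses a repo named earlier as the example:

```python
# Sketch: mirror only the original-weight files from a repo.
from fnmatch import fnmatch

# Covers Transformers-format originals: sharded safetensors, configs,
# tokenizer files -- and nothing in GGUF/AWQ/GPTQ/EXL formats.
ALLOW_PATTERNS = ["*.safetensors", "*.json", "tokenizer*"]

def keep(filename):
    """True if a repo file matches the mirror patterns."""
    return any(fnmatch(filename, p) for p in ALLOW_PATTERNS)

files = [  # hypothetical repo listing
    "model-00001-of-00004.safetensors", "config.json",
    "tokenizer.model", "tokenizer_config.json",
    "model-q4_k_m.gguf", "README.md",
]
print([f for f in files if keep(f)])

# Real mirror call (needs huggingface_hub; run after verifying the repo):
# from huggingface_hub import snapshot_download
# snapshot_download("mistralai/Mistral-Small-3.2-24B-Instruct-2506",
#                   allow_patterns=ALLOW_PATTERNS, local_dir="./mirror")
```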

3) If “original weights don’t exist anymore”

For cases like “only GGUF/EXL2 exists,” treat it as non-recoverable unless:

  • the author re-uploads,
  • or the merge recipe is published and all ingredients still exist.

In that scenario, the practical substitute is: start from the nearest base (e.g., Mistral Small 3.2 24B or Llama 3.3 70B) and apply a LoRA that recreates the style.


Using the UGI leaderboard efficiently (and why it helps for writing)

The UGI space exposes a downloadable dataset (CSV) that includes Writing scores and writing-style diagnostics, letting you filter candidates by:

  • parameter range (12B / 24B / 70B),
  • architecture family,
  • “writing style” / originality / repetition proxies.

Start from the leaderboard, then click through to HF pages and apply the “original weights + real 128k” checks above. (Leaderboard link you gave: Hugging Face Space “UGI-Leaderboard”.)
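Once you have the CSV export saved locally, the bucket filtering is a few lines. A sketch with stdlib `csv` only — the column names (“Model”, “Params (B)”, “Writing”) and the sample scores are made up for illustration; check the actual header row of your download:

```python
# Sketch: filter a UGI-style CSV export by parameter range and writing score.
# Column names and sample scores below are hypothetical.
import csv, io

def shortlist(csv_text, min_b, max_b, min_writing):
    """Return model names in a parameter range above a writing-score floor."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [r["Model"] for r in rows
            if min_b <= float(r["Params (B)"]) <= max_b
            and float(r["Writing"]) >= min_writing]

sample = """Model,Params (B),Writing
TheDrummer/Cydonia-24B-v4.3,24,8.1
nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B,12,7.9
some/quant-only-model,24,5.0
"""
print(shortlist(sample, 20, 36, 7.0))  # ['TheDrummer/Cydonia-24B-v4.3']
```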


Fast Hugging Face search workflow (repeatable)

  1. HF search filters
  • Task: Text Generation

  • Library: Transformers

  • Format: Safetensors

  • Sort by: Recently updated (to bias late-2025/2026 activity)

  • Query patterns:

    • 24B 131072 mistral safetensors bf16
    • Nemo 12B 128k safetensors
    • 70B Llama 3.3 safetensors bf16
  2. Open the model → jump to “Model tree”
  • If the base model is a long-context base (Mistral Small 3.2 / Llama 3.3), odds are better the finetune respects that window.
  3. Check the model card for the actual writing intent
  • Look for explicit claims like “prose expert / literary / storytelling,” and/or direct writing evals (Gutenberg Encore does this). (Hugging Face)
  4. Only then look at downloads
  • Downloads correlate with community testing, but not necessarily with your taste in prose. Use them last, not first.
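To make the query patterns from step 1 repeatable across buckets, you can generate them instead of retyping them. A sketch with `hf_query` as a hypothetical helper; the strings go into the HF search box (or into `HfApi().list_models(search=...)` if you script it with huggingface_hub):

```python
# Sketch: compose HF search queries for full-precision, long-context weights.
def hf_query(params_b, family, context=None):
    """Build one search string per bucket: size, context, family, format."""
    parts = [params_b, family, "safetensors", "bf16"]
    if context:
        parts.insert(1, str(context))
    return " ".join(parts)

print(hf_query("24B", "mistral", 131072))  # '24B 131072 mistral safetensors bf16'
print(hf_query("70B", "Llama 3.3"))        # '70B Llama 3.3 safetensors bf16'
```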

Shortlist by your buckets (actionable)

24–36B (Mistral) writing-focused: mistralai/Mistral-Small-3.2-24B-Instruct-2506, TheDrummer/Cydonia-24B-v4.3 (or v4.3.2-heretic), Doctor-Shotgun/Magnum-Diamond-24B, Gryphe/Codex-24B-Small-3.2

12B NeMo writing (128k-capable base): nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B, TheDrummer/UnslopNemo-12B-v4.1, p-e-w/Mistral-Nemo-Instruct-2407-heretic-noslop, shisa-ai/shisa-v2-mistral-nemo-12b

40B creative writing: CohereLabs/c4ai-command-r-v01 (35B), BSC-LT/ALIA-40b-instruct-2601

70B writing-focused: Doctor-Shotgun/L3.3-70B-Magnum-v4-SE, TheDrummer/Anubis-70B-v1.2, Sao10K/70B-L3.3-Cirrus-x1


Hi! Oh my gosh, I feel so bad for giving a relatively short answer to your extensive reply. I will save this for future reference though!

I found DavidAU/Mistral-Nemo-2407-12B-Thinking-Claude-Gemini-GPT5.2-Uncensored-HERETIC a few hours after posting my original question actually. :smiley:
I think I skipped Cydonia for whatever reason. I have to admit I don’t recall why. So far all the models I trained do very well on 6 ‘style probe tests’ I put in the training script, and the 1,000 token test I put them through afterwards.

I will probably be that person with as many models as possible squished on multiple SSDs at some point, so I will definitely keep your shortlist in mind!


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.