If the original full weights for a model have been deleted, you're left with two options: find a version re-uploaded by a third party, or find a full weight set from a closely related model in the derivative chain (derivatives usually name their source or target model). Back-converting from GGUF to Transformers format isn't impossible, but it's far from ideal: quantization is lossy, so the original precision can't be recovered.
Below is general information:
What you’re actually optimizing for (and how to verify it fast)
You want Transformers-format original weights (typically .safetensors, dtype bf16/fp16/fp32) and a true long context (ideally 128k / 131072). Do these checks before you invest time:
1) “Original weights” checklist (Hugging Face model page)
- Tags: Transformers + Safetensors
- Tensor type: BF16 / FP16 / FP32 shown on the model page (many good writing models expose this explicitly). Example: Anubis shows Safetensors + BF16. (Hugging Face)
- Files tab: should contain shards like model-00001-of-000xx.safetensors (not only *.gguf, *.awq, *.gptq, or *.exl2/exl3 files).
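The files-tab check above can be scripted. A minimal sketch: the classifier below is plain Python, so you can run it on filenames copied from the Files tab, or feed it the list returned by `huggingface_hub`'s `HfApi().list_repo_files(repo_id)` if you want to check repos in bulk (the extension patterns follow the shorthand from the checklist and are an approximation, since e.g. AWQ repos often also use .safetensors):

```python
import re

# Sharded ("model-00001-of-00010.safetensors") or single-file original weights.
SHARD_RE = re.compile(r"model(-\d{5}-of-\d{5})?\.safetensors$")
# Quant-export extensions, per the checklist shorthand above.
QUANT_RE = re.compile(r"\.(gguf|awq|gptq|exl2|exl3)$", re.IGNORECASE)

def has_original_weights(files):
    """True if the file list contains Transformers-format safetensors weights."""
    return any(SHARD_RE.search(f) for f in files)

def quant_only(files):
    """True if the repo ships only quantized exports, no original weights."""
    return not has_original_weights(files) and any(QUANT_RE.search(f) for f in files)

print(has_original_weights(["config.json", "model-00001-of-00010.safetensors"]))  # True
print(quant_only(["model-Q4_K_M.gguf"]))  # True
```

If `quant_only` comes back True, fall back to the "breadcrumbs" approach described later: the quant repo's model card usually names the original.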
2) “Real 128k” checklist (don’t trust the headline)
Look for at least one of:
- max_position_embeddings: 131072 in config.json (common for Mistral Small 3.x). (Hugging Face)
- or a model-specific long-context implementation. Cohere Command-R is a classic gotcha: the model card says 128k (Hugging Face), the Transformers docs say 128k (Hugging Face), but the config can show max_position_embeddings: 8192 while model_max_length: 131072 appears elsewhere (Hugging Face).
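A sketch of flagging that "headline 128k vs config 8k" mismatch from the two JSON files. Fetching config.json and tokenizer_config.json (e.g. via `huggingface_hub`'s `hf_hub_download`) is left out so the helper stays offline-runnable:

```python
def context_report(config, tokenizer_config=None):
    """Compare the two context-length claims a repo can make."""
    mpe = config.get("max_position_embeddings")          # from config.json
    mml = (tokenizer_config or {}).get("model_max_length")  # from tokenizer_config.json
    return {
        "max_position_embeddings": mpe,
        "model_max_length": mml,
        # True when the two sources disagree -- investigate before trusting 128k.
        "mismatch": mpe is not None and mml is not None and mpe != mml,
    }

# Mistral Small 3.x style: config.json alone already shows 131072.
print(context_report({"max_position_embeddings": 131072}))
# Command-R style gotcha: 8192 in config.json, 131072 in tokenizer_config.json.
print(context_report({"max_position_embeddings": 8192},
                     {"model_max_length": 131072}))
```

A mismatch isn't automatically disqualifying (some models extend context via RoPE scaling set elsewhere in the config), but it means you should test long-context behavior yourself rather than trust the model card.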
3) “Writing-focused” checklist
- Model card mentions literary / prose / storytelling / creativity as primary goal (Drummer’s cards do this explicitly). (Hugging Face)
- Prefer models trained on fiction / narrative corpora or tuned with writing-judged evals (see Gutenberg Encore below). (Hugging Face)
Good full-precision long-context writing weights (Transformers, non-GGUF)
Below are high-signal starting points that (a) have original weights in Safetensors and (b) are commonly used for creative writing / prose with large contexts.
A) 24–36B, Mistral architecture, writing-focused (best “128k class” options)
1) mistralai/Mistral-Small-3.2-24B-Instruct-2506 (base to LoRA from)
- Why: strong modern Mistral Small, and the config shows 131072 positions (good “native” long context indicator). (Hugging Face)
- Use-case: best if you want to build your own writing style via LoRA rather than inherit someone else’s “voice”.
2) TheDrummer/Cydonia-24B-v4.3 (prose / bard-style leaning)
- Why: explicitly positioned around creativity/writing, with user quotes calling it “a prose expert” / “bard”. (Hugging Face)
- Alignment note: this release notes it “might refuse/be more positive” (so, not the most permissive variant). (Hugging Face)
3) TheDrummer/Cydonia-24B-v4.3.2-heretic (more “uncensored” flavor)
- Why: “heretic” variants are typically aimed at reducing refusals without a crude refusal-ablation pass; this one exposes long-context-friendly config values (max_position_embeddings: 131072) and bf16. (Hugging Face)
- Practical note: if you prefer “uncensored over abliterated,” this is usually closer to that preference than ablation-style releases.
4) Doctor-Shotgun/Magnum-Diamond-24B (prose-first, Mistral Small 3.2 base)
- Why: explicitly described as a “prose expert” direction on a Mistral Small 3.2 base. (Hugging Face)
- Use-case: if your LoRA is meant to add your style, starting from a writing-tuned 24B often reduces the amount of LoRA work.
5) Gryphe/Codex-24B-Small-3.2 (diversity + narrative-format training mix)
- Why: the model card describes heavy emphasis on diverse storytelling patterns and includes narrative-format data; it also shows Safetensors + BF16 and a clear base-model tree. (Hugging Face)
- Caveat: it’s framed as RP-oriented experimentation, so it may skew more “interactive fiction” than pure literary prose. (Hugging Face)
About “36B, 128k, Mistral/Llama” specifically: in practice, good 128k writing-focused 36B dense models are rarer than 24B and 70B classes. If you must sit near 36B, you’ll often compromise on context length (32k-ish) or move to a different family.
B) 12B Mistral NeMo (128k) tuned for writing
1) nbeerbower/Mistral-Nemo-Gutenberg-Encore-12B
- Why: explicitly tuned on Gutenberg-style fiction preference datasets and shows a clear writing-judge evaluation section. (Hugging Face)
- Important realism check: the ORPO config uses max_length=4096 during training (Hugging Face). That does not prevent you from running longer contexts (NeMo supports 128k), but it does mean quality at very long contexts is not guaranteed to hold up.
2) TheDrummer/UnslopNemo-12B-v4.1
- Why: positioned around reducing “slop” and improving writing feel; shows Safetensors + BF16 on the page. (Hugging Face)
- Use-case: if you want a more “writerly” NeMo base before you apply your own LoRA.
3) p-e-w/Mistral-Nemo-Instruct-2407-heretic-noslop
- Why: a “heretic + noslop” combination is often exactly the niche you described (less refusal, less corporate voice). (Hugging Face)
4) Dark horse (not purely “writing”-focused, but high-quality long-context NeMo weights)
C) “40B class” creative writing (realistic options)
True 40B + 128k + writing-first is uncommon. Two pragmatic routes:
Route 1: Use a 35B long-context generalist and LoRA it into a writer
Route 2: Use a literal 40B model that exists, but accept it may not be “writing-first”
D) 70B creative writing focused (you already have Qwen 72B abliterated)
These are strong “writer” directions in Transformers weights:
1) Doctor-Shotgun/L3.3-70B-Magnum-v4-SE (prose-first)
- Why: explicitly framed around writing style / prose and shipped as full weights. (Hugging Face)
- Practical: this is a very common “start here” if you want narrative strength without relying on ablation.
2) TheDrummer/Anubis-70B-v1.2 (creative + (dis)alignment leaning)
- Why: explicitly optimized for creativity/writing/dynamism/imagination, and shows Safetensors + BF16. (Hugging Face)
3) Sao10K/70B-L3.3-Cirrus-x1
- Why: shows Safetensors + BF16, Llama 3.3 base lineage, and is often used as a strong general “70B open” anchor. (Hugging Face)
“Mirrors” and how to recover original weights when you only see quants
1) Use quant repos as breadcrumbs (they often point to the original)
Many GGUF/quant uploaders explicitly include “Original model: …” in the model card. Example: bartowski’s Cydonia GGUF points back to the original weights repo. (Hugging Face)
2) Prefer models that store via HF “xet” + safetensors, then mirror locally
If the original weights exist, you can make your own “mirror” by snapshotting:
- huggingface_hub’s snapshot_download(...) (filter to *.safetensors, *.json, and tokenizer files),
- or git lfs clone / huggingface-cli download.
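A minimal sketch of the snapshot_download route, assuming `huggingface_hub` is installed; the repo id is just an example, and the allow-list is one reasonable choice of patterns, not an exhaustive one:

```python
from fnmatch import fnmatch

# Original weights + configs + tokenizer files; "*.model" covers
# sentencepiece tokenizers. Quants (*.gguf etc.) are excluded by omission.
ALLOW_PATTERNS = ["*.safetensors", "*.json", "tokenizer*", "*.model"]

def wanted(path):
    """Would this repo file be pulled by the mirror?"""
    return any(fnmatch(path, pat) for pat in ALLOW_PATTERNS)

# The actual mirror call (network + disk heavy, so commented out here):
# from huggingface_hub import snapshot_download
# snapshot_download(
#     repo_id="mistralai/Mistral-Small-3.2-24B-Instruct-2506",  # example repo
#     local_dir="./mirror/Mistral-Small-3.2-24B-Instruct-2506",
#     allow_patterns=ALLOW_PATTERNS,
# )

print(wanted("model-00001-of-00010.safetensors"))  # True
print(wanted("model-Q4_K_M.gguf"))                 # False
```

Keeping your own snapshot is cheap insurance: deletions of original-weight repos are exactly the failure mode this whole section is about.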
3) If “original weights don’t exist anymore”
For cases like “only GGUF/EXL2 exists,” treat it as non-recoverable unless:
- the author re-uploads,
- or the merge recipe is published and all ingredients still exist.
In that scenario, the practical substitute is: start from the nearest base (e.g., Mistral Small 3.2 24B or Llama 3.3 70B) and apply a LoRA that recreates the style.
Using the UGI leaderboard efficiently (and why it helps for writing)
The UGI space exposes a downloadable dataset (CSV) that includes Writing scores and writing-style diagnostics, letting you filter candidates by:
- parameter range (12B / 24B / 70B),
- architecture family,
- “writing style” / originality / repetition proxies.
Start from the leaderboard, then click through to HF pages and apply the “original weights + real 128k” checks above. (Leaderboard link you gave: Hugging Face Space “UGI-Leaderboard”.)
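A filtering sketch over the leaderboard’s CSV export using only the stdlib. The column names here (“Model”, “Params”, “Writing”) are hypothetical placeholders; check the actual header row of the CSV you download and substitute accordingly:

```python
import csv, io

def shortlist(csv_text, min_params, max_params, min_writing):
    """Return model names within a parameter range above a writing-score floor."""
    rows = csv.DictReader(io.StringIO(csv_text))
    out = []
    for r in rows:
        try:
            # Hypothetical column names -- adjust to the real CSV header.
            params = float(r["Params"])
            writing = float(r["Writing"])
        except (KeyError, ValueError):
            continue  # skip malformed or unscored rows
        if min_params <= params <= max_params and writing >= min_writing:
            out.append(r["Model"])
    return out

sample = "Model,Params,Writing\nexample/writer-24b,24,7.5\nexample/tiny-3b,3,8.0\n"
print(shortlist(sample, 12, 70, 7.0))  # ['example/writer-24b']
```

Run the shortlist first, then do the manual checks (original weights, real 128k, writing intent) only on the survivors.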
Fast Hugging Face search workflow (repeatable)
- HF search filters
- Open the model → jump to “Model tree”
- If the base model is a long-context base (Mistral Small 3.2 / Llama 3.3), odds are better the finetune respects that window.
- Check the model card for the actual writing intent
- Look for explicit claims like “prose expert / literary / storytelling,” and/or direct writing evals (Gutenberg Encore does this). (Hugging Face)
- Only then look at downloads
- Downloads correlate with community testing, but not necessarily with your taste in prose. Use them last, not first.
Shortlist by your buckets (actionable)
24–36B (Mistral) writing-focused
12B NeMo writing (128k-capable base)
40B creative writing
70B writing-focused