DiagonAlly — Novel Synopsis in Under 2 Minutes, With 90% Fewer Tokens

Live demo: DiagonAlly on HuggingFace Spaces

The problem

Yes, ChatGPT, Claude, and Gemini can summarize books. But how?

They need the entire text in their context window. A 250-page novel is ~80,000 tokens. Even with chunking strategies, you’re looking at dozens of API calls, large context models, and significant computational cost. For a single book, that’s fine. For a service analyzing thousands of books? That’s an infrastructure problem.

What if you could get 80% of the quality with 10% of the tokens?

That’s what DiagonAlly does.

The numbers

For a typical 250-page novel (~60,000 words):

                      ChatGPT/Claude approach   DiagonAlly
Tokens to LLM         80,000-100,000            ~8,000-10,000
LLM calls             10-50 (chunked)           1
Model size needed     70B-200B+                 7B
Time                  3-10 minutes              60-90 seconds
GPU                   Yes (or expensive API)    No (free inference)
Cost per book (API)   $0.15-0.80+               < $0.01

That’s a ~90% reduction in token usage and a model that’s 10-30x smaller. A single API call. No GPU. Free tier.

How is this possible?

The trick is doing the hard work before the LLM sees anything.

DiagonAlly has an algorithmic pre-processing layer — pure Python, no AI, runs in 5 seconds on CPU — that extracts the most meaningful content from the full text. We developed a geometric text scanning technique (the reason for the name) that reads text through a diagonal traversal pattern across a character matrix. This captures semantically rich word clusters from the entire book without processing every line.
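The post doesn’t reproduce the scanning code, but a minimal sketch of the idea might look like this. The function name, the 80-column width, and the stride of one diagonal every 10 columns are illustrative assumptions, not the project’s actual parameters:

```python
def diagonal_words(text, width=80):
    """Pad lines into a character matrix, walk down-right diagonals,
    and collect every word a diagonal touches (a hypothetical sketch)."""
    lines = [line.ljust(width)[:width] for line in text.splitlines()]
    hits = set()
    for start in range(0, width, 10):  # one diagonal every 10 columns
        row, col = 0, start
        while row < len(lines) and col < width:
            line = lines[row]
            if line[col].isalpha():
                # expand from the hit character to the whole word around it
                left = col
                while left > 0 and line[left - 1].isalpha():
                    left -= 1
                right = col
                while right + 1 < width and line[right + 1].isalpha():
                    right += 1
                hits.add(line[left:right + 1].lower())
            row += 1
            col += 1
    return hits
```

Because each diagonal visits one character per line, the scan samples words from every row of the matrix without reading every column, which is what keeps it cheap.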

On top of this, the algorithm:

  • Detects chapters automatically (TOC, heading patterns, or intelligent fallback)
  • Identifies recurring characters with name merging (“Dr. Evans” + “John Evans” + “Evans” = same person)
  • Selects 80 key sentences distributed evenly across the text, scored by plot relevance
  • Samples real prose from throughout the book (1 line every 5)
  • Captures how each chapter opens and closes
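To make the name-merging step concrete, here is a hedged illustration that clusters mentions sharing a surname token. `merge_names` and the title list are hypothetical, not the project’s actual code:

```python
TITLES = {"dr", "mr", "mrs", "ms", "prof", "sir", "lady"}

def merge_names(mentions):
    """Group name variants by their last non-title token, so
    'Dr. Evans', 'John Evans' and 'Evans' land in one cluster."""
    clusters = {}  # surname -> set of variant spellings
    for name in mentions:
        parts = [p.strip(".").lower() for p in name.split()]
        surname = next((p for p in reversed(parts) if p not in TITLES), None)
        if surname:
            clusters.setdefault(surname, set()).add(name)
    return clusters
```

For example, `merge_names(["Dr. Evans", "John Evans", "Evans", "Cathy"])` groups the three Evans variants under a single key.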

All of this is packed into one prompt for a 7B model. One call. Done.
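The packing step amounts to simple string assembly. The section labels and the instruction line below are illustrative, not DiagonAlly’s actual prompt template:

```python
def build_prompt(chapters, characters, key_sentences, samples):
    """Assemble the pre-extracted pieces into one prompt (a sketch;
    the real template's wording and section names may differ)."""
    return "\n\n".join([
        "You are a literary analyst. Using ONLY the material below, "
        "write a structured, no-spoiler synopsis "
        "(Premise / Characters / Why Read This Book).",
        "CHAPTERS:\n" + "\n".join(chapters),
        "CHARACTERS:\n" + ", ".join(characters),
        "KEY SENTENCES:\n" + "\n".join(key_sentences),
        "PROSE SAMPLES:\n" + "\n".join(samples),
    ])
```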

What you get

A structured, no-spoiler synopsis with three sections:

  • The Premise — setting, atmosphere, what launches the story
  • The Characters — who they are, what drives them, key relationships
  • Why Read This Book — themes, emotional experience, who it’s for

Tested on multiple genres (sci-fi, classic literature, etc.) with accurate character identification, proper name/title recognition, and coherent narrative understanding.

The real opportunity

We’re running this on a free 7B model and getting decent results. Imagine what happens with a 70B model, or a fine-tuned one. The algorithmic layer does the compression — the model just needs to be smart enough to understand ~8,000 well-chosen tokens. That’s a much easier job than digesting 80,000 raw tokens.

This approach could be applied to:

  • Book recommendation platforms — analyze entire catalogs at minimal cost
  • Publishing houses — quick manuscript triage
  • Libraries and bookstores — auto-generated synopses
  • Reading apps — “should I read this?” feature
  • Research — rapid literature scanning

Limitations (we’re honest)

  • A 7B model sometimes hallucinates details, especially about the later parts of the story
  • Chapter detection works well for standard formats, not all unconventional ones
  • Very short texts (under 5,000 words) don’t benefit from this approach — just send the full text

How to contribute

The code is ~1,100 lines of clean Python. Easy to read, easy to hack.

We’d love help with:

  • Testing on diverse books — different genres, languages, lengths
  • Model swapping — try larger models and share results
  • Prompt engineering — small prompt changes can have big impact on quality
  • Integration ideas — how would you use this in your projects?

Tech stack: Python, Gradio, pdfplumber, ebooklib, HuggingFace Inference API (Qwen2.5-7B).

Built by Paul Olden. Feedback and contributions welcome.


Update: v20
What changed since the original post:

Algorithmic layer:

  • Text sampling increased from 1 line every 5 to 1 every 3, sending roughly two-thirds more prose to the model at zero extra cost
  • New capitalized-word extraction catches names, places, and institutions the diagonal scan misses
  • No redundancy: words already captured by the skeleton and entity detector are excluded
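The new capitalized-word pass might be approximated like this; `capitalized_words` and its filtering rules are assumptions for illustration, with `already_seen` standing in for the words the skeleton and entity detector have already captured:

```python
import re

def capitalized_words(text, already_seen):
    """Collect mid-sentence capitalized tokens (likely names, places,
    institutions), skipping anything earlier passes already found."""
    found = set()
    for sentence in re.split(r"[.!?]\s+", text):
        tokens = sentence.split()
        for tok in tokens[1:]:  # skip sentence-initial words
            word = tok.strip(".,;:!?\"'")
            if word.istitle() and word.lower() not in already_seen:
                found.add(word)
    return found
```

Skipping the first token of each sentence is a cheap way to avoid treating ordinary sentence-initial capitalization as an entity.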

Model cascade:

  • Primary model is now Qwen3-235B (was Qwen2.5-7B), falling back to 72B, 70B, then 7B. Still one call, still free
  • The quality jump is significant: accurate character relationships and nuanced thematic analysis
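A fallback cascade of this shape is straightforward to sketch. The exact HuggingFace repo IDs and the injected `call_model` client below are assumptions, not the project’s code:

```python
# Repo IDs are illustrative guesses at the models named in the post.
CASCADE = [
    "Qwen/Qwen3-235B-A22B",
    "Qwen/Qwen2.5-72B-Instruct",
    "meta-llama/Llama-3.3-70B-Instruct",
    "Qwen/Qwen2.5-7B-Instruct",
]

def generate_with_fallback(prompt, call_model):
    """Try each model in order; return the first successful result.
    call_model(model_id, prompt) is any client function that raises
    on rate limits or unavailable models."""
    last_error = None
    for model in CASCADE:
        try:
            return model, call_model(model, prompt)
        except Exception as err:
            last_error = err
    raise RuntimeError("all models in the cascade failed") from last_error
```

Injecting the client as a parameter keeps the cascade logic testable without touching the network.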

Output restructured:

  • “The Premise” → “The Plot” (400-600 words, up from 300-400)
  • “Why Read This Book” → “What To Expect”: neutral and balanced, covering both what works and what doesn’t
  • Tone: professional third-person analyst, not enthusiastic first-person reviewer

New feature: spoiler control

  • A “No-Spoiler Analysis” checkbox (default: on)
  • When off, the synopsis covers the full plot, including twists and the ending
  • Tested on Wuthering Heights: the no-spoiler version stops at the first third; the full analysis covers everything through the resolution

Testing on diverse books remains the most useful feedback. Try it, break it, report back.
