DiagonAlly — Novel Synopsis in Under 2 Minutes, With 90% Fewer Tokens
Live demo: DiagonAlly on HuggingFace Spaces
The problem
Yes, ChatGPT, Claude, and Gemini can summarize books. But how?
They need the entire text in their context window. A 250-page novel is ~80,000 tokens. Even with chunking strategies, you’re looking at dozens of API calls, long-context models, and significant computational cost. For a single book, that’s fine. For a service analyzing thousands of books? That’s an infrastructure problem.
What if you could get 80% of the quality with 10% of the tokens?
That’s what DiagonAlly does.
The numbers
For a typical 250-page novel (~60,000 words):
| | ChatGPT/Claude approach | DiagonAlly |
|---|---|---|
| Tokens to LLM | 80,000-100,000 | ~8,000-10,000 |
| LLM calls | 10-50 (chunked) | 1 |
| Model size needed | 70B-200B+ | 7B |
| Time | 3-10 minutes | 60-90 seconds |
| GPU | Yes (or expensive API) | No (free inference) |
| Cost per book (API) | $0.15-0.80+ | < $0.01 |
That’s a ~90% reduction in token usage and a model that’s 10-30x smaller. A single API call. No GPU. Free tier.
How is this possible?
The trick is doing the hard work before the LLM sees anything.
DiagonAlly has an algorithmic pre-processing layer — pure Python, no AI, runs in 5 seconds on CPU — that extracts the most meaningful content from the full text. We developed a geometric text scanning technique (the reason for the name) that reads text through a diagonal traversal pattern across a character matrix. This captures semantically rich word clusters from the entire book without processing every line.
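The exact scoring and selection logic isn’t shown here, but the core idea can be sketched in a few lines. This is a minimal illustration under assumed parameters (row width, diagonal sampling step, word-length floor are all hypothetical), not DiagonAlly’s actual implementation:

```python
def diagonal_word_scan(text: str, width: int = 80, step: int = 7) -> set:
    """Sketch of a diagonal traversal over a character matrix.

    The text is wrapped into rows of `width` characters, then every
    `step`-th anti-diagonal is walked; each character a diagonal hits
    is expanded to the surrounding word. This touches every region of
    the text while reading only a fraction of it. Illustrative only --
    the real scoring/selection logic is not shown.
    """
    rows = [text[i:i + width] for i in range(0, len(text), width)]
    words = set()
    for d in range(0, len(rows) + width - 1, step):  # sampled anti-diagonals
        for r in range(len(rows)):
            c = d - r
            if not (0 <= c < len(rows[r])) or not rows[r][c].isalpha():
                continue
            # expand left/right to recover the whole word on this row
            start, end = c, c
            while start > 0 and rows[r][start - 1].isalpha():
                start -= 1
            while end + 1 < len(rows[r]) and rows[r][end + 1].isalpha():
                end += 1
            word = rows[r][start:end + 1].lower()
            if len(word) >= 3:
                words.add(word)
    return words
```

Because the diagonals are evenly spaced across the matrix, the sampled words come from every part of the book rather than clustering at the start, which is what makes this useful as a compression pass.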
On top of this, the algorithm:
- Detects chapters automatically (TOC, heading patterns, or intelligent fallback)
- Identifies recurring characters with name merging (“Dr. Evans” + “John Evans” + “Evans” = same person)
- Selects 80 key sentences distributed evenly across the text, scored by plot relevance
- Samples real prose from throughout the book (1 line every 5)
- Captures how each chapter opens and closes
All of this is packed into one prompt for a 7B model. One call. Done.
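The name-merging step, for example, can be approximated with a subset rule on name tokens: a short name is treated as an alias of a longer one when its significant parts are contained in it. This is a hypothetical sketch (the title list and subset heuristic are assumptions, not DiagonAlly’s actual code):

```python
TITLES = {"dr", "mr", "mrs", "ms", "prof", "sir", "lady"}  # assumed honorifics

def significant_tokens(name: str) -> set:
    """Lowercased name parts with honorifics stripped."""
    return {t.strip(".").lower() for t in name.split()} - TITLES

def merge_names(names):
    """Group name variants: a name becomes an alias of another when its
    significant tokens are a subset of the other's (e.g. {'evans'} is a
    subset of {'john', 'evans'}). Illustrative sketch only."""
    # process longer names first so they become canonical representatives
    canonical = sorted(names, key=lambda n: -len(significant_tokens(n)))
    groups = {}
    for name in canonical:
        toks = significant_tokens(name)
        for rep in groups:
            if toks <= significant_tokens(rep):
                groups[rep].append(name)
                break
        else:
            groups[name] = [name]
    return groups
```

With this rule, "Dr. Evans", "John Evans", and "Evans" collapse into one group under "John Evans", which is exactly the behavior the list above describes.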
What you get
A structured, no-spoiler synopsis with three sections:
- The Premise — setting, atmosphere, what launches the story
- The Characters — who they are, what drives them, key relationships
- Why Read This Book — themes, emotional experience, who it’s for
Tested on multiple genres (sci-fi, classic literature, etc.) with accurate character identification, proper name/title recognition, and coherent narrative understanding.
The real opportunity
We’re running this on a free 7B model and getting decent results. Imagine what happens with a 70B model, or a fine-tuned one. The algorithmic layer does the compression — the model just needs to be smart enough to understand ~8,000 well-chosen tokens. That’s a much easier job than digesting 80,000 raw tokens.
This approach could be applied to:
- Book recommendation platforms — analyze entire catalogs at minimal cost
- Publishing houses — quick manuscript triage
- Libraries and bookstores — auto-generated synopses
- Reading apps — “should I read this?” feature
- Research — rapid literature scanning
Limitations (we’re honest)
- A 7B model sometimes hallucinates details, especially about the later parts of the story
- Chapter detection works well for standard formats but can miss unconventional ones
- Very short texts (under 5,000 words) don’t benefit from this approach — just send the full text
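The last point suggests a simple routing rule at the entry point. A minimal sketch, assuming the 5,000-word threshold from the note above (the helper name and return values are illustrative):

```python
SHORT_TEXT_WORDS = 5_000  # threshold from the limitations note above

def choose_strategy(text: str) -> str:
    """Route a document: short texts go straight to the LLM as-is,
    longer ones through the compression pipeline first.
    Hypothetical helper illustrating the rule, not project code."""
    n_words = len(text.split())
    return "full_text" if n_words < SHORT_TEXT_WORDS else "compress"
```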
How to contribute
The code is ~1,100 lines of clean Python. Easy to read, easy to hack.
We’d love help with:
- Testing on diverse books — different genres, languages, lengths
- Model swapping — try larger models and share results
- Prompt engineering — small prompt changes can have a big impact on quality
- Integration ideas — how would you use this in your projects?
Tech stack: Python, Gradio, pdfplumber, ebooklib, HuggingFace Inference API (Qwen2.5-7B).
Built by Paul Olden. Feedback and contributions welcome.