Hey everyone!
I recently built a very experimental semantic prompt compressor aimed at reducing LLM token usage without losing important context.
Still not sure how worthwhile the idea is, but I did have fun with this experiment.
- Built with spaCy and YAML rule configs (rough sketch of the idea below)
- Domain-sensitive (works best on human-written queries)
- Preserves >95% of named entities and technical terms
- Achieves ~22% compression on real-world prompts
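To make the approach concrete, here's a minimal sketch of the general idea, not the project's actual code: use spaCy to tag tokens, keep anything inside a named entity, and drop filler defined by a YAML rule config. The names here (`FILLER_RULES`, `compress`) and the rules themselves are hypothetical.

```python
import spacy
import yaml

# Hypothetical rule config; the real project loads rules from YAML files.
FILLER_RULES = yaml.safe_load("""
drop_pos: [DET, INTJ]          # parts of speech considered safe to drop
drop_words: [please, kindly, basically, actually]
""")

nlp = spacy.load("en_core_web_sm")

def compress(prompt: str) -> str:
    doc = nlp(prompt)
    kept = []
    for token in doc:
        # Never drop tokens inside a named entity: they carry the context.
        if token.ent_type_:
            kept.append(token.text)
        elif token.pos_ in FILLER_RULES["drop_pos"]:
            continue
        elif token.lower_ in FILLER_RULES["drop_words"]:
            continue
        else:
            kept.append(token.text)
    return " ".join(kept)

print(compress("Could you please basically summarize the OpenAI report for Alice?"))
```

The real compressor does more than stopword stripping (that's where the domain-sensitive rules come in), but the entity-preservation constraint above is the core of why >95% of named entities survive.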
It's designed for both runtime compression and prompt normalization before storage / vector DB ingestion.
Open source and ready to test:
GitHub: https://github.com/metawake/prompt_compressor
Full writeup
Would love feedback from the community on whether this looks useful, and whether you've ever needed to build something similar. Is anyone else fighting the token-reduction fight?
Cheers!