Zest - overview
I’ve seen quite a few terminal assistant tools that convert natural English into command line commands using LLMs. While this is cool, I felt it could be improved. I didn’t like that they rely on third-party LLMs through API calls, which raises privacy and security concerns.
So I built my own tool called Zest. It is a small model/app that translates natural language directly into command line commands and runs fully locally: no API calls, no cloud dependency, and no need for a GPU. There is a confirmation step before running commands, and guardrails against running destructive commands.
This is not meant to replace your workflow entirely. It’s for when you have forgotten a command, need help with difficult or long commands, need help while offline, or are not a frequent command line user, like myself and my peers (data analysts/scientists/engineers).
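To make the confirmation step and guardrails mentioned above more concrete, here is a minimal sketch of how such a flow could look. This is not Zest's actual code: the `BLOCKED_PATTERNS` list, `is_blocked`, and `confirm_and_run` are hypothetical names I made up for illustration, and the two regexes only cover the obvious cases.

```python
import re
import subprocess

# Hypothetical blocklist for illustration only; Zest's real guardrails may differ.
BLOCKED_PATTERNS = [
    # rm with one or more flag groups, aimed at the filesystem root
    re.compile(r"\brm\s+(?:-\w+\s+)+/(?:\s|$)"),
    # the classic bash fork bomb  :(){ :|:& };:
    re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:"),
]


def is_blocked(command: str) -> bool:
    """Return True if the command matches any catastrophic pattern."""
    return any(p.search(command) for p in BLOCKED_PATTERNS)


def confirm_and_run(command: str, ask=input, run=subprocess.run) -> str:
    """Refuse blocked commands, otherwise ask before running.

    `ask` and `run` are injectable so the flow can be tested without
    a terminal and without executing anything.
    """
    if is_blocked(command):
        return "blocked"
    answer = ask(f"Run `{command}`? [y/N] ")
    if answer.strip().lower() not in {"y", "yes"}:
        return "skipped"
    run(command, shell=True, check=False)
    return "ran"
```

The default answer is "no", so pressing Enter by accident never runs anything. A regex blocklist like this is easy to bypass, so treat it as a last line of defence and not as a sandbox.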
What I did
- Fine-tuned different small Qwen models (via Unsloth) using QLoRA.
- Used around 100k high-quality instruction-command pairs.
- Rated, augmented, and synthesised the data using LLMs and manual review.
- Trained on Google Colab using an A100 GPU.
- Applied DPO to align the model outputs.
- Tested the model on internal and external benchmarks.
- Packaged the model into a .dmg (GitHub link below).
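For anyone unfamiliar with DPO, here is a rough sketch of the standard per-pair DPO loss in plain Python. It only illustrates the objective; it is not my training code, and in practice a library computes this over batches of token log-probabilities. The function name and the `beta=0.1` default are assumptions for the example.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single (chosen, rejected) pair.

    Inputs are summed log-probabilities of each full response under the
    policy being trained and under the frozen reference model.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference agree exactly, the margin is 0 and the loss is log(2), about 0.693. The loss drops below that as the policy shifts probability towards the preferred command and away from the rejected one.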
Challenges
The biggest challenge was the data. I built a 6-stage data pipeline on GCS, where 5 of the 6 stages use Gemini as a judge/classifier. The data was in ‘instruction - command’ pairs. The steps were:
- Quality Rating
Data sources are merged and MD5-deduplicated. I used Gemini-2.5-Flash to label each instruction-command pair: good, bad input, bad output, or misaligned.
- CLI Tool Categorization
Bad rows are dropped. Gemini-2.5-Flash-Lite classifies the primary CLI tool (docker, kubectl, git, etc.).
- Instruction Disambiguation
Some identical instructions mapped to different commands. Gemini-2.5-Flash rewrites them to be unambiguous (tool context, flags, variables, OS specificity).
- Deduplication & Conflict Resolution
Rewritten rows are rehashed. Remaining conflicts are resolved by Gemini-2.5-Flash selecting the most accurate and robust command.
- Dataset Balancing
No LLMs. Logarithmic pruning compresses dominant tools while preserving rare ones.
- Security Filtering
Commands are scanned for catastrophic threats (rm -rf /, fork bombs). Normal destructive commands remain; flagged ones go to review.
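The two parts of the pipeline that need no LLM judging, MD5 deduplication and logarithmic pruning, can be sketched roughly like this. It is an illustration and not the pipeline code: the row fields (`instruction`, `command`, `tool`), the `floor` parameter, and the exact pruning formula `floor * (1 + ln(n / floor))` are my assumptions for the example.

```python
import hashlib
import math
import random
from collections import defaultdict


def pair_hash(instruction: str, command: str) -> str:
    """MD5 of the whitespace-trimmed pair, joined by a separator that
    should not occur in normal text."""
    key = f"{instruction.strip()}\x1f{command.strip()}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def deduplicate(rows):
    """Keep the first occurrence of every distinct (instruction, command)."""
    seen, unique = set(), []
    for row in rows:
        h = pair_hash(row["instruction"], row["command"])
        if h not in seen:
            seen.add(h)
            unique.append(row)
    return unique


def log_prune(rows, floor=100, seed=0):
    """Downsample dominant tools logarithmically, keep rare tools whole.

    Assumed rule: a tool with n <= floor rows keeps everything; above
    that it keeps floor * (1 + ln(n / floor)) rows, sampled at random.
    """
    by_tool = defaultdict(list)
    for row in rows:
        by_tool[row["tool"]].append(row)

    rng = random.Random(seed)  # fixed seed so runs are reproducible
    kept = []
    for group in by_tool.values():
        n = len(group)
        if n <= floor:
            kept.extend(group)
        else:
            target = int(floor * (1 + math.log(n / floor)))
            kept.extend(rng.sample(group, target))
    return kept
```

With `floor=100`, a tool with 10,000 rows shrinks to 560 while a tool with 40 rows is left untouched, which is the "compress dominant tools, preserve rare ones" behaviour described above.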
I’m looking for some people to test it out. If anybody is interested, leave a comment and I’ll send the file your way!
Links
Training notebook, built upon work I saw on r/localllama: Google Colab
Repo: GitHub - spicy-lemonade/zest-cli: 'Zest' Natural Language to CLI commands tool from Spicy Lemonade (consider starring it)