Zest, a small fine-tuned Qwen model that works as a command line assistant

Zest - overview

I’ve seen quite a few terminal assistant tools that convert plain English to command line commands using LLMs. While this is cool, I felt that it could be improved. I didn’t like that they rely on third-party LLMs through API calls, which raises privacy and security concerns.

I built my own tool called Zest. It is a small model/app that translates natural language directly into command line commands and runs fully locally with no API calls, no cloud dependency, no need for a GPU. There is a confirmation step before running commands, and guardrails against running destructive commands.
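To make the confirmation step and guardrails concrete, here is a minimal sketch of the idea, not the actual Zest code. The pattern list and the names `BLOCKED_PATTERNS`, `is_blocked` and `confirm_and_run` are assumptions made up for illustration; the real rules in the repo may look quite different.

```python
import re
import subprocess

# Hypothetical deny-list, assumed for this sketch only.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+(?:-\w+\s+)+/(?:\s|$)"),   # rm with flags aimed at the root directory
    re.compile(r":\(\)\s*\{\s*:\|:&\s*\};:"),      # classic bash fork bomb
    re.compile(r"\bdd\b.*\bof=/dev/"),             # dd writing straight to a device
]


def is_blocked(command: str) -> bool:
    """Return True if the command matches any deny-list pattern."""
    return any(p.search(command) for p in BLOCKED_PATTERNS)


def confirm_and_run(command: str, ask=input, run=subprocess.run) -> str:
    """Refuse blocked commands, otherwise ask the user before running."""
    if is_blocked(command):
        return "blocked"
    answer = ask(f"Run `{command}`? [y/N] ")
    if answer.strip().lower() != "y":
        return "skipped"
    run(command, shell=True, check=False)
    return "ran"
```

Passing `ask` and `run` as parameters is only there so the flow can be tried without touching a real shell, for example `confirm_and_run("ls -la", ask=lambda _: "n")` returns "skipped".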

This is not meant to replace your workflow entirely. It’s for when you forget a command, need help with difficult or long commands, need some help when you’re offline, or are not a frequent command line user, like myself and my peers (data analysts/scientists/engineers).

What I did

  • Fine-tuned different small Qwen models (via Unsloth) using QLoRA.

  • Around 100k high-quality instruction-command pairs

  • Data was rated, augmented, and synthesised using LLMs and manual review

  • Trained on Google Colab using an A100 GPU.

  • Applied DPO with preference data to align the model outputs.

  • Model was tested on internal and external benchmarks

  • The model was packaged up into a .dmg (GitHub link below)
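For anyone unfamiliar with what DPO data looks like: each record pairs a prompt with a preferred and a rejected answer. The sketch below uses the `prompt` / `chosen` / `rejected` field layout that common DPO trainers (for example TRL's `DPOTrainer`) accept. The post does not say which trainer or schema was used, so treat the field names, the helper `to_dpo_record` and the example commands as assumptions.

```python
import json


def to_dpo_record(instruction: str, chosen_command: str, rejected_command: str) -> dict:
    """Build one preference record: same prompt, a better and a worse command."""
    return {
        "prompt": instruction,
        "chosen": chosen_command,
        "rejected": rejected_command,
    }


# Made-up example: the rejected command contains a misspelled flag.
record = to_dpo_record(
    "Show the last 5 commits, one line each",
    "git log --oneline -n 5",
    "git log -5 --online",
)
print(json.dumps(record, indent=2))
```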

Challenges

The biggest challenge was building a 6-stage data pipeline on GCS, where 5 of the 6 stages use Gemini as a judge/classifier. The data was in ‘instruction - command’ pairs. The steps were:

  1. Quality Rating
    Data sources are merged and MD5-deduplicated. I used Gemini-2.5-Flash to label each instruction–command pair: good, bad input, bad output, or misaligned.
  2. CLI Tool Categorization
    Bad rows are dropped. Gemini-2.5-Flash-Lite classifies the primary CLI tool (docker, kubectl, git, etc.).
  3. Instruction Disambiguation
    Some identical instructions mapped to different commands. Gemini-2.5-Flash rewrites them to be unambiguous (tool context, flags, variables, OS specificity).
  4. Deduplication & Conflict Resolution
    Rewritten rows are rehashed. Remaining conflicts are resolved by Gemini-2.5-Flash selecting the most accurate and robust command.
  5. Dataset Balancing
    No LLMs. Logarithmic pruning compresses dominant tools while preserving rare ones.
  6. Security Filtering
    Commands are scanned for catastrophic threats (rm -rf /, fork bombs). Normal destructive commands remain; flagged ones go to review.
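The non-LLM parts of the steps above (MD5 deduplication, logarithmic pruning, and the catastrophic-command scan) can be sketched roughly as follows. This is an illustration under assumptions, not the pipeline code: the row layout (`instruction`, `command`, `tool`), the normalisation before hashing, the `scale * log10(count + 1)` cap and the two regex patterns are all made up here, since the post does not give the exact formulas or rules.

```python
import hashlib
import math
import random
import re
from collections import defaultdict


def pair_hash(instruction: str, command: str) -> str:
    """MD5 over a lightly normalised pair (normalisation is an assumption)."""
    normalized = f"{instruction.strip().lower()}\t{command.strip()}"
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def dedupe(rows: list[dict]) -> list[dict]:
    """Keep the first row for every distinct hash."""
    seen, unique = set(), []
    for row in rows:
        key = pair_hash(row["instruction"], row["command"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def log_cap(count: int, scale: int = 50) -> int:
    """Assumed cap: grows with log10 of the tool's row count, never above the count."""
    return min(count, max(1, int(scale * math.log10(count + 1))))


def balance(rows: list[dict], scale: int = 50, seed: int = 0) -> list[dict]:
    """Randomly down-sample each tool to its logarithmic cap."""
    by_tool = defaultdict(list)
    for row in rows:
        by_tool[row["tool"]].append(row)
    rng = random.Random(seed)
    kept = []
    for group in by_tool.values():
        kept.extend(rng.sample(group, log_cap(len(group), scale)))
    return kept


# Hypothetical patterns for "catastrophic" commands.
CATASTROPHIC = [
    re.compile(r"\brm\s+(?:-\w+\s+)+/(?:\s|$)"),  # rm with flags aimed at the root directory
    re.compile(r":\(\)\s*\{\s*:\|:&\s*\};:"),     # bash fork bomb
]


def split_for_review(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Flagged rows go to manual review, everything else is kept."""
    keep, review = [], []
    for row in rows:
        flagged = any(p.search(row["command"]) for p in CATASTROPHIC)
        (review if flagged else keep).append(row)
    return keep, review
```

With these assumed settings, a tool with 1,000 rows is cut to 150 while a tool with 2 rows keeps both, which is the "compress dominant tools, preserve rare ones" behaviour described in step 5.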

I’m looking for some people to test it out. If anybody is interested, leave a comment and I’ll send the file your way!

Links

Training notebook, built upon work I saw on r/localllama: Google Colab

Repo: GitHub - spicy-lemonade/zest-cli: 'Zest' Natural Language to CLI commands tool from Spicy Lemonade (consider starring it :folded_hands:)
