LangCheck: a multi-lingual toolkit to evaluate LLM applications

Hi! I wanted to share LangCheck, an open-source toolkit for evaluating LLM applications (GitHub, Quickstart).

It currently supports English, Japanese, Chinese, and German text, with more languages coming soon. Contributions welcome!

Core functionality:

  • langcheck.metrics – metrics to evaluate quality & structure of LLM-generated text
  • langcheck.plot – interactive visualizations of text quality
  • langcheck.augment – text augmentations to perturb prompts, references, etc. (coming soon)
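
To give a rough feel for what a "structure" metric means here, below is a minimal standard-library sketch of the idea: score each generated output by whether it parses as JSON, then aggregate into a pass rate. The helper name json_validity_scores and the sample outputs are made up for illustration and are not LangCheck's API; see the Quickstart for the real metric functions.

```python
import json


def json_validity_scores(generated_outputs):
    """Score each output 1 if it parses as JSON, else 0.

    Hypothetical helper for illustration only; this is not a LangCheck function.
    """
    scores = []
    for text in generated_outputs:
        try:
            json.loads(text)
            scores.append(1)
        except json.JSONDecodeError:
            scores.append(0)
    return scores


# Made-up sample outputs from an LLM that was asked to reply in JSON.
outputs = [
    '{"answer": 42}',
    'Sure! Here is the JSON you asked for: {answer: 42}',
]

scores = json_validity_scores(outputs)
pass_rate = sum(scores) / len(scores)
print(scores, pass_rate)
```

Returning one score per output (rather than a single pass/fail) keeps it easy to spot which responses failed and to plot or threshold the results afterwards.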

Super open to feedback & curious how other people think about evaluation for LLM apps.