Evaluating LLMs for specific programming languages

Hello everyone! I’m working on a project to fine-tune stable-code for the Ruby programming language.

As a first step, I’m also looking for pointers on evaluating the available models. I came across mxeval/multi-humaneval · Datasets at Hugging Face, but the Ruby subset seems incomplete, as the canonical_solution field is empty.
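
For context, this is roughly how I’ve been inspecting the Ruby split, as a minimal sketch assuming the `datasets` library and that the Ruby subset is exposed under a config named "ruby" with a "test" split (the exact config/split names may differ; please check the dataset card):

```python
# Sketch: load the Ruby split of mxeval/multi-humaneval and count entries
# whose canonical_solution field is empty.
# Assumptions: config name "ruby" and split "test" (verify on the dataset card).
from datasets import load_dataset

ds = load_dataset("mxeval/multi-humaneval", "ruby", split="test")

# Treat None or whitespace-only values as missing canonical solutions.
missing = [row["task_id"] for row in ds if not (row["canonical_solution"] or "").strip()]
print(f"{len(missing)} of {len(ds)} entries have an empty canonical_solution")
```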

I’m currently working out the canonical solutions for those 161 entries, but I’d like to ask the community if there are any other evaluation datasets that I can use. Any tips or suggestions are welcome. Thanks in advance.