Where can I find documentation for the WildChat-50m Judgments data?

I am looking for a dataset of prompts, model responses, and evaluation scores (input, output, quality) across different generating models.
I found WildChat-50m Judgments - a nyu-dice-lab Collection
However, I cannot find any good documentation that explains the structure of this data.
Where can I find documentation for this dataset?
For example:

  • The evaluation model sometimes replies with a grade that is not a number. How was it parsed?
  • How should conversations with length > 2 be interpreted?
  • What do all the fields mean? Some of them are almost always null.

It seems that there is no documentation other than the paper and the GitHub code…
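
Until proper documentation turns up, one option is to probe the records yourself. Below is a minimal Python sketch of three checks that map to the questions above: pulling a number out of a free-text judge reply, measuring how often each field is null, and splitting a longer conversation into (user, assistant) pairs. Everything here is an assumption for illustration: the helper names (`parse_grade`, `null_fraction`, `turn_pairs`) and the `role` / `content` keys are hypothetical, and this is not necessarily how the dataset authors parsed grades. Note that the regex simply takes the first number it sees, so a reply like "out of 10, I give 7" would be read as 10.

```python
import re
from collections import Counter

# Hypothetical helpers for exploring judgment records; not the authors' code.

def parse_grade(text):
    """Return the first number in a judge reply as a float, or None if absent."""
    if text is None:
        return None
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None

def null_fraction(records):
    """For each field seen in any record, the fraction of records where it is missing or None."""
    if not records:
        return {}
    fields = sorted(set().union(*(r.keys() for r in records)))
    counts = Counter()
    for r in records:
        for f in fields:
            if r.get(f) is None:
                counts[f] += 1
    return {f: counts[f] / len(records) for f in fields}

def turn_pairs(messages):
    """Pair each user message with the next assistant message (assumes 'role'/'content' keys)."""
    pairs = []
    pending = None
    for m in messages:
        if m["role"] == "user":
            pending = m["content"]
        elif m["role"] == "assistant" and pending is not None:
            pairs.append((pending, m["content"]))
            pending = None
    return pairs
```

Running `null_fraction` on a few thousand rows should show quickly which fields are effectively unused, and `parse_grade` returning `None` flags the judge replies that contain no number at all.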